Geocodes Brazilian addresses based on CNEFE data. Addresses must be passed as a data frame in which each column describes one address field (street name, street number, neighborhood, etc.). The input addresses are matched with CNEFE at different precision levels; see the Details section for more information. The output coordinates use the geodetic reference system "SIRGAS2000" (EPSG 4674).

Usage

geocode(
  addresses_table,
  address_fields = setup_address_fields(),
  n_cores = 1,
  progress = TRUE,
  keep_matched_address = FALSE,
  cache = TRUE
)

Arguments

addresses_table

A data frame. The addresses to be geocoded. Each column must represent an address field.

address_fields

A character vector. The correspondence between each address field and the name of the column that describes it in the addresses_table. The setup_address_fields() function helps create this vector and performs some checks on the input. Address fields passed as NULL are ignored, and the function must receive at least one non-null field. If you create the vector manually, note that its names must be the same as the parameter names of setup_address_fields().

n_cores

A number. The number of cores to be used in parallel execution. Defaults to 1.

progress

A logical. Whether to display progress bars when downloading CNEFE data and when geocoding the addresses. Defaults to TRUE.

keep_matched_address

A logical. Whether the output should include a column indicating the reference address that was matched. Defaults to FALSE.

cache

A logical. Whether CNEFE data should be saved to/read from cache, reducing processing time in future calls. Defaults to TRUE. When FALSE, CNEFE data is downloaded to a temporary directory.
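As a sketch of the manual route mentioned under address_fields, the vector below is built by hand instead of with setup_address_fields(). The element names mirror the setup_address_fields() parameters used in the Examples section; the column names and the one-row input table are illustrative assumptions, and setup_address_fields() remains the safer option because it validates its input.

```r
# Named character vector: names are address fields, values are the names of
# the columns that hold them in the input table (illustrative column names).
fields_manual <- c(
  logradouro = "nm_logradouro",
  numero     = "Numero",
  cep        = "Cep",
  bairro     = "Bairro",
  municipio  = "nm_municipio",
  estado     = "nm_uf"
)

# Toy input table with one address, used only to check the mapping.
input_df <- data.frame(
  nm_logradouro = "RUA PRINCESA ISABEL",
  Numero        = 263,
  Cep           = "69921-026",
  Bairro        = "ESTACAO EXPERIMENTAL",
  nm_municipio  = "RIO BRANCO",
  nm_uf         = "ACRE"
)

# Sanity checks before handing the vector to geocode(): every element is
# named, and every referenced column exists in the input table.
stopifnot(
  !is.null(names(fields_manual)),
  all(nzchar(names(fields_manual))),
  all(fields_manual %in% names(input_df))
)
```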

Value

Returns the data frame passed in addresses_table with the latitude (lat) and longitude (lon) of each matched address, as well as two columns (precision and match_type) indicating the precision level with which the address was matched.

Details

Precision categories:

Precision

The results of geocodebr are classified into six broad precision categories, plus NA for addresses that could not be found:

  • "numero"

  • "numero_interpolado"

  • "rua"

  • "cep"

  • "bairro"

  • "municipio"

  • NA (not found)

Each precision level can be disaggregated into more refined match types.

Match Type

The column match_type provides more refined information on exactly how each input address was matched with CNEFE. In every category, geocodebr takes the average latitude and longitude of the CNEFE addresses that match the input address on a given combination of fields. In the strictest case, for example, the function finds a deterministic match for all of the fields of a given address (estado, municipio, logradouro, numero, cep, localidade). Think of a building with several apartments that share the same street address and number. In such a case, the coordinates of the apartments differ only slightly, and geocodebr takes the average of those coordinates. In a less strict case, in which only the fields (estado, municipio, rua, bairro) are matched, geocodebr calculates the average coordinates of all the CNEFE addresses that lie along that street and fall within the same neighborhood.
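The averaging described above can be sketched in base R. The table below is a hypothetical stand-in for CNEFE records (its column names and coordinates are invented for illustration), and the code is not geocodebr's actual implementation; it only shows how averaging over a stricter or looser set of fields yields different coordinates.

```r
# Hypothetical CNEFE-like records: three units in the same building
# (same street and number) plus one record further along the same street.
cnefe_toy <- data.frame(
  logradouro = c("RUA A", "RUA A", "RUA A", "RUA A"),
  numero     = c(10, 10, 10, 50),
  lat        = c(-22.9000, -22.9002, -22.9004, -22.9100),
  lon        = c(-43.2000, -43.2002, -43.2004, -43.2100)
)

# Stricter match (street and number): average the coordinates of all
# records that share both fields.
by_number <- aggregate(
  cbind(lat, lon) ~ logradouro + numero,
  data = cnefe_toy,
  FUN = mean
)

# Looser match (street only): average over every record along the street.
by_street <- aggregate(
  cbind(lat, lon) ~ logradouro,
  data = cnefe_toy,
  FUN = mean
)

print(by_number)
print(by_street)
```

The street-level average is pulled toward the far end of the street by the record at number 50, which is why a match on fewer fields gives a less precise location.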

The complete list of precision levels and match type categories is:

  • Precision: "numero"

    • match_type:

      • en01: logradouro, numero, cep e bairro

      • en02: logradouro, numero e cep

      • en03: logradouro, numero e bairro

      • en04: logradouro e numero

      • pn01: logradouro, numero, cep e bairro

      • pn02: logradouro, numero e cep

      • pn03: logradouro, numero e bairro

      • pn04: logradouro e numero

  • Precision: "numero_interpolado"

    • match_type:

      • ei01: logradouro, numero, cep e bairro

      • ei02: logradouro, numero e cep

      • ei03: logradouro, numero e bairro

      • ei04: logradouro e numero

      • pi01: logradouro, numero, cep e bairro

      • pi02: logradouro, numero e cep

      • pi03: logradouro, numero e bairro

      • pi04: logradouro e numero

  • Precision: "rua" (when the input number is missing, i.e. 'S/N')

    • match_type:

      • er01: logradouro, cep e bairro

      • er02: logradouro e cep

      • er03: logradouro e bairro

      • er04: logradouro

      • pr01: logradouro, cep e bairro

      • pr02: logradouro e cep

      • pr03: logradouro e bairro

      • pr04: logradouro

  • Precision: "cep"

    • match_type:

      • ec01: municipio, cep, localidade

      • ec02: municipio, cep

  • Precision: "bairro"

    • match_type:

      • eb01: municipio, localidade

  • Precision: "municipio"

    • match_type:

      • em01: municipio

Note: Match types starting with 'p' use probabilistic matching of the logradouro field, while types starting with 'e' use deterministic matching only. Match types with probabilistic matching ARE NOT implemented in geocodebr yet.
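The naming scheme in the list above can be read mechanically: the first letter gives the matching method and the second letter the precision level. The helper below is a hypothetical convenience function, not part of geocodebr, sketched only from the codes listed in this section.

```r
# Hypothetical helper (not part of geocodebr): splits match_type codes into
# matching method and precision level, following the list in this section.
describe_match_type <- function(match_type) {
  method_key <- c(e = "deterministic", p = "probabilistic")
  precision_key <- c(
    n = "numero",
    i = "numero_interpolado",
    r = "rua",
    c = "cep",
    b = "bairro",
    m = "municipio"
  )
  data.frame(
    match_type = match_type,
    # First character: matching method of the logradouro field.
    method     = unname(method_key[substr(match_type, 1, 1)]),
    # Second character: precision level.
    precision  = unname(precision_key[substr(match_type, 2, 2)]),
    stringsAsFactors = FALSE
  )
}

# Unmatched addresses (NA) propagate as NA in both derived columns.
describe_match_type(c("en01", "er04", "ec02", NA))
```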

Examples


data_path <- system.file("extdata/small_sample.csv", package = "geocodebr")
input_df <- read.csv(data_path)

fields <- geocodebr::setup_address_fields(
  logradouro = "nm_logradouro",
  numero = "Numero",
  cep = "Cep",
  bairro = "Bairro",
  municipio = "nm_municipio",
  estado = "nm_uf"
)

df <- geocodebr::geocode(
  addresses_table = input_df,
  address_fields = fields,
  progress = FALSE
)

head(df)
#>   id            nm_logradouro Numero       Cep               Bairro
#> 1  1 RUA MARIA LUCIA PACIFICO     17 26042-730           SANTA RITA
#> 2  2      RUA LEOPOLDINA TOME     46 25030-050           CENTENARIO
#> 3  3          RUA DONA JUDITE      0 23915-700          CAPUTERA II
#> 4  4     RUA ALEXANDRE AMARAL      0 23098-120           SANTISSIMO
#> 5  5                AVENIDA E    300 23860-000         PRAIA GRANDE
#> 6  6      RUA PRINCESA ISABEL    263 69921-026 ESTACAO EXPERIMENTAL
#>      nm_municipio code_muni          nm_uf       lon        lat match_type
#> 1     NOVA IGUACU   3303500 RIO DE JANEIRO -43.47118 -22.695496       en01
#> 2 DUQUE DE CAXIAS   3301702 RIO DE JANEIRO -43.31134 -22.779173       en01
#> 3  ANGRA DOS REIS   3300100 RIO DE JANEIRO -44.20848 -22.978837       er01
#> 4  RIO DE JANEIRO   3304557 RIO DE JANEIRO -43.51150 -22.868992       er01
#> 5     MANGARATIBA   3302601 RIO DE JANEIRO -43.97214 -22.929864       en01
#> 6      RIO BRANCO   1200401           ACRE -67.83559  -9.963436       en01
#>   precision
#> 1    number
#> 2    number
#> 3    street
#> 4    street
#> 5    number
#> 6    number