Geocodes Brazilian addresses based on CNEFE data. Addresses must be passed as a data frame in which each column describes one address field (street name, street number, neighborhood, etc.). The input addresses are matched with CNEFE at different precision levels. For more info, please see the Precision and Match Type sections below. The output coordinates use the geodetic reference system "SIRGAS2000", CRS(4674).
Usage
geocode(
addresses_table,
address_fields = setup_address_fields(),
n_cores = 1,
progress = TRUE,
keep_matched_address = FALSE,
cache = TRUE
)
Arguments
- addresses_table
A data frame. The addresses to be geocoded. Each column must represent an address field.
- address_fields
A character vector. The correspondence between each address field and the name of the column that describes it in addresses_table. The setup_address_fields() function helps create this vector and performs some checks on the input. Address fields passed as NULL are ignored, and the function must receive at least one non-null field. If manually creating the vector, please note that the vector names should be the same as the setup_address_fields() parameter names.
- n_cores
A number. The number of cores to be used in parallel execution. Defaults to 1.
- progress
A logical. Whether to display progress bars when downloading CNEFE data and when geocoding the addresses. Defaults to TRUE.
- keep_matched_address
A logical. Whether the output should include a column indicating the matched reference address. Defaults to FALSE.
- cache
A logical. Whether CNEFE data should be saved to/read from cache, reducing processing time in future calls. Defaults to TRUE. When FALSE, CNEFE data is downloaded to a temporary directory.
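When address_fields is built by hand instead of with setup_address_fields(), it is a named character vector: each name is an address field and each value is a column name in addresses_table. The base R sketch below builds such a vector for a toy table and runs a few basic sanity checks. It assumes that the six field names used in the Examples section (logradouro, numero, cep, bairro, municipio, estado) are the full set of setup_address_fields() parameters; check_fields() is a hypothetical helper, not part of geocodebr, and it does not reproduce every check that setup_address_fields() performs.

```r
# Toy input table; column names follow the Examples section
input_df <- data.frame(
  nm_logradouro = c("RUA MARIA LUCIA PACIFICO", "RUA LEOPOLDINA TOME"),
  Numero = c(17, 46),
  nm_municipio = c("NOVA IGUACU", "DUQUE DE CAXIAS"),
  nm_uf = c("RIO DE JANEIRO", "RIO DE JANEIRO")
)

# Manually built address_fields: name = address field, value = column name.
# cep and bairro are assumed unavailable here, so they are simply left out.
fields <- c(
  logradouro = "nm_logradouro",
  numero = "Numero",
  municipio = "nm_municipio",
  estado = "nm_uf"
)

# Hypothetical helper (not part of geocodebr). Assumes these six names are
# the parameters accepted by setup_address_fields().
check_fields <- function(fields, addresses_table) {
  if (length(fields) == 0) stop("At least one address field is required")
  allowed <- c("logradouro", "numero", "cep", "bairro", "municipio", "estado")
  unknown <- setdiff(names(fields), allowed)
  if (length(unknown) > 0) {
    stop("Unknown address field(s): ", paste(unknown, collapse = ", "))
  }
  missing_cols <- setdiff(fields, names(addresses_table))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in addresses_table: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_fields(fields, input_df)
```

A vector that passes these checks would then be supplied as address_fields = fields in the call to geocode().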
Value
Returns the data frame passed in addresses_table with the latitude (lat)
and longitude (lon) of each matched address, as well as two columns
(precision and match_type) indicating the precision level with which the
address was matched.
Precision
The results of geocodebr are classified into six broad precision
categories:
- "numero"
- "numero_interpolado"
- "rua"
- "cep"
- "bairro"
- "municipio"
- NA (not found)
Each precision level can be disaggregated into more refined match types.
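Because the categories form a ranking, results can be filtered by a minimum precision. The base R sketch below assumes that the order listed above runs from most to least precise and encodes it as an ordered factor, so that comparisons such as precision <= "rua" work. The res data frame is a made-up stand-in for the output of geocode().

```r
# Precision levels, assumed ordered from most to least precise
precision_levels <- c("numero", "numero_interpolado", "rua",
                      "cep", "bairro", "municipio")

# Made-up stand-in for the output of geocode()
res <- data.frame(
  id = 1:4,
  precision = c("numero", "cep", NA, "rua")
)

# Ordered factor: "numero" < "numero_interpolado" < ... < "municipio"
res$precision <- factor(res$precision, levels = precision_levels, ordered = TRUE)

# Keep addresses matched at street level or better; unmatched (NA) rows are dropped
street_or_better <- res[!is.na(res$precision) & res$precision <= "rua", ]
street_or_better$id
```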
Match Type
The column match_type provides more refined information on how exactly each
input address was matched with CNEFE. In every category, geocodebr takes
the average latitude and longitude of the CNEFE addresses that match the
input address on a given combination of fields. In the strictest case, for
example, the function finds a deterministic match for all of the fields of a
given address (estado, municipio, logradouro, numero, cep, localidade).
Think, for example, of a building with several apartments that share the same
street address and number. In such a case, the coordinates of the apartments
differ very slightly, and geocodebr takes the average of those coordinates.
In a less rigorous case, in which only the fields (estado, municipio,
logradouro, bairro) are matched, geocodebr calculates the average coordinates
of all CNEFE addresses along that street that fall within the same
neighborhood.
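The averaging described above can be sketched in base R. This is a simplified illustration, not geocodebr's actual implementation: cnefe is a made-up four-row stand-in for CNEFE, match_and_average() is a hypothetical helper, and matching is plain exact equality via merge(). The strict match (including numero) averages the two records at the same street number, while the looser street-and-neighborhood match averages every record on that street.

```r
# Made-up stand-in for CNEFE: two records share RUA A 17 (e.g. apartments)
cnefe <- data.frame(
  estado = "RJ", municipio = "NOVA IGUACU",
  logradouro = c("RUA A", "RUA A", "RUA A", "RUA B"),
  numero = c(17, 17, 21, 17),
  bairro = "CENTRO",
  lat = c(-22.7000, -22.7002, -22.7010, -22.7100),
  lon = c(-43.4000, -43.4002, -43.4010, -43.4100)
)

input <- data.frame(id = 1, estado = "RJ", municipio = "NOVA IGUACU",
                    logradouro = "RUA A", numero = 17, bairro = "CENTRO")

# Hypothetical helper: exact match on `keys`, then average lat/lon per input id
match_and_average <- function(input, ref, keys) {
  m <- merge(input[, c("id", keys)], ref[, c(keys, "lat", "lon")], by = keys)
  aggregate(cbind(lat, lon) ~ id, data = m, FUN = mean)
}

# Strict: street and number must match -> mean of the two RUA A 17 records
strict <- match_and_average(input, cnefe,
                            c("estado", "municipio", "logradouro", "numero", "bairro"))

# Looser: street and neighborhood only -> mean of all three RUA A records
street <- match_and_average(input, cnefe,
                            c("estado", "municipio", "logradouro", "bairro"))
```

Adding or dropping fields from keys corresponds to moving between the match types listed below.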
The complete list of precision levels and match type categories is:
Precision: "numero"
match_type:
en01: logradouro, numero, cep and bairro
en02: logradouro, numero and cep
en03: logradouro, numero and bairro
en04: logradouro and numero
pn01: logradouro, numero, cep and bairro
pn02: logradouro, numero and cep
pn03: logradouro, numero and bairro
pn04: logradouro and numero
Precision: "numero_interpolado"
match_type:
ei01: logradouro, numero, cep and bairro
ei02: logradouro, numero and cep
ei03: logradouro, numero and bairro
ei04: logradouro and numero
pi01: logradouro, numero, cep and bairro
pi02: logradouro, numero and cep
pi03: logradouro, numero and bairro
pi04: logradouro and numero
Precision: "rua" (when the input number is missing, e.g. 'S/N')
match_type:
er01: logradouro, cep and bairro
er02: logradouro and cep
er03: logradouro and bairro
er04: logradouro
pr01: logradouro, cep and bairro
pr02: logradouro and cep
pr03: logradouro and bairro
pr04: logradouro
Precision: "cep"
match_type:
ec01: municipio, cep and localidade
ec02: municipio and cep
Precision: "bairro"
match_type:
eb01: municipio and localidade
Precision: "municipio"
match_type:
em01: municipio
Note: Match types starting with 'p' use probabilistic matching of the logradouro field, while types starting with 'e' use deterministic matching only. Match types with probabilistic matching ARE NOT implemented in geocodebr yet.
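The match_type codes follow a regular pattern that can be read off the list above: the first letter gives the matching method ('e' deterministic, 'p' probabilistic), the second letter identifies the precision level, and the digits rank the field combinations within that level. The base R sketch below decodes codes on that basis. decode_match_type() is a hypothetical helper, not a geocodebr function, and the letter-to-level mapping is inferred from the list rather than taken from the package source.

```r
# Hypothetical helper: split a match_type code into method and precision level
decode_match_type <- function(match_type) {
  # Second letter -> precision level, as inferred from the list above
  precision_by_letter <- c(n = "numero", i = "numero_interpolado", r = "rua",
                           c = "cep", b = "bairro", m = "municipio")
  data.frame(
    match_type = match_type,
    method = ifelse(substr(match_type, 1, 1) == "p",
                    "probabilistic", "deterministic"),
    precision = unname(precision_by_letter[substr(match_type, 2, 2)]),
    stringsAsFactors = FALSE
  )
}

decoded <- decode_match_type(c("en01", "er01", "ec02", "em01"))
decoded
```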
Examples
# Path to a small sample of addresses shipped with geocodebr
data_path <- system.file("extdata/small_sample.csv", package = "geocodebr")
input_df <- read.csv(data_path)

# Map each address field to the column that holds it in input_df
fields <- geocodebr::setup_address_fields(
  logradouro = "nm_logradouro",
  numero = "Numero",
  cep = "Cep",
  bairro = "Bairro",
  municipio = "nm_municipio",
  estado = "nm_uf"
)

# Geocode the addresses without displaying progress bars
df <- geocodebr::geocode(
  addresses_table = input_df,
  address_fields = fields,
  progress = FALSE
)
head(df)
#> id nm_logradouro Numero Cep Bairro
#> 1 1 RUA MARIA LUCIA PACIFICO 17 26042-730 SANTA RITA
#> 2 2 RUA LEOPOLDINA TOME 46 25030-050 CENTENARIO
#> 3 3 RUA DONA JUDITE 0 23915-700 CAPUTERA II
#> 4 4 RUA ALEXANDRE AMARAL 0 23098-120 SANTISSIMO
#> 5 5 AVENIDA E 300 23860-000 PRAIA GRANDE
#> 6 6 RUA PRINCESA ISABEL 263 69921-026 ESTACAO EXPERIMENTAL
#> nm_municipio code_muni nm_uf lon lat match_type
#> 1 NOVA IGUACU 3303500 RIO DE JANEIRO -43.47118 -22.695496 en01
#> 2 DUQUE DE CAXIAS 3301702 RIO DE JANEIRO -43.31134 -22.779173 en01
#> 3 ANGRA DOS REIS 3300100 RIO DE JANEIRO -44.20848 -22.978837 er01
#> 4 RIO DE JANEIRO 3304557 RIO DE JANEIRO -43.51150 -22.868992 er01
#> 5 MANGARATIBA 3302601 RIO DE JANEIRO -43.97214 -22.929864 en01
#> 6 RIO BRANCO 1200401 ACRE -67.83559 -9.963436 en01
#> precision
#> 1 number
#> 2 number
#> 3 street
#> 4 street
#> 5 number
#> 6 number