Geocodes Brazilian addresses based on CNEFE data. Addresses must be passed as a data frame in which each column describes one address field (street name, street number, neighborhood, etc.). The input addresses are matched with CNEFE following 12 different case patterns. For more information, see the Details section.
Usage
geocode_dani(
  addresses_table,
  address_fields = setup_address_fields(),
  n_cores = 1,
  progress = TRUE,
  cache = TRUE
)
Arguments
- addresses_table
A data frame. The addresses to be geocoded. Each column must represent an address field.
- address_fields
A character vector. The correspondence between each address field and the name of the column that describes it in addresses_table. The setup_address_fields() function helps create this vector and performs some checks on the input. Address fields passed as NULL are ignored, and the function must receive at least one non-null field. If you create the vector manually, its names must match the parameter names of setup_address_fields().
- n_cores
A number. The number of cores to be used in parallel execution. Defaults to 1.
- progress
A logical. Whether to display progress bars when downloading CNEFE data and when geocoding the addresses. Defaults to TRUE.
- cache
A logical. Whether CNEFE data should be saved to/read from cache, reducing processing time in future calls. Defaults to TRUE. When FALSE, CNEFE data is downloaded to a temporary directory.
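As a rough illustration of the manual route described for address_fields, the sketch below builds the named character vector by hand instead of calling setup_address_fields(). The vector names are taken from the parameters used in the Examples section; the input table and the column check are hypothetical additions for illustration, and the exact structure returned by setup_address_fields() is not confirmed here.

```r
# Manually built field mapping: names are address fields, values are the
# columns of the input table that hold them (names taken from the Examples).
fields_manual <- c(
  logradouro = "nm_logradouro",
  numero     = "Numero",
  cep        = "Cep",
  bairro     = "Bairro",
  municipio  = "nm_municipio",
  estado     = "nm_uf"
)

# Hypothetical one-row input table, made up for this sketch.
addresses <- data.frame(
  nm_logradouro = "Rua Exemplo",
  Numero        = 100,
  Cep           = "00000-000",
  Bairro        = "Centro",
  nm_municipio  = "Sao Paulo",
  nm_uf         = "SP",
  stringsAsFactors = FALSE
)

# Simple sanity check (an assumption, not the package's own validation):
# every column named in the mapping should exist in the input table.
missing_cols <- setdiff(unname(fields_manual), names(addresses))
if (length(missing_cols) > 0) {
  stop("Columns not found in the input: ", paste(missing_cols, collapse = ", "))
}
```

Using setup_address_fields() is still the safer choice, since it is documented as performing checks on the input.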
Value
Returns the data frame passed in addresses_table with the latitude (lat) and longitude (lon) of each matched address, as well as an additional column (match_type) indicating the level at which each address was matched.
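To give a feel for how the returned columns might be inspected, here is a small sketch on a made-up data frame shaped like the described output. The match_type labels ("case_01", etc.), the id column, the coordinates, and the use of NA for an unmatched row are all assumptions for illustration; the real values produced by the function may differ.

```r
# Hypothetical output with the three documented columns added to the input.
geocoded <- data.frame(
  id         = 1:4,
  lat        = c(-23.55, -22.91, NA, -15.79),
  lon        = c(-46.63, -43.17, NA, -47.88),
  match_type = c("case_01", "case_04", NA, "case_12"),
  stringsAsFactors = FALSE
)

# How many addresses were matched at each level (NA counted separately).
match_counts <- table(geocoded$match_type, useNA = "ifany")

# Keep only rows that received coordinates.
matched_only <- geocoded[!is.na(geocoded$lat) & !is.na(geocoded$lon), ]
```

Tabulating match_type first is a quick way to see how much of the result relies on coarse matches before using the coordinates.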
Details
The input addresses are deterministically matched with CNEFE following 12
different case patterns. The type of match found for each address in the
input data is indicated by the match_type column in the output. For every
match type, the function calculates the average latitude and longitude of
all addresses in CNEFE that match the input address. In the strictest case,
the function finds a perfect match for all of the fields of a given address.
Think, for example, of a building with several apartments that share the same
street address. In such a case, the coordinates of the apartments will differ
very slightly, and geocodebr takes the average of those coordinates. On the
other hand, in the loosest case, in which only the state and the city are
matched, geocodebr takes the city-wide average coordinates, which tends to
favor more densely populated areas. The columns considered in each of the 12
different match types are described below:
Case 01: estado, município, logradouro, número, cep, and bairro;
Case 02: estado, município, logradouro, número, and cep;
Case 03: estado, município, logradouro, número, and bairro;
Case 04: estado, município, logradouro, and número;
Case 05: estado, município, logradouro, cep, and bairro;
Case 06: estado, município, logradouro, and cep;
Case 07: estado, município, logradouro, and bairro;
Case 08: estado, município, and logradouro;
Case 09: estado, município, cep, and bairro;
Case 10: estado, município, and cep;
Case 11: estado, município, and bairro;
Case 12: estado and município.
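The cascade described above can be sketched in base R as follows. This is not the package's implementation, only a minimal illustration under stated assumptions: the reference table is a tiny made-up stand-in for CNEFE, only three of the 12 cases are included (04, 08, and 12), and match_one() and the "case_XX" labels are hypothetical names introduced here.

```r
# Toy stand-in for CNEFE (values invented for this sketch).
cnefe <- data.frame(
  estado     = c("SP", "SP", "SP", "SP"),
  municipio  = c("Sao Paulo", "Sao Paulo", "Sao Paulo", "Campinas"),
  logradouro = c("Rua A", "Rua A", "Rua B", "Rua C"),
  numero     = c(10, 10, 20, 5),
  lat        = c(-23.5500, -23.5502, -23.6000, -22.9000),
  lon        = c(-46.6300, -46.6302, -46.7000, -47.0600),
  stringsAsFactors = FALSE
)

# A subset of the case patterns, ordered from strictest to loosest.
cases <- list(
  case_04 = c("estado", "municipio", "logradouro", "numero"),
  case_08 = c("estado", "municipio", "logradouro"),
  case_12 = c("estado", "municipio")
)

# Try each case in order; on the first case with any match, return the
# average coordinates of all matching reference rows and the case label.
match_one <- function(address, reference, cases) {
  for (case_name in names(cases)) {
    hit <- rep(TRUE, nrow(reference))
    for (key in cases[[case_name]]) {
      hit <- hit & reference[[key]] == address[[key]]
    }
    if (any(hit)) {
      return(data.frame(
        lat = mean(reference$lat[hit]),
        lon = mean(reference$lon[hit]),
        match_type = case_name,
        stringsAsFactors = FALSE
      ))
    }
  }
  data.frame(lat = NA_real_, lon = NA_real_, match_type = NA_character_,
             stringsAsFactors = FALSE)
}

# Two reference rows share this exact address, so their coordinates are averaged.
exact <- match_one(
  list(estado = "SP", municipio = "Sao Paulo", logradouro = "Rua A", numero = 10),
  cnefe, cases
)

# The number is not in the reference, so the match falls back to the street.
street_only <- match_one(
  list(estado = "SP", municipio = "Sao Paulo", logradouro = "Rua B", numero = 99),
  cnefe, cases
)

# The street is unknown, so only state and city match (city-wide average).
city_only <- match_one(
  list(estado = "SP", municipio = "Sao Paulo", logradouro = "Rua Z", numero = 1),
  cnefe, cases
)
```

In the city_only case the average is taken over every reference row in the city, which shows why the loosest match tends to land where addresses are most concentrated.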
Examples
# Load the sample addresses shipped with the package
data_path <- system.file("extdata/small_sample.csv", package = "geocodebr")
input_df <- read.csv(data_path)

# Map each address field to the column that describes it in input_df
fields <- setup_address_fields(
  logradouro = "nm_logradouro",
  numero = "Numero",
  cep = "Cep",
  bairro = "Bairro",
  municipio = "nm_municipio",
  estado = "nm_uf"
)

# df <- geocodebr:::geocode_dani(input_df, address_fields = fields, progress = FALSE)
# df