Geocodes Brazilian addresses based on CNEFE data. Addresses must be passed as a data frame in which each column describes one address field (street name, street number, neighborhood, etc.). The input addresses are matched with CNEFE following 12 different case patterns. For more information, see the Details section.

Usage

geocode_dani(
  addresses_table,
  address_fields = setup_address_fields(),
  n_cores = 1,
  progress = TRUE,
  cache = TRUE
)

Arguments

addresses_table

A data frame. The addresses to be geocoded. Each column must represent an address field.

address_fields

A character vector. The correspondence between each address field and the name of the column that describes it in addresses_table. The setup_address_fields() function helps create this vector and performs some checks on the input. Address fields passed as NULL are ignored, and the function must receive at least one non-null field. If you create the vector manually, its names must be the same as the parameter names of setup_address_fields().
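As an illustration of the manual route, the sketch below builds the named character vector directly in base R. It assumes the six field names shown in the Examples section (logradouro, numero, cep, bairro, municipio, estado) are the parameter names of setup_address_fields(); the toy data frame and the final column check are hypothetical and not part of geocodebr.

```r
# Toy input table; all values are made up for illustration.
addresses <- data.frame(
  nm_logradouro = "Rua A",
  Numero = 10,
  Cep = "00000-000",
  Bairro = "Centro",
  nm_municipio = "Cidade X",
  nm_uf = "XX"
)

# Names are the address fields, values are the matching column names
# in the input table.
manual_fields <- c(
  logradouro = "nm_logradouro",
  numero = "Numero",
  cep = "Cep",
  bairro = "Bairro",
  municipio = "nm_municipio",
  estado = "nm_uf"
)

# Illustrative sanity check (not a geocodebr function): every column
# referenced by the vector must exist in the input table.
missing_cols <- setdiff(unname(manual_fields), names(addresses))
stopifnot(length(missing_cols) == 0)
```

Using setup_address_fields() is likely the safer choice, since the text above says it performs its own checks on the input.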

n_cores

A number. The number of cores to be used in parallel execution. Defaults to 1.

progress

A logical. Whether to display progress bars when downloading CNEFE data and when geocoding the addresses. Defaults to TRUE.

cache

A logical. Whether CNEFE data should be saved to/read from cache, reducing processing time in future calls. Defaults to TRUE. When FALSE, CNEFE data is downloaded to a temporary directory.

Value

Returns the data frame passed in addresses_table with the latitude (lat) and longitude (lon) of each matched address, as well as another column (match_type) indicating the level at which the address was matched.
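Because the precision of a result depends on its match level, it can help to summarize match_type before using the coordinates. The sketch below works on a hypothetical output table: the coordinates and the match_type labels (such as "case_01") are placeholders chosen for illustration and may not match the labels the function actually returns.

```r
# Hypothetical geocoding output; coordinates and match_type labels are
# placeholders, not values produced by geocodebr.
geocoded <- data.frame(
  id = 1:5,
  lat = c(-10.1, -10.2, -10.4, -10.4, -10.1),
  lon = c(-50.1, -50.2, -50.4, -50.4, -50.1),
  match_type = c("case_01", "case_01", "case_08", "case_12", "case_01")
)

# Count how many addresses were matched at each level.
match_counts <- table(geocoded$match_type)

# Flag coarse matches. Which levels count as "coarse" is an analyst's
# choice; here only the assumed city-level label is flagged.
coarse_levels <- c("case_12")
geocoded$coarse <- geocoded$match_type %in% coarse_levels
```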

Details

The input addresses are deterministically matched with CNEFE following 12 different case patterns. The type of match found for each input address is indicated by the match_type column in the output. For every match type, the function calculates the average latitude and longitude of all CNEFE addresses that match the input address. In the strictest case, the function finds a perfect match for all fields of a given address. Think, for example, of a building with several apartments that share the same street address. In such a case, the coordinates of the apartments differ very slightly, and geocodebr takes the average of those coordinates. In the loosest case, in which only the state and the city are matched, geocodebr takes the city-wide average coordinates, which tends to favor more densely populated areas. The columns considered in each of the 12 match types are described below:

  • Case 01: estado, município, logradouro, número, cep and bairro;

  • Case 02: estado, município, logradouro, número and cep;

  • Case 03: estado, município, logradouro, número and bairro;

  • Case 04: estado, município, logradouro and número;

  • Case 05: estado, município, logradouro, cep and bairro;

  • Case 06: estado, município, logradouro and cep;

  • Case 07: estado, município, logradouro and bairro;

  • Case 08: estado, município and logradouro;

  • Case 09: estado, município, cep and bairro;

  • Case 10: estado, município and cep;

  • Case 11: estado, município and bairro;

  • Case 12: estado and município.
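The cascade described above can be sketched in base R with a toy reference table. This is only an illustration of the idea (try the strictest key set first, fall back to looser ones, and average the coordinates of all matching reference rows), not the package's actual implementation. The toy data, the use of only three of the twelve cases, and the "case_04"-style labels are all assumptions.

```r
# Toy stand-in for CNEFE; all values are made up for illustration.
cnefe <- data.frame(
  estado = c("XX", "XX", "XX", "XX"),
  municipio = c("Cidade A", "Cidade A", "Cidade A", "Cidade A"),
  logradouro = c("Rua Um", "Rua Um", "Rua Um", "Rua Dois"),
  numero = c(10, 10, 20, 5),
  lat = c(-10.0, -10.2, -10.4, -11.0),
  lon = c(-50.0, -50.2, -50.4, -51.0)
)

input <- data.frame(
  id = 1:3,
  estado = "XX",
  municipio = "Cidade A",
  logradouro = c("Rua Um", "Rua Um", "Rua Tres"),
  numero = c(10, 99, 1)
)

# Three of the twelve cases, ordered from strictest to loosest.
# The labels are hypothetical.
cases <- list(
  case_04 = c("estado", "municipio", "logradouro", "numero"),
  case_08 = c("estado", "municipio", "logradouro"),
  case_12 = c("estado", "municipio")
)

input$lat <- NA_real_
input$lon <- NA_real_
input$match_type <- NA_character_

for (case_name in names(cases)) {
  # Only addresses not yet matched by a stricter case are considered.
  pending <- which(is.na(input$match_type))
  if (length(pending) == 0) break

  keys <- cases[[case_name]]
  # Average coordinates of all reference rows sharing the same key values.
  avg <- aggregate(cnefe[c("lat", "lon")], by = cnefe[keys], FUN = mean)

  # Exact (deterministic) join on the key columns of this case.
  hits <- merge(input[pending, c("id", keys)], avg, by = keys)
  idx <- match(hits$id, input$id)
  input$lat[idx] <- hits$lat
  input$lon[idx] <- hits$lon
  input$match_type[idx] <- rep(case_name, length(idx))
}
```

In this toy run, the first address matches two reference rows on all four keys and receives their average; the second has no matching number and falls back to the street-level average; the third has no matching street and falls back to the city-wide average.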

Examples


data_path <- system.file("extdata/small_sample.csv", package = "geocodebr")
input_df <- read.csv(data_path)

fields <- setup_address_fields(
  logradouro = "nm_logradouro",
  numero = "Numero",
  cep = "Cep",
  bairro = "Bairro",
  municipio = "nm_municipio",
  estado = "nm_uf"
)

# df <- geocodebr:::geocode_dani(input_df, address_fields = fields, progress = FALSE)
# df