
Calculates the Theil T index of a given accessibility distribution. Values range from 0 (when all individuals have exactly the same accessibility level) to the natural log of n, where n is the number of individuals in the accessibility dataset. If the individuals can be classified into mutually exclusive and completely exhaustive groups, the index can be decomposed into a between-groups inequality component and a within-groups inequality component.
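As a rough illustration of the total index, the base-R sketch below computes the textbook population-weighted Theil T. It treats each cell's population as that many individuals sharing the cell's accessibility level. The helper name `theil_t_sketch()` is hypothetical and not part of the package, and the sketch is not necessarily identical to how `theil_t()` computes the index internally. It does reproduce the two bounds stated above: 0 for a perfectly equal distribution, and ln(n) when a single individual holds all the accessibility.

```r
# Hypothetical helper (not part of the accessibility package): a sketch of the
# population-weighted Theil T index.
#   access: accessibility level of each cell
#   pop:    number of people living in each cell (used as weights)
theil_t_sketch <- function(access, pop) {
  n <- sum(pop)                      # total number of individuals
  mu <- sum(pop * access) / n        # population-weighted mean accessibility
  ratio <- access / mu
  # x * log(x) tends to 0 as x -> 0, so zero-accessibility cells contribute 0
  x_log_x <- ifelse(ratio > 0, ratio * log(ratio), 0)
  sum(pop * x_log_x) / n
}

# everyone has the same accessibility: no inequality
theil_t_sketch(access = c(5, 5, 5), pop = c(1, 2, 3))
#> [1] 0

# one of four individuals holds all the accessibility: maximum inequality, ln(4)
theil_t_sketch(access = c(10, 0, 0, 0), pop = c(1, 1, 1, 1))
#> [1] 1.386294
```

Weighting by population is equivalent to repeating each cell's accessibility level once per resident. For example, `theil_t_sketch(c(1, 2), c(1, 2))` gives the same value as `theil_t_sketch(c(1, 2, 2), c(1, 1, 1))`.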

Usage

theil_t(
  accessibility_data,
  sociodemographic_data,
  opportunity,
  population,
  socioeconomic_groups = NULL,
  group_by = character(0)
)

Arguments

accessibility_data

A data frame. The accessibility levels whose inequality should be calculated. Must contain the columns id and any others specified in opportunity.

sociodemographic_data

A data frame. The distribution of sociodemographic characteristics of the population in the study area cells. Must contain the columns id and any others specified in population and socioeconomic_groups.

opportunity

A string. The name of the column in accessibility_data with the accessibility levels to be considered when calculating inequality levels.

population

A string. The name of the column in sociodemographic_data with the number of people in each cell. Used to weight accessibility levels when calculating inequality.

socioeconomic_groups

A string. The name of the column in sociodemographic_data whose values identify the socioeconomic groups that should be used to calculate the between- and within-groups inequality levels. If NULL (the default), between- and within-groups components are not calculated and only the total aggregate inequality is returned.
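The between/within split can be sketched in base R with the standard additive decomposition of the Theil T index. Each group's weight is its share of total (population-weighted) accessibility. The between-groups term compares each group's mean with the overall mean, and the within-groups term is the share-weighted sum of each group's own Theil T. The function name `theil_t_decomposition_sketch()` and its returned named vector are hypothetical illustrations, not the structure returned by `theil_t()`. The sketch also assumes that the group labels contain no NA values and that every group has positive population and positive mean accessibility.

```r
# Hypothetical helper (not part of the accessibility package): additive
# decomposition of the population-weighted Theil T index into between- and
# within-groups components. Assumes `group` has no NA values and that every
# group has population > 0 and mean accessibility > 0.
theil_t_decomposition_sketch <- function(access, pop, group) {
  # population-weighted Theil T of a set of cells; x * log(x) -> 0 as x -> 0
  theil <- function(a, p) {
    r <- a / (sum(p * a) / sum(p))
    sum(p * ifelse(r > 0, r * log(r), 0)) / sum(p)
  }

  n <- sum(pop)
  mu <- sum(pop * access) / n
  between <- 0
  within <- 0

  for (g in unique(group)) {
    in_g <- group == g
    n_g <- sum(pop[in_g])
    mu_g <- sum(pop[in_g] * access[in_g]) / n_g
    share_g <- (n_g * mu_g) / (n * mu)  # group's share of total accessibility
    between <- between + share_g * log(mu_g / mu)
    within <- within + share_g * theil(access[in_g], pop[in_g])
  }

  c(total = theil(access, pop), between = between, within = within)
}

decomp <- theil_t_decomposition_sketch(
  access = c(1, 2, 3, 4),
  pop = c(10, 20, 30, 40),
  group = c("a", "a", "b", "b")
)
decomp

# the two components add up to the total index
all.equal(unname(decomp["total"]), unname(decomp["between"] + decomp["within"]))
#> [1] TRUE
```

When all groups have the same mean accessibility, the between-groups term is 0 and the whole index comes from within-groups inequality.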

group_by

A character vector. Indicates the accessibility_data columns that should be used to group the inequality estimates by. Defaults to character(0), in which case no grouping is applied. For example, if accessibility_data includes a scenario column that identifies the distinct scenario each accessibility estimate refers to (e.g. before and after a transport policy intervention), passing "scenario" to this parameter results in inequality estimates grouped by scenario.

Value

If socioeconomic_groups is NULL, a data frame containing the total Theil T estimates for the study area. If not, a list containing three data frames: one summarizing the total inequality and the between- and within-groups components, one listing the contribution of each group to the between-groups component, and another listing the contribution of each group to the within-groups component.

See also

Other inequality: concentration_index(), gini_index(), palma_ratio()

Examples

library(accessibility)

data_dir <- system.file("extdata", package = "accessibility")
travel_matrix <- readRDS(file.path(data_dir, "travel_matrix.rds"))
land_use_data <- readRDS(file.path(data_dir, "land_use_data.rds"))

# cumulative accessibility to jobs within a 30-minute travel time cutoff
access <- cumulative_cutoff(
  travel_matrix,
  land_use_data,
  cutoff = 30,
  opportunity = "jobs",
  travel_cost = "travel_time"
)

# total Theil T index, with accessibility levels weighted by population
ti <- theil_t(
  access,
  sociodemographic_data = land_use_data,
  opportunity = "jobs",
  population = "population"
)
ti

# to calculate inequality between and within income deciles, we pass
# "income_decile" to socioeconomic_groups.
# some cells, however, have an NA income decile because their income per
# capita is NaN, as they don't have any population. we remove these cells
# from both the accessibility and the sociodemographic data, otherwise the
# output would include NA values (note that subsetting the data like this
# doesn't violate the assumption that groups are completely exhaustive,
# because cells with NA income decile don't have any population)

na_decile_ids <- land_use_data[is.na(land_use_data$income_decile), ]$id
access <- access[! access$id %in% na_decile_ids, ]
sociodem_data <- land_use_data[! land_use_data$id %in% na_decile_ids, ]

ti <- theil_t(
  access,
  sociodemographic_data = sociodem_data,
  opportunity = "jobs",
  population = "population",
  socioeconomic_groups = "income_decile"
)
ti