**uci** is an `R` package to measure the centrality of cities and regions. The package implements the Urban Centrality Index (UCI) originally proposed by Pereira et al. (2013). UCI measures the extent to which the spatial organization of a city varies from extreme monocentric to extreme polycentric on a continuous scale, rather than considering a binary classification (*either* monocentric *or* polycentric). UCI values range from 0 to 1. Values closer to 0 indicate more polycentric patterns and values closer to 1 indicate a more monocentric urban form.

## Installation

```
# from CRAN
install.packages('uci')
# development version from GitHub
remotes::install_github("ipeaGIT/uci")
```

## Demonstration on sample data

First, let’s load a few libraries we’ll be using in this vignette.
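A minimal setup, assuming the examples in this vignette only need **uci** itself, **sf** (to handle the spatial data) and **ggplot2** (for plotting):

```r
# Package under demonstration
library(uci)
# Simple features: the sample data is an sf object
library(sf)
# Used to map the spatial distribution of jobs
library(ggplot2)
```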

### Data input

The **uci** package comes with a sample data set for demonstration and testing purposes. The data is a small sample of the spatial distribution of population, jobs and schools around the city center of Belo Horizonte, Brazil. This sample data set is a good illustration of the type of data input required by **uci**.

Here is what the sample data looks like:

```
data_dir <- system.file("extdata", package = "uci")
grid <- readRDS(file.path(data_dir, "grid_bho.rds"))
head(grid)
#> Simple feature collection with 6 features and 4 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -43.96438 ymin: -19.97414 xmax: -43.93284 ymax: -19.96717
#> Geodetic CRS: WGS 84
#> id population jobs schools geometry
#> 1 89a881a5a2bffff 439 180 0 POLYGON ((-43.9431 -19.9741...
#> 2 89a881a5a2fffff 266 134 0 POLYGON ((-43.94612 -19.972...
#> 3 89a881a5a67ffff 1069 143 0 POLYGON ((-43.94001 -19.972...
#> 4 89a881a5a6bffff 245 61 0 POLYGON ((-43.9339 -19.9728...
#> 5 89a881a5a6fffff 298 11 0 POLYGON ((-43.93691 -19.971...
#> 6 89a881a5b03ffff 555 1071 0 POLYGON ((-43.96136 -19.970...
```

The data is an object of class `"sf" "data.frame"` with spatial polygons covering our study area and a few columns indicating the number of activities (e.g. jobs, schools, population) in each polygon. Our particular sample data is based on a hexagonal spatial grid (H3 index). While there are advantages to using regular spatial grids to calculate spatial statistics, **uci** also works with non-regular geometries, such as census tracts, enumeration areas or municipalities.
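To prepare your own input, any `sf` polygon layer with a numeric column of activity counts should do. The sketch below builds a hypothetical 4 x 4 grid of 1 km square cells with made-up job counts; the CRS (EPSG:31983, SIRGAS 2000 / UTM zone 23S) and the counts are illustrative assumptions, not part of the package's sample data.

```r
library(sf)

# Hypothetical 4 km x 4 km study area in a projected CRS (units: metres).
# EPSG:31983 is an illustrative choice, not a requirement of uci.
boundary <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(4000, 0), c(4000, 4000), c(0, 4000), c(0, 0)))),
  crs = 31983
)

# Regular grid of 1 km square cells (the sample data uses H3 hexagons instead)
cells <- st_make_grid(boundary, cellsize = 1000, square = TRUE)

# One row per polygon, plus a made-up count of jobs in each cell
toy_grid <- st_sf(
  id = seq_along(cells),
  jobs = c(rep(10, 15), 500),
  geometry = cells
)
```

`toy_grid` can then be passed to `uci()` in the same way as `grid`, e.g. `uci(sf_object = toy_grid, var_name = 'jobs')`.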

We can visualize the spatial distribution of jobs using `ggplot2`:

```
library(ggplot2)
ggplot(data = grid) +
  geom_sf(aes(fill = jobs), color = NA) +
  scale_fill_viridis_c() +
  theme_void()
```

### Calculating UCI

In the example below, we calculate how monocentric or polycentric our study area is, considering its spatial distribution of jobs.

By default, the `uci()` function uses Euclidean distances between polygons. The `dist_type` parameter lets users choose whether calculations should be based on `"euclidean"` distances (the default) or `"spatial_link"` distances. Spatial link distances are Euclidean distances measured along the links between neighboring polygons rather than in a straight line.

```
df <- uci(
  sf_object = grid,
  var_name = 'jobs',
  dist_type = 'euclidean'
)
head(df)
#> UCI location_coef spatial_separation spatial_separation_max
#> 1 0.2538635 0.5278007 3880.114 7475.899
```

However, Euclidean distances can bias UCI estimates for areas with a concave shape (like a bay). In these cases, it is strongly recommended to use `dist_type = 'spatial_link'`, even though spatial link distances are computationally more costly.

Mind you that when using `"spatial_link"` distances, there should be no disconnected islands of polygons (or groups of polygons). In some cases, users might need to manually edit their spatial data to include a polygon representing a road or bridge connecting disconnected parts of the area.
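One way to check for disconnected parts before using `"spatial_link"` distances is to build a contiguity neighbour list and count its connected components. The sketch below relies on the **spdep** package (`poly2nb()` and `n.comp.nb()`), which is an assumption about your toolchain rather than something **uci** requires. It uses three hypothetical squares: two that touch and one isolated island.

```r
library(sf)
library(spdep)

# Helper to build a square polygon from its lower-left corner
square <- function(x0, y0, side = 1) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
    c(x0, y0 + side), c(x0, y0)
  )))
}

# Two adjacent squares plus one isolated square (a disconnected "island"),
# in a projected CRS so that planar geometry is used
polys <- st_sfc(square(0, 0), square(1, 0), square(5, 0), crs = 31983)

# Queen contiguity: polygons sharing an edge or a vertex are neighbours
nb <- poly2nb(polys, queen = TRUE)

# Number of connected groups of polygons; more than 1 means islands exist
components <- n.comp.nb(nb)
components$nc
components$comp.id
```

`poly2nb()` may also warn that some polygons have no neighbours, which is itself a sign of islands. Replacing `polys` with `grid` runs the same check on the sample data.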

```
df <- uci(
  sf_object = grid,
  var_name = 'jobs',
  dist_type = 'spatial_link'
)
head(df)
#> UCI location_coef spatial_separation spatial_separation_max
#> 1 0.2535891 0.5278007 4281.819 8241.617
```

### Formal definition of UCI

The **Urban Centrality Index (UCI)** is given by the
product of the Location Coefficient (LC) and the Proximity Index (PI).
With this formulation, UCI calculates urban centrality simultaneously
accounting for both the *concentration* of activities and the
*proximity* between activities.

\[
UCI = LC \times PI
\]

The **Location Coefficient (LC)** was introduced by Florence (1948) to measure the unequal distribution of jobs within an urban area. It works as *a nonspatial inequality measure* (similar to the Gini coefficient, for example). It reflects, for example, the extent to which activities are concentrated in a few polygons. LC ranges from `0` to `(1 - 1/n)`, where `n` is the number of areas. If LC equals zero, economic activity is evenly distributed, while values close to `(1 - 1/n)` indicate that employment is concentrated in a few areas.

\[
\begin{aligned}
LC &= \frac{1}{2} \sum_{i=1}^{n} \left| S_{i} - \frac{1}{n} \right| \\
S_{i} &= \frac{E_{i}}{E}
\end{aligned}
\]

where `Si` is the share of employment in area i (`Ei`) relative to the total employment (`E`) of the city; `E` is the total number of jobs in the city; and `n` is the number of areas.
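As a minimal sketch of the formula (not the package's internal code), LC can be computed from a vector of activity counts in a few lines of R:

```r
# Location Coefficient: half the sum of absolute deviations between each
# area's share of activities (S_i = E_i / E) and the even share 1/n
location_coef <- function(e) {
  s <- e / sum(e)
  n <- length(e)
  0.5 * sum(abs(s - 1 / n))
}

location_coef(c(10, 10, 10, 10))  # evenly distributed: 0
location_coef(c(40, 0, 0, 0))     # fully concentrated: 1 - 1/4 = 0.75
```

Applied to `grid$jobs`, the result can be compared with the `location_coef` column returned by `uci()`.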

Meanwhile, the **Proximity Index (PI)** is a transformation that Pereira et al. (2013) proposed for the Venables index, originally developed by Midelfart-Knarvik et al. (2002) to examine changes in the spatial distribution of economic activity. The Venables index (*aka* spatial separation index) is defined as:

\[
V = S' D S
\]

where `S` is a column vector of `Si`, `S'` is its transpose, and `D` is a distance matrix whose entry `dij` is the distance between the centroids of areas i and j.

When all employment activity is concentrated in just one spatial unit, `V` reaches its minimum value of zero (no matter where this spatial unit is located). However, the index has no *maximum* value and therefore cannot be compared across different spatial settings. To overcome this limitation, it is necessary to calculate the maximum attainable value of `V`.
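The matrix product can be written directly in R. The toy example below places three hypothetical areas on a line, 1 km apart, and contrasts activity split between the two outer areas with activity concentrated in a single area:

```r
# Venables spatial separation index: V = S' D S
venables <- function(s, d) {
  as.numeric(t(s) %*% d %*% s)
}

# Centroids of three areas on a line (km) and their pairwise distances
x <- c(0, 1, 2)
d <- as.matrix(dist(x))

venables(c(0.5, 0, 0.5), d)  # split between the two ends: 1
venables(c(0, 1, 0), d)      # all activity in one area: 0
```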

The **Proximity Index (PI)** solves the normalization issue with `V` and changes its interpretation to suit our needs. The interpretation of PI is the opposite of `V`, with its theoretical range going from `0` to `1`. Values of PI closer to `1` mean that employment is clustered in one single center. If PI is `0`, employment is as spatially separated as possible. In other words, activities are distributed in a way that maximizes the distances between them.

\[
PI = 1 - \frac{V}{V_{max}}
\]
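As a worked check of `UCI = LC * PI` and the PI formula, the values printed by the Euclidean `uci()` example can be plugged back in by hand. This assumes that the `spatial_separation` and `spatial_separation_max` columns correspond to `V` and `Vmax`.

```r
# Values copied from the printed output of the Euclidean example
location_coef <- 0.5278007
v <- 3880.114      # spatial_separation
v_max <- 7475.899  # spatial_separation_max

# Proximity Index, then UCI as the product of LC and PI
pi_index <- 1 - v / v_max
uci_value <- location_coef * pi_index
round(uci_value, 7)  # approximately 0.2538635, matching the UCI column
```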

The normalization by `Vmax` is what makes it possible to compare urban areas of different shapes and sizes. However, estimating `Vmax` is not trivial, because it has no closed-form solution. In a very simple square grid, `Vmax` is obtained when each corner has one-fourth of the total employment. In a region forming a perfect circle, the maximum value of `V` occurs when all employment is evenly distributed along the external edge.
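The square-grid case can be illustrated numerically; this is an illustration, not a proof that the corners maximize `V`. On a hypothetical 5 x 5 grid of cell centroids with unit spacing, splitting activity equally among the four corners yields a larger `V` than spreading it evenly over all cells:

```r
# Hypothetical 5 x 5 grid of cell centroids with unit spacing (side length 4)
cells <- expand.grid(x = 0:4, y = 0:4)
d <- as.matrix(dist(cells))  # Euclidean distances between centroids

venables <- function(s, d) as.numeric(t(s) %*% d %*% s)

# All activity split equally among the four corner cells
is_corner <- cells$x %in% c(0, 4) & cells$y %in% c(0, 4)
s_corners <- ifelse(is_corner, 1 / 4, 0)

# Activity spread evenly over all 25 cells
s_uniform <- rep(1 / 25, 25)

v_corners <- venables(s_corners, d)  # 2 + sqrt(2), about 3.414
v_uniform <- venables(s_uniform, d)
c(corners = v_corners, uniform = v_uniform)
```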

In the original paper, Pereira et al. (2013) proposed a simple heuristic approach to calculate `Vmax` (the “opposite of maximum proximity”) as a homogeneous distribution of values along the edge of the study area. This is the default strategy used in the `uci()` function, with the default parameter `bootstrap_border = FALSE`. Nonetheless, users can also estimate `Vmax` with a bootstrap simulation approach by setting `bootstrap_border = TRUE`.

#### Bootstrapping `Vmax`

The bootstrap approach uses simulation to find the maximum Venables spatial separation. Users can set `parallel = TRUE` to speed up computation using parallel processing.

```
df_bootstrap <- uci(
  sf_object = grid,
  var_name = 'jobs',
  bootstrap_border = TRUE,
  showProgress = FALSE
)
head(df_bootstrap)
#> UCI location_coef spatial_separation spatial_separation_max
#> 1 0.2561533 0.5278007 3880.114 7538.916
```

This bootstrap approach simulates 20,000 random distributions of activities along the border. In practice, it combines two nested simulations: (1) it simulates that all jobs are concentrated in 2, 3, 4, 5, … up to 51 polygons along the border, and (2) for each number of selected polygons, it draws 400 random positions of those polygons along the border.
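The two nested loops can be sketched in base R. This is a simplified illustration under stated assumptions, not the implementation used by `uci()`: it uses a hypothetical 5 x 5 grid, assumes that activity is split equally among the selected border cells, and keeps the largest `V` found. Because the full border is one of the candidate sets, the result is never below the heuristic value of an even distribution along the border.

```r
set.seed(123)

# Hypothetical 5 x 5 grid of cell centroids with unit spacing
cells <- expand.grid(x = 0:4, y = 0:4)
d <- as.matrix(dist(cells))
venables <- function(s, d) as.numeric(t(s) %*% d %*% s)

# Indices of the 16 cells on the border of the grid
border <- which(cells$x %in% c(0, 4) | cells$y %in% c(0, 4))

# Heuristic Vmax: activity spread evenly along the border
s_border <- numeric(nrow(cells))
s_border[border] <- 1 / length(border)
v_heuristic <- venables(s_border, d)

# Random search: for each k, place equal shares in k random border cells
bootstrap_vmax <- function(d, border, max_k, reps) {
  best <- 0
  for (k in 2:max_k) {
    for (r in seq_len(reps)) {
      chosen <- border[sample.int(length(border), k)]
      s <- numeric(nrow(d))
      s[chosen] <- 1 / k
      best <- max(best, venables(s, d))
    }
  }
  best
}

v_boot <- bootstrap_vmax(d, border, max_k = length(border), reps = 400)
c(heuristic = v_heuristic, bootstrap = v_boot)
```

Increasing `reps` or `max_k` makes the search more thorough at the cost of run time.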

The bootstrap approach is more computationally expensive, but it returns `Vmax` values that can be between 5% and 25% higher than the heuristic approach. Although the bootstrap simulation might still not return the maximum theoretical value of spatial separation, it is probably very close to it.

## References

Florence (1948). *Investment, Location and Size of Plant: A Realistic Enquiry into the Structure of British and American Industries*. Vol. 8. CUP Archive.

Midelfart-Knarvik et al. (2002). *Revue Économique* 53 (3): 469–81. https://doi.org/10.2307/3502978.

Pereira et al. (2013). *Geographical Analysis* 45 (1): 77–89. https://doi.org/10.1111/gean.12002.