Introduction

Dear Washington, D.C. City Council Representative:

Transit Oriented Development has been influencing everyone in cities and regions. It encourages high density development, pedestrian-friendly urban systems, mixed-use developments. As an urban planning tool, it tries to maximize the mixed use of residential, recreational, commercial, and public space within walking distance of public transport, which can further lead to more sustainable urban growth.

In this brief analysis of TOD in Washington DC, we put together some socio-economic data to illustrate the potential influences of TOD across space and time (from 2009 to 2019). Those criteria include Median Household Income, Median Rent, Educational Level, and Crime Rates. By comparing the changes in each category across spatial and temporal axes, we want to address more conservation regarding TOD in DC in the future.

library(tidycensus)
library(tidyverse)
library(ggplot2)
library(sf)
library(kableExtra)

options(scipen=999)
options(tigris_class = "sf")

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

palette5 <- c("#f0f9e8","#bae4bc","#7bccc4","#43a2ca","#0868ac")
newqBr <- function(df, variable, rnd) {
  if (missing(rnd)) {
    as.character(quantile(round(df[[variable]],4),
                          c(.01,.2,.4,.6,.8), na.rm=T))
  } else if (rnd == FALSE | rnd == F) {
    as.character(formatC(quantile(df[[variable]],
                                  c(.01,.2,.4,.6,.8), na.rm=T),
                         digits = 3))
  }
}

Data Preparation and Processing: Census Data, Crime data, Transit Data

Retrieving and Wrangling 2009 & 2019 Census Data

tracts19 <-  
  get_acs(geography = "tract",
          variables = c("B25026_001E","B02001_002E",
                        "B15001_050E","B15001_009E",
                        "B19013_001E", "B25058_001E",
                        "B06012_002E"), 
          year=2019, state=11,
          geometry=TRUE) %>% 
  st_transform('ESRI:102728')
totalPop19 <-
  tracts19 %>%
  filter(variable == "B25026_001")
tracts19 <- 
  tracts19 %>%
  dplyr::select( -NAME, -moe) %>%
  spread(key = variable, value = estimate) %>%
  rename(TotalPop = B25026_001, 
         Whites = B02001_002,
         FemaleBachelors = B15001_050, 
         MaleBachelors = B15001_009,
         MedHHInc = B19013_001, 
         MedRent = B25058_001,
         TotalPoverty = B06012_002)

tracts19 <- 
  tracts19 %>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop, 0),
         pctBachelors = ifelse(TotalPop > 0, ((FemaleBachelors + MaleBachelors) / TotalPop), 0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         year = "2019") %>%
  dplyr::select(-Whites,-FemaleBachelors,-MaleBachelors,-TotalPoverty)
tracts09 <- 
  get_acs(geography = "tract", 
          variables = c("B25026_001E","B02001_002E",
                        "B15001_050E","B15001_009E",
                        "B19013_001E","B25058_001E",
                        "B06012_002E"), 
          year=2009, state=11, 
          geometry=TRUE, output="wide") %>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         Whites = B02001_002E,
         FemaleBachelors = B15001_050E, 
         MaleBachelors = B15001_009E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E) %>%
  dplyr::select(-NAME, -starts_with("B")) %>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop,0),
         pctBachelors = ifelse(TotalPop > 0, ((FemaleBachelors + MaleBachelors) / TotalPop),0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         year = "2009") %>%
  dplyr::select(-Whites, -FemaleBachelors, -MaleBachelors, -TotalPoverty) 
allTracts <- rbind(tracts19,tracts09)

Retrieving and Wrangling 2009 & 2019 Crime Data

crime_counts_by_tract_19 <- allTracts %>%
  st_intersection(crime19) %>%
  group_by(GEOID) %>%
  summarise(crime_counts=n()) %>%
  st_drop_geometry() %>%
  mutate(year="2019")


crime_counts_by_tract_09 <- allTracts %>%
  st_intersection(crime09) %>%
  group_by(GEOID) %>%
  summarise(crime_counts=n()) %>%
  st_drop_geometry() %>%
  mutate(year="2009")
crime_counts_all <- rbind(crime_counts_by_tract_09, crime_counts_by_tract_19)

allTracts <- left_join(allTracts, crime_counts_all, by=c("GEOID"="GEOID", "year"="year"))

Retrieving Transit Data

Map Subway Lines and Stations in DC

The map shows the metro lines with stations within Washington DC. There are 6 metro lines which connect all directions within the district.

dc_line$group <- factor(dc_line$NAME)

ggplot() +
  geom_sf(data = st_union(tracts19),fill="grey90") +
  geom_sf(data = dc_station,
          show.legend = TRUE,size=1.5) +
  geom_sf(data = dc_line,
          aes(color = group),
          show.legend = TRUE,linewidth=0.7)+
  labs(title = 'Station Stops',
       subtitle = 'Washington DC')+
  scale_color_manual(values = c("blue","green","orange","red","#C0C0C0","yellow"),name="Metro Line")

Identifing TOD & Non-TOD Census Tracts

station_buffer<- rbind(
  st_buffer(dc_station,2640) %>%
    mutate(Legend = 'buffer')%>%
    dplyr::select(Legend),
  st_union(st_buffer(dc_station,2640))%>%
    st_sf()%>%
    mutate(Legend = 'Unioned Buffer')
)

Defining TOD: 0.5 miles from Subway Stations

We define the TOD area with a radius of 0.5 miles from the metro station, which is a decent walking distance for most residents.

ggplot() +
  geom_sf(data = station_buffer) +
  geom_sf(data = dc_station,
          show.legend = 'Point',size = 2) +
  facet_wrap(~Legend) + 
  mapTheme()

buffer <- filter(station_buffer, Legend=='Unioned Buffer')
clip <- st_intersection(buffer,tracts19) %>%
  dplyr::select(TotalPop)%>%
  mutate(inter_type = 'Clip')

Three Possible Ways to Show TOD

There are different visualizations to show the TOD area with the station buffers, which are “centroids”, “Clip”, and “Spatial Selection”.

Centroid-based: It selects the census tracts whose centroids lie within the buffer. It’ll ignore those tracts that are very close to transit but their centroids may be outside of the buffer.

Cliped: It selects the part of a census tract that only lies within the buffer area. It’ll include exactly the buffer areas but will be difficult to calculate accurate data based on census tracts, especially due to the fact that people might not be evenly distributed within a tract.

Spatial Selection: It will select any census tract that intersecting with the buffer.But the area might be too inclusive and even those tracts that have only a small portion within the buffer.

selection <- tracts19[buffer,]%>%
  select(TotalPop)%>%
  mutate(inter_type = 'Spatial Selection')
select_centroid <- st_centroid(tracts19)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts19, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(TotalPop) %>%
  mutate(inter_type = "Centroids")
intersections <- rbind(clip, selection, select_centroid)

ggplot() +
  geom_sf(data=intersections, aes(fill = TotalPop)) +
  geom_sf(data=dc_station, show.legend = "point") +
  scale_fill_viridis_c() +
  facet_wrap(~inter_type) + 
  mapTheme()

In terms of the method, we choose the centroid-based approach, which selects census tracts whose centroids lie within the buffer.

Pros: This method will be easier to analyze data based on tracts while not including too many possible TOD areas which basically cover the whole DC area.

Cons: The TOD area might not be continuous due to some geographical bias. It might overlook tracts which are significantly adjacent to stations, or include the tracts that are mostly not in the buffer but are included due to their shape.

allTracts.group <- 
  rbind(
    st_centroid(allTracts)[buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2009", MedRent * 1.19, MedRent)) %>%
  mutate(MedHHInc.inf = ifelse(year == "2009", MedHHInc *1.19, MedHHInc))
Tod_region <- allTracts.group %>%
  select(TOD) %>%
  filter(TOD =="TOD") %>%
  st_union() %>%
  st_sf()
ggplot(allTracts.group)+
  geom_sf(data = st_union(tracts19))+
  geom_sf(aes(fill = TOD))+
  scale_fill_manual(values = c("grey90", "orange"),
                    name = "Type")+
  labs(title = "TOD and Non-TOD Census Tracts in Washington DC",
       caption = "Data from US Census Bureau")+
  facet_wrap(~year)+
  mapTheme()+
  theme(legend.position = "bottom")

Data Visulizations

Summary: TOD Indicator Tables & Plots

In summary, from 2009 to 2019, the TOD regions saw a greater increase in population, average median rent, average median household income, and crime rates compared to the non-TOD areas. However, the percentage of residents with bachelor’s degrees remained stable in non-TOD areas, while it saw a decrease within the TOD regions.

allTracts.Summary <- 
  st_drop_geometry(allTracts.group) %>%
  group_by(year, TOD) %>%
  summarize(Rent = mean(MedRent.inf, na.rm = T),
            Population = mean(TotalPop, na.rm = T),
            Percent_Bach = mean(pctBachelors, na.rm = T),
            HH_Income=mean(MedHHInc.inf, na.rm = T),
            Crime_Counts=mean(crime_counts,na.rm = T))

allTracts.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, Value, -year.TOD) %>%
  mutate(Value = round(Value, 2)) %>%
  spread(year.TOD, Value) %>%
  kable() %>%
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 1")
Variable 2009: Non-TOD 2009: TOD 2019: Non-TOD 2019: TOD
Crime_Counts 274.49 356.70 278.61 451.00
HH_Income 73778.65 75620.41 86196.55 101401.36
Percent_Bach 0.02 0.04 0.02 0.03
Population 2996.05 2869.95 3572.93 3767.50
Rent 1083.37 1152.26 1358.34 1642.61

Table 1
allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Variable, scales = "free", ncol=5) +
  scale_fill_manual(values = c("#bae4bc", "#0868ac"),
                    name = "Type") +
  labs(title = "Indicator differences across time and space") +
  plotTheme() + theme(legend.position="bottom")

Graduated Symbol Maps

centroid_all <- st_centroid(allTracts.group, of_largest_polygon = TRUE)
buffer_new<- st_buffer(dc_station,2640)%>%
  dplyr::select(NAME)

pop_area <- st_join(buffer_new,allTracts%>%select(TotalPop,MedRent,year))

pop_area_sum <- pop_area%>%
  group_by(NAME,year)%>%
  summarise(pop = sum(TotalPop),
            Rent=mean(MedRent,na.rm = T))%>%
  st_drop_geometry()

station_new <- left_join(dc_station,pop_area_sum,by='NAME')

Population Within 0.5 Mile of Each Subway Station

# population
ggplot() +
  geom_sf(data = allTracts.group,fill='grey70') + 
  geom_sf(data = st_union(Tod_region), color = "white",fill="transparent", linewidth = 1)+
    geom_sf(data = station_new,
          pch = 21,
          aes(size=pop),
          fill=alpha('darkorange',0.7),
          col = 'grey20') +
  facet_wrap(~year) +
  scale_size(range = c(1,4))

Median Rent Within 0.5 Mile of Each Subway Station

From 2009 to 2019, rents near metro stations grew significantly, with relatively large increases in central areas.

# population
ggplot() +
  geom_sf(data = allTracts.group,fill='grey70') + 
  geom_sf(data = st_union(Tod_region), color = "white",fill="transparent", linewidth = 1)+
    geom_sf(data = station_new,
          pch = 21,
          aes(size=Rent),
          fill=alpha('darkorange',0.7),
          col = 'grey20') +
  facet_wrap(~year) +
  scale_size(range = c(1,4))

Buffers and Distance to Subway Stations

station_ring <- multipleRingBuffer(st_union(dc_station), 2640*9, 2640)

allTracts.rings <-
  st_join(st_centroid(dplyr::select(allTracts.group, GEOID, year)),
          station_ring) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(allTracts.group, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280)
ggplot() +
    geom_sf(data=station_ring,aes(color=distance),linewidth=2) +
  scale_color_gradient(low="pink",high="blue",
                       name = "Distance\n(Ft)")+
    geom_sf(data=dc_station, size=1) +
    geom_sf(data=st_union(allTracts.rings), fill=NA, color="darkorange",linewidth = 1) +
    labs(title="Station Stops: Half Mile Buffers",
         subtitle = "Washington DC",
         caption = "Data from US Census Bureau, Open Data DC") +
    mapTheme()

Distance to subway station

The distance of the centroid of each census tract from the nearest metro station is further calculated.

#colors_categories <- c("0.5"="#f0f9e8","1"="#bae4bc","1.5"="#7bccc4","2"="#43a2ca","2.5"="#0868ac","3.5"="#003366" )

ggplot(allTracts.rings)+
  geom_sf(data = st_union(tracts19))+
  geom_sf(aes(fill = as.factor(distance)))+
  geom_sf(data = Tod_region, color= "darkorange",fill="transparent", linewidth=1)+
  scale_fill_brewer(palette = "YlGnBu",
                      name = "Distance to Subway Stations\n(Miles)")+
  #distiller -> continuous palette, brewer->discrete, need to fill = as.factor(distance),
    labs(title = "Distance to Subway Stations by census tracts",
         subtitle = "Washington DC",
       caption = "Data from US Census Bureau")+
  mapTheme()

Summary: Rent as a Function of Distance to Subway Stations

allTracts.rings.summary <- st_drop_geometry(allTracts.rings) %>%
  group_by(distance, year) %>%
  summarize(Mean_Rent = mean(MedRent, na.rm = T))
ggplot(allTracts.rings.summary,
       aes(distance,Mean_Rent, color = year))+
  geom_point(size = 3.5)+
  geom_line(size = 2)+
  scale_color_manual(values = c("#bae4bc","#0868ac"))+
  labs(title = "Rent as a function of distance to subway stations",
       subtitle = "Census tracts in Washington DC",
       x="Distance(mile)",y="Average Rent($)") +
      plotTheme() + theme(legend.position="bottom")

Conclusion

Residents in DC tend to value neighborhoods in TOD areas, as well as suburbs. As the distance from the subway station increases, the average rent shows a trend of decreasing, then increasing, then decreasing in both 2009 and 2019. The two rent highs are within areas closer to TOD and in suburbs farther away from the city, and these high-rent suburbs are also expanding outward. However, there are some spatial bias in terms of the selection of TOD areas. As such, it’s important to take more factors into consideration in the decision of promoting transit-oriented developments.