Chapter 6 Geoprocessing
6.1 Lesson Goals
A quick look at a some typical topological operations (spatial subsetting, spatial joins, dissolve) using sf - followed by a couple ‘real world’ spatial tasks. We’ll be using several of the geometric binary predicates described in ?sf::geos_binary_pred.
6.2 Spatial Subsetting
Let’s look at the bike paths and parks data in the Rspatialworkshop package. There are sf feature collections of parks and trails in Austin in the package - we want to know what bike trails go through parks. A great feature of sf is it supports spatial indexing:
library(sf)
library(Rspatialworkshop)
data(parks)
data(bike_paths)
plot(bike_paths$geoms, col='green', axes=T)
plot(parks$geoms, col='blue', add=T)
paths_in_parks <- bike_paths[parks,]plot(parks$geoms, col='blue', axes=T)
plot(paths_in_parks$geoms, col='red', lwd = 2, add=T)
title(main='Bike Paths in Parks in Austin')
6.3 Spatial Join
We’ll use chained operations to select just a couple columns from both bike_paths and parks, and then we’ll do a spatial join operation with sf. Note that when we do a select on just attribute column, the geometry column remains - geometry is sticky in sf!
library(dplyr)
bike_paths <- bike_paths %>%
dplyr::select(ROUTE_NAME)
parks <- parks %>%
dplyr::select(LOCATION_NAME, ZIPCODE,PARK_TYPE)
parks_bike_paths <- st_join(parks, bike_paths) # st_intersects is the default
glimpse(parks_bike_paths)## Rows: 744
## Columns: 5
## $ LOCATION_NAME <chr> "Stratford Overlook Greenbelt", "Highland Neighborhood P~
## $ ZIPCODE <chr> "78746", "78752", "78703", "78753", "78724", "78702", "7~
## $ PARK_TYPE <chr> "Greenbelt", "Neighborhood", "Pocket", "Neighborhood", "~
## $ ROUTE_NAME <chr> NA, NA, NA, NA, NA, NA, "TOWN LAKE HIKE & BIKE TRAIL", "~
## $ geoms <MULTIPOLYGON [°]> MULTIPOLYGON (((-97.78802 3..., MULTIPOLYGO~
6.4 Dissolve
We can perform a spatial dissolve in sf using dplyr group_by and summarize functions with an sf object.
What we are doing here is using zip code as our grouping variable with dplyr, summarizing area using the sf st_area function at that grouping level, and mapping the result.
library(ggplot2)
parks$AREA <- st_area(parks)
parks_zip <- parks %>%
group_by(ZIPCODE) %>%
summarise(AREA = sum(AREA)) %>%
ggplot() + geom_sf(aes(fill=(ZIPCODE))) +
ggtitle("Austin Parks by Zip Code") +
theme_bw()
parks_zip
6.5 Spatial Overlap
Here’s a fun example using material posted by Nicholas Tierney here that he put together based on this Stack Overflow discussion.
First we’ll extract the Portland Oregon metropolitan area using the tidycensus and tigris packages
library(ggplot2)
library(tidycensus)
library(tidyverse)
library(tigris)
tracts <- get_acs(geography = "tract", variables = "DP04_0134",
state = "OR", geometry = TRUE, progress_bar = FALSE)
pdx <- core_based_statistical_areas(cb = TRUE, progress_bar = FALSE) %>%
filter(GEOID == "38900")
ggplot() +
geom_sf(data = pdx,
fill = "forestgreen") 
6.5.1 Next we’ll create a dummy spatial polygon file to compare area with using the rmapshaper package to simplify the border of the PDX metropolitan polygon
library(rmapshaper)
pdx_simplified <- pdx %>%
ms_simplify(keep = 0.01)6.5.2 Then we can overlay polygons in ggplot to see how similar they are, showing the original census PDX metropolitan area in green, and new simplified polygon in red
ggplot() +
geom_sf(data = pdx,
fill = "forestgreen", alpha = 0.5) +
geom_sf(data = pdx_simplified,
fill = "firebrick",
alpha = 0.5)
6.5.3 Now that we have an original and simplified polygon to compare, the process we want to use is:
- Calculate original metro area polygon area
- Calculate the intersection of these two areas - original and simplified (st_intersection)
- Calculate that area (st_area)
- Then only keep the relevant data again
We’ll run the steps then combine into a function
original_area <- pdx %>%
mutate(original_area = st_area(.)) %>%
dplyr::select(NAME, original_area) %>%
st_drop_geometry()
intersection_area <- st_intersection(pdx, pdx_simplified) %>%
mutate(intersect_area = st_area(.)) %>%
dplyr::select(NAME, intersect_area) %>%
st_drop_geometry()6.5.4 Exercise
This step should have given you an error -THIS IS VERY COMMON WORKING WITH SPATIAL DATA IN R AND USING sf.
6.5.5 Solution
pdx_simplified <- st_make_valid(pdx_simplified)
pdx <- st_make_valid(pdx)
original_area <- pdx %>%
mutate(original_area = st_area(.)) %>%
dplyr::select(NAME, original_area) %>%
st_drop_geometry()
intersection_area <- st_intersection(pdx, pdx_simplified) %>%
mutate(intersect_area = st_area(.)) %>%
dplyr::select(NAME, intersect_area) %>%
st_drop_geometry()
# show the area of intersection
intersection_area## NAME intersect_area
## 1 Portland-Vancouver-Hillsboro, OR-WA 16524152114 [m^2]
# show the proportion of overlap
intersection_area %>%
left_join(original_area,
by = "NAME") %>%
mutate(orig = as.numeric(original_area),
new = as.numeric(intersect_area),
proportion = (new / orig) * 100)## NAME intersect_area original_area
## 1 Portland-Vancouver-Hillsboro, OR-WA 16524152114 [m^2] 17633108214 [m^2]
## orig new proportion
## 1 17633108214 16524152114 93.71094
6.5.6 Exercise
Why did I use as.numeric in the mutate statement above?
6.5.6.1 Solution
sf will calculate area or length for features as units - which is convenient and forces you to be up front about units used, but you when you do calculations with attributes stored as units the units are retained - which is often not what is desired. You can see this behavior using str(st_area(pdx)).
All steps rolled into a function
calculate_spatial_overlap <-
function(shape_new,shape_old, shared_column_name) {
intersection_area <- st_intersection(shape_new, shape_old) %>%
mutate(intersect_area = st_area(.)) %>%
dplyr::select(shared_column_name, intersect_area) %>%
st_drop_geometry()
# Create a fresh area variable
shape_old_areas <- shape_old %>%
mutate(original_area = st_area(.)) %>%
dplyr::select(original_area, shared_column_name) %>%
st_drop_geometry()
intersection_area %>%
left_join(shape_old_areas,
by = shared_column_name) %>%
mutate(orig = as.numeric(original_area),
new = as.numeric(intersect_area),
proportion = (new / orig) * 100)
}
calculate_spatial_overlap(pdx, pdx_simplified, shared_column_name='NAME')## NAME intersect_area original_area
## 1 Portland-Vancouver-Hillsboro, OR-WA 16524152114 [m^2] 17295383068 [m^2]
## orig new proportion
## 1 17295383068 16524152114 95.54083
6.6 Deriving data for a sites or a watershed
6.6.1 Extract using sites
First we’ll use dataRetrieval to find NWIS sites in Benton County Oregon
library(mapview)
library(dataRetrieval)
library(sf)
Benton_Stations <- readNWISdata(stateCd="Oregon", countyCd="Benton")
siteInfo <- attr(Benton_Stations , "siteInfo")
stations_sf = st_as_sf(siteInfo, coords = c("dec_lon_va", "dec_lat_va"), crs = 4269,agr = "constant")
mapview(stations_sf)6.6.1.1 Exercise
What is the agr argument to st_as_sf in code chunk above?
6.6.1.2 Solution
Try help(st_sf) and look at details, and try demo(nc) to see a worked example. Basically, agr specifies the attribute-geometry relationship - does the attribute represent a constant value, an aggregation over the geometry, or an identity that uniquely identifies the geometry. For the most part I haven’t paid too much attention to this parameter but similar to units, it can help with constructing a more self-descriptive spatial object.
6.6.2 River reaches and basin
Next we’ll use nhdplusTools to get the river reaches for the Mary’s River in Benton county, get the watershed Mary’s River watershed, and then spatially subset the stations to the watershed
library(nhdplusTools)
start_comid = 23762881
nldi_feature <- list(featureSource = "comid", featureID = start_comid)
flowline_nldi <- navigate_nldi(nldi_feature, mode = "UT", data_source = "flowlines", distance=5000)
basin <- get_nldi_basin(nldi_feature = nldi_feature)6.6.3 StreamCat data for watershed
StreamCat provides pre-compiled watershed metrics for every NHDPlus stream reach in the CONUS. Note that the StreamCatTools R package is in alpha development and will only work behind the EPA firewall at the moment (must be connected to VPN to use)
# install_github('USEPA/StreamCatTools')
library(StreamCatTools)
# get NHDPlus COMIDS for flowlines of the Mary's River - some machinations on our data from NLDI using nhdplusTools...
comids <- c(flowline_nldi$origin$comid,flowline_nldi$UT_flowlines$nhdplus_comid)
comids <- paste(comids,collapse=",",sep="")
df <- sc_get_data(metric='PctImp2011', aoi='catchment', comid=comids)## &name=PctImp2011&comid=23762881,23762881,23764703,23762883,23765593,23764671,23762885,23764723,23764669,23764667,23762887,23764733,23764721,23764715,23764657,23764641,23764643,23764779,23762889,23764581,23764639,23764645,23762891,23764771,23762893,23764785,23762895,23762959,23764823,23764825,23762897,23764765,23764805,23762961,23764841,23762899,23762939,23764755,23764757,23764893,23762963,23762901,23763967,23762941,23764731,23764717,23764719,23764869,23762965,23762903,23764651,23763969,23764681,23762943,23764767,23764711,23764713,23764705,23764707,23764819,23762967,23764633,23762905,23763971,23764693,23762945,23764773,23764709,23764807,23763809,23762969,23762907,23764631,23763973,23762947,23764789,23764817,23764803,23764923,23762971,23762909,23764629,23764593,23764595,23763931,23762949,23764799,23764801,23763767,23762973,23762911,23764619,23764561,23764567,23764835,23763933,23762951,23764813,23764905,23763769,23762975,23764981,23762913,23764625,23763935,23764793,23764827,23762953,23764913,23763771,23762977,23762915,23764621,23763937,23764741,23764863,23762955,23764933,23764937,23763773,23764899,23762979,23762917,23764663,23764747,23763939,23764865,23764881,23762957,23763775,23764903,23762981,23762919,23762925,23764739,23764753,23763941,23764873,23764875,23763777,23764925,23762983,23762921,23764617,23763897,23762927,23765581,23763943,23762985,23762923,23764551,23764589,23764587,23764637,23762929,23764833,23764769,23762987,23764445,23764447,23764553,23764555,23762931,23763759,23762989,23762993,23764437,23764439,23764369,23764367,23764505,23764507,23764599,23762933,23763761,23765007,23762991,23765063,23762995,23764401,23764399,23764341,23762935,23764579,23763763,23764993,23765021,23765077,23765115,23765039,23762997,23764319,23764279,23762937,23764611,23763765,23764977,23764997,23765079,23765125,23765101,23762999,23765661,23764971,23764973,23765095,23765123,23765027,23763001,23764653,23763003,23764685,23763005,23764691,23764729,23764743,23764763&areaOfInterest=catchment
flowline_nldi$UT_flowlines$PCTIMP2011CAT <- df$PCTIMP2011CAT[match(flowline_nldi$UT_flowlines$nhdplus_comid, df$COMID)]
mapview(flowline_nldi$UT_flowlines, zcol = "PCTIMP2011CAT", legend = TRUE) + mapview(basin, alpha.regions=.07)6.7 Extract
Last, just quick examples of extracting raster values at points or using extract to summarize raster values for spatial polygons.
Extract of elevation for station points
library(FedData)
library(terra)
# The data access for elevation using FedData slow, so I've commented out and saved the data locally...
# elev <- get_ned(template = as(basin,'Spatial'),label='Marys River')
# marys_elev <- terra::crop(elev, basin)
# terra::writeRaster(marys_elev, 'C:/Users/mweber/Temp/marys_elev.tif')
marys_elev <- terra::rast('C:/Users/mweber/Temp/marys_elev.tif')
st_crs(stations_sf)$wkt == crs(marys_elev)## [1] FALSE
marys_elev <- terra::project(marys_elev, st_crs(stations_sf)$wkt)
stations_sf$elevation <- terra::extract(marys_elev, vect(stations_sf))
stations_sf$elevation## ID marys_elev
## 1 1 88.02767
## 2 2 87.99461
## 3 3 74.26478
## 4 4 111.86833
## 5 5 70.22331
## 6 6 81.99174
## 7 7 59.63894
## 8 8 NaN
## 9 9 191.47162
Elevation summary for the watershed - doing zonal statistics using a vector feature zone on a raster
# project both our raster and polygon data
library(terra)
basin <- st_transform(basin, 5070)
marys_elev <- project(marys_elev, st_crs(basin)$wkt, method = "bilinear")
meanelev <- terra::extract(marys_elev, vect(basin), fun = mean, na.rm = T, small = T)
meanelev## ID marys_elev
## 1 1 227.784
Categorical raster (NLCD) summary for the watershed
raster_filepath <- system.file("extdata", "benton_nlcd.tif", package = "Rspatialworkshop")
nlcd <- rast(raster_filepath)
nlcd <- project(nlcd, st_crs(basin)$wkt, method = "near")
basin_nlcd = terra::extract(nlcd, vect(basin))
basin_nlcd$Category <- as.character(basin_nlcd$`NLCD Land Cover Class`)
basin_nlcd %>%
group_by(Category) %>%
count()## # A tibble: 15 x 2
## # Groups: Category [15]
## Category n
## <chr> <int>
## 1 Barren Land 177
## 2 Cultivated Crops 15080
## 3 Deciduous Forest 9514
## 4 Developed, High Intensity 2145
## 5 Developed, Low Intensity 19234
## 6 Developed, Medium Intensity 6004
## 7 Developed, Open Space 43259
## 8 Emergent Herbaceuous Wetlands 22807
## 9 Evergreen Forest 345263
## 10 Hay/Pasture 205659
## 11 Herbaceuous 32847
## 12 Mixed Forest 111779
## 13 Open Water 695
## 14 Shrub/Scrub 48671
## 15 Woody Wetlands 7895