Chapter 6 Geoprocessing
6.1 Lesson Goals
A quick look at a some typical topological operations (spatial subsetting, spatial joins, dissolve) using sf
- followed by a couple ‘real world’ spatial tasks. We’ll be using several of the geometric binary predicates described in ?sf::geos_binary_pred
.
6.2 Spatial Subsetting
Let’s look at the bike paths and parks data in the Rspatialworkshop
package. There are sf
feature collections of parks and trails in Austin in the package - we want to know what bike trails go through parks. A great feature of sf
is it supports spatial indexing:
library(sf)
library(Rspatialworkshop)
data(parks)
data(bike_paths)
plot(bike_paths$geoms, col='green', axes=T)
plot(parks$geoms, col='blue', add=T)
<- bike_paths[parks,] paths_in_parks
plot(parks$geoms, col='blue', axes=T)
plot(paths_in_parks$geoms, col='red', lwd = 2, add=T)
title(main='Bike Paths in Parks in Austin')
6.3 Spatial Join
We’ll use chained operations to select just a couple columns from both bike_paths and parks, and then we’ll do a spatial join operation with sf
. Note that when we do a select on just attribute column, the geometry column remains - geometry is sticky in sf
!
library(dplyr)
<- bike_paths %>%
bike_paths ::select(ROUTE_NAME)
dplyr<- parks %>%
parks ::select(LOCATION_NAME, ZIPCODE,PARK_TYPE)
dplyr<- st_join(parks, bike_paths) # st_intersects is the default
parks_bike_paths glimpse(parks_bike_paths)
## Rows: 744
## Columns: 5
## $ LOCATION_NAME <chr> "Stratford Overlook Greenbelt", "Highland Neighborhood P~
## $ ZIPCODE <chr> "78746", "78752", "78703", "78753", "78724", "78702", "7~
## $ PARK_TYPE <chr> "Greenbelt", "Neighborhood", "Pocket", "Neighborhood", "~
## $ ROUTE_NAME <chr> NA, NA, NA, NA, NA, NA, "TOWN LAKE HIKE & BIKE TRAIL", "~
## $ geoms <MULTIPOLYGON [°]> MULTIPOLYGON (((-97.78802 3..., MULTIPOLYGO~
6.4 Dissolve
We can perform a spatial dissolve in sf
using dplyr
group_by
and summarize
functions with an sf
object.
What we are doing here is using zip code as our grouping variable with dplyr
, summarizing area using the sf
st_area
function at that grouping level, and mapping the result.
library(ggplot2)
$AREA <- st_area(parks)
parks<- parks %>%
parks_zip group_by(ZIPCODE) %>%
summarise(AREA = sum(AREA)) %>%
ggplot() + geom_sf(aes(fill=(ZIPCODE))) +
ggtitle("Austin Parks by Zip Code") +
theme_bw()
parks_zip
6.5 Spatial Overlap
Here’s a fun example using material posted by Nicholas Tierney here that he put together based on this Stack Overflow discussion.
First we’ll extract the Portland Oregon metropolitan area using the tidycensus
and tigris
packages
library(ggplot2)
library(tidycensus)
library(tidyverse)
library(tigris)
<- get_acs(geography = "tract", variables = "DP04_0134",
tracts state = "OR", geometry = TRUE, progress_bar = FALSE)
<- core_based_statistical_areas(cb = TRUE, progress_bar = FALSE) %>%
pdx filter(GEOID == "38900")
ggplot() +
geom_sf(data = pdx,
fill = "forestgreen")
6.5.1 Next we’ll create a dummy spatial polygon file to compare area with using the rmapshaper
package to simplify the border of the PDX metropolitan polygon
library(rmapshaper)
<- pdx %>%
pdx_simplified ms_simplify(keep = 0.01)
6.5.2 Then we can overlay polygons in ggplot to see how similar they are, showing the original census PDX metropolitan area in green, and new simplified polygon in red
ggplot() +
geom_sf(data = pdx,
fill = "forestgreen", alpha = 0.5) +
geom_sf(data = pdx_simplified,
fill = "firebrick",
alpha = 0.5)
6.5.3 Now that we have an original and simplified polygon to compare, the process we want to use is:
- Calculate original metro area polygon area
- Calculate the intersection of these two areas - original and simplified (st_intersection)
- Calculate that area (st_area)
- Then only keep the relevant data again
We’ll run the steps then combine into a function
<- pdx %>%
original_area mutate(original_area = st_area(.)) %>%
::select(NAME, original_area) %>%
dplyrst_drop_geometry()
<- st_intersection(pdx, pdx_simplified) %>%
intersection_area mutate(intersect_area = st_area(.)) %>%
::select(NAME, intersect_area) %>%
dplyrst_drop_geometry()
6.5.4 Exercise
This step should have given you an error -THIS IS VERY COMMON WORKING WITH SPATIAL DATA IN R AND USING sf
.
6.5.5 Solution
<- st_make_valid(pdx_simplified)
pdx_simplified <- st_make_valid(pdx)
pdx
<- pdx %>%
original_area mutate(original_area = st_area(.)) %>%
::select(NAME, original_area) %>%
dplyrst_drop_geometry()
<- st_intersection(pdx, pdx_simplified) %>%
intersection_area mutate(intersect_area = st_area(.)) %>%
::select(NAME, intersect_area) %>%
dplyrst_drop_geometry()
# show the area of intersection
intersection_area
## NAME intersect_area
## 1 Portland-Vancouver-Hillsboro, OR-WA 16524152114 [m^2]
# show the proportion of overlap
%>%
intersection_area left_join(original_area,
by = "NAME") %>%
mutate(orig = as.numeric(original_area),
new = as.numeric(intersect_area),
proportion = (new / orig) * 100)
## NAME intersect_area original_area
## 1 Portland-Vancouver-Hillsboro, OR-WA 16524152114 [m^2] 17633108214 [m^2]
## orig new proportion
## 1 17633108214 16524152114 93.71094
6.5.6 Exercise
Why did I use as.numeric
in the mutate
statement above?
6.5.6.1 Solution
sf
will calculate area or length for features as units
- which is convenient and forces you to be up front about units used, but you when you do calculations with attributes stored as units the units are retained - which is often not what is desired. You can see this behavior using str(st_area(pdx))
.
All steps rolled into a function
<-
calculate_spatial_overlap function(shape_new,shape_old, shared_column_name) {
<- st_intersection(shape_new, shape_old) %>%
intersection_area mutate(intersect_area = st_area(.)) %>%
::select(shared_column_name, intersect_area) %>%
dplyrst_drop_geometry()
# Create a fresh area variable
<- shape_old %>%
shape_old_areas mutate(original_area = st_area(.)) %>%
::select(original_area, shared_column_name) %>%
dplyrst_drop_geometry()
%>%
intersection_area left_join(shape_old_areas,
by = shared_column_name) %>%
mutate(orig = as.numeric(original_area),
new = as.numeric(intersect_area),
proportion = (new / orig) * 100)
}
calculate_spatial_overlap(pdx, pdx_simplified, shared_column_name='NAME')
## NAME intersect_area original_area
## 1 Portland-Vancouver-Hillsboro, OR-WA 16524152114 [m^2] 17295383068 [m^2]
## orig new proportion
## 1 17295383068 16524152114 95.54083
6.6 Deriving data for a sites or a watershed
6.6.1 Extract using sites
First we’ll use dataRetrieval
to find NWIS sites in Benton County Oregon
library(mapview)
library(dataRetrieval)
library(sf)
<- readNWISdata(stateCd="Oregon", countyCd="Benton")
Benton_Stations <- attr(Benton_Stations , "siteInfo")
siteInfo = st_as_sf(siteInfo, coords = c("dec_lon_va", "dec_lat_va"), crs = 4269,agr = "constant")
stations_sf mapview(stations_sf)
6.6.1.1 Exercise
What is the agr
argument to st_as_sf
in code chunk above?
6.6.1.2 Solution
Try help(st_sf)
and look at details, and try demo(nc)
to see a worked example. Basically, agr
specifies the attribute-geometry relationship - does the attribute represent a constant value, an aggregation over the geometry, or an identity that uniquely identifies the geometry. For the most part I haven’t paid too much attention to this parameter but similar to units
, it can help with constructing a more self-descriptive spatial object.
6.6.2 River reaches and basin
Next we’ll use nhdplusTools
to get the river reaches for the Mary’s River in Benton county, get the watershed Mary’s River watershed, and then spatially subset the stations to the watershed
library(nhdplusTools)
= 23762881
start_comid <- list(featureSource = "comid", featureID = start_comid)
nldi_feature
<- navigate_nldi(nldi_feature, mode = "UT", data_source = "flowlines", distance=5000)
flowline_nldi
<- get_nldi_basin(nldi_feature = nldi_feature) basin
6.6.3 StreamCat data for watershed
StreamCat provides pre-compiled watershed metrics for every NHDPlus stream reach in the CONUS. Note that the StreamCatTools R package is in alpha development and will only work behind the EPA firewall at the moment (must be connected to VPN to use)
# install_github('USEPA/StreamCatTools')
library(StreamCatTools)
# get NHDPlus COMIDS for flowlines of the Mary's River - some machinations on our data from NLDI using nhdplusTools...
<- c(flowline_nldi$origin$comid,flowline_nldi$UT_flowlines$nhdplus_comid)
comids <- paste(comids,collapse=",",sep="")
comids
<- sc_get_data(metric='PctImp2011', aoi='catchment', comid=comids) df
## &name=PctImp2011&comid=23762881,23762881,23764703,23762883,23765593,23764671,23762885,23764723,23764669,23764667,23762887,23764733,23764721,23764715,23764657,23764641,23764643,23764779,23762889,23764581,23764639,23764645,23762891,23764771,23762893,23764785,23762895,23762959,23764823,23764825,23762897,23764765,23764805,23762961,23764841,23762899,23762939,23764755,23764757,23764893,23762963,23762901,23763967,23762941,23764731,23764717,23764719,23764869,23762965,23762903,23764651,23763969,23764681,23762943,23764767,23764711,23764713,23764705,23764707,23764819,23762967,23764633,23762905,23763971,23764693,23762945,23764773,23764709,23764807,23763809,23762969,23762907,23764631,23763973,23762947,23764789,23764817,23764803,23764923,23762971,23762909,23764629,23764593,23764595,23763931,23762949,23764799,23764801,23763767,23762973,23762911,23764619,23764561,23764567,23764835,23763933,23762951,23764813,23764905,23763769,23762975,23764981,23762913,23764625,23763935,23764793,23764827,23762953,23764913,23763771,23762977,23762915,23764621,23763937,23764741,23764863,23762955,23764933,23764937,23763773,23764899,23762979,23762917,23764663,23764747,23763939,23764865,23764881,23762957,23763775,23764903,23762981,23762919,23762925,23764739,23764753,23763941,23764873,23764875,23763777,23764925,23762983,23762921,23764617,23763897,23762927,23765581,23763943,23762985,23762923,23764551,23764589,23764587,23764637,23762929,23764833,23764769,23762987,23764445,23764447,23764553,23764555,23762931,23763759,23762989,23762993,23764437,23764439,23764369,23764367,23764505,23764507,23764599,23762933,23763761,23765007,23762991,23765063,23762995,23764401,23764399,23764341,23762935,23764579,23763763,23764993,23765021,23765077,23765115,23765039,23762997,23764319,23764279,23762937,23764611,23763765,23764977,23764997,23765079,23765125,23765101,23762999,23765661,23764971,23764973,23765095,23765123,23765027,23763001,23764653,23763003,23764685,23763005,23764691,23764729,23764743,23764763&areaOfInterest=catchment
$UT_flowlines$PCTIMP2011CAT <- df$PCTIMP2011CAT[match(flowline_nldi$UT_flowlines$nhdplus_comid, df$COMID)]
flowline_nldimapview(flowline_nldi$UT_flowlines, zcol = "PCTIMP2011CAT", legend = TRUE) + mapview(basin, alpha.regions=.07)
6.7 Extract
Last, just quick examples of extracting raster values at points or using extract to summarize raster values for spatial polygons.
Extract of elevation for station points
library(FedData)
library(terra)
# The data access for elevation using FedData slow, so I've commented out and saved the data locally...
# elev <- get_ned(template = as(basin,'Spatial'),label='Marys River')
# marys_elev <- terra::crop(elev, basin)
# terra::writeRaster(marys_elev, 'C:/Users/mweber/Temp/marys_elev.tif')
<- terra::rast('C:/Users/mweber/Temp/marys_elev.tif')
marys_elev
st_crs(stations_sf)$wkt == crs(marys_elev)
## [1] FALSE
<- terra::project(marys_elev, st_crs(stations_sf)$wkt)
marys_elev $elevation <- terra::extract(marys_elev, vect(stations_sf))
stations_sf$elevation stations_sf
## ID marys_elev
## 1 1 88.02767
## 2 2 87.99461
## 3 3 74.26478
## 4 4 111.86833
## 5 5 70.22331
## 6 6 81.99174
## 7 7 59.63894
## 8 8 NaN
## 9 9 191.47162
Elevation summary for the watershed - doing zonal statistics using a vector feature zone on a raster
# project both our raster and polygon data
library(terra)
<- st_transform(basin, 5070)
basin <- project(marys_elev, st_crs(basin)$wkt, method = "bilinear")
marys_elev
<- terra::extract(marys_elev, vect(basin), fun = mean, na.rm = T, small = T)
meanelev meanelev
## ID marys_elev
## 1 1 227.784
Categorical raster (NLCD) summary for the watershed
<- system.file("extdata", "benton_nlcd.tif", package = "Rspatialworkshop")
raster_filepath <- rast(raster_filepath)
nlcd <- project(nlcd, st_crs(basin)$wkt, method = "near")
nlcd = terra::extract(nlcd, vect(basin))
basin_nlcd
$Category <- as.character(basin_nlcd$`NLCD Land Cover Class`)
basin_nlcd%>%
basin_nlcd group_by(Category) %>%
count()
## # A tibble: 15 x 2
## # Groups: Category [15]
## Category n
## <chr> <int>
## 1 Barren Land 177
## 2 Cultivated Crops 15080
## 3 Deciduous Forest 9514
## 4 Developed, High Intensity 2145
## 5 Developed, Low Intensity 19234
## 6 Developed, Medium Intensity 6004
## 7 Developed, Open Space 43259
## 8 Emergent Herbaceuous Wetlands 22807
## 9 Evergreen Forest 345263
## 10 Hay/Pasture 205659
## 11 Herbaceuous 32847
## 12 Mixed Forest 111779
## 13 Open Water 695
## 14 Shrub/Scrub 48671
## 15 Woody Wetlands 7895