Chapter 4 Vector data with sf

Sticky geometry with `sf`

Figure 4.1: Sticky geometry with sf

sf makes use of a couple key libraries that are the foundation for most open source geospatial software

sf is an implementation of Simple features, an open standard developed and endorsed by the Open Geospatial Consortium (OGC). Simple Features is a hierarchical data model that represents a wide range of geometry types - it includes all common vector geometry types (but does not include raster) and even allows geometry collections, which can have multiple geometry types in a single object. From the first sf package vignette we see:

knitr::include_graphics("images/sf_objects.png")

The big difference between sf and sp is that sf uses S3 classes rather than S4 as sp does. Simple features are simple data.frame objects that have a geometry list-column. The simple feature model will be familiar to those who use PostGIS, MySQL Spatial Extensions, Oracle Spatial, the OGR component of the GDAL library, GeoJSON and GeoPandas in Python. Simple features are represented with Well-Known text - WKT - and well-known binary formats.

Just as in PostGIS, all functions and methods in sf are prefixed with st_, which stands for ‘spatial and temporal’. An advantage of this prefixing is all commands are easy to find with command-line completion in sf.

There is extensive documentation, blog posts and vignettes available for sf here: Simple Features for R. Additionally, see r-spatial blog which has numerous announcements, discussion pieces and tutorials on spatial work in R focused.

A very handy page, if you’re already familiar with the sp ecosystem, is the Migrating page on the sf GitHub wiki.

4.1 sf Objects

There is a great breakdown of sf objects in this blog post. In sf, we have several types of objects:

  • sfg: geometry of a single feature
  • sfc: geometry column with the spatial attributes of the object printed above the data frame
  • attributes: non-geometry variables in the data frame
  • simple feature: a single simple feature with both geometry and attributes represented by a row in the data frame
  • sf object: a collection of simple features (rows) represented by a data frame

We can create an sfg easily, which will have information about coordinates, dimension, and type of geometry:

library(sf)
Corvallis <- st_point(c(-123.283, 44.566))
class(Corvallis)
## [1] "XY"    "POINT" "sfg"

We can also create a MULTIPOINT sfg or a LINESTRING sfg:

library(sf)
multipoint_sfg <- st_multipoint(rbind(c(-123.283, 44.566), c(-125.866, 50.544)))
linestring_sfg <- st_linestring(rbind(c(-123.283, 44.566), c(-125.866, 50.544))) 
multipoint_sfg
linestring_sfg

..And plot

plot(multipoint_sfg, axes=TRUE)
plot(linestring_sfg, col='red', add=TRUE)

Just as we saw in intro, these objects are not truly spatial because they still lack a coordinate reference system.

It should also be noted that sf objects can have both Z (height) and M (measurement) dimensions along with X and Y dimensions. Certain operations won’t work with M and Z dimensions (anyone who works with national hydrography data in sf will discover this quickly!) - often you need to use st_zm(x, drop=TRUE, what=“ZM”) for this.

4.2 sf Methods

Here’s a quick synopsis of available methods in sf:

library(sf,quietly = T)
methods(class = 'sf')
##  [1] $<-                   [                     [[<-                 
##  [4] aggregate             as.data.frame         cbind                
##  [7] coerce                dbDataType            dbWriteTable         
## [10] filter                identify              initialize           
## [13] merge                 plot                  print                
## [16] rbind                 show                  slotsFromS3          
## [19] st_agr                st_agr<-              st_area              
## [22] st_as_s2              st_as_sf              st_bbox              
## [25] st_boundary           st_buffer             st_cast              
## [28] st_centroid           st_collection_extract st_convex_hull       
## [31] st_coordinates        st_crop               st_crs               
## [34] st_crs<-              st_difference         st_filter            
## [37] st_geometry           st_geometry<-         st_inscribed_circle  
## [40] st_interpolate_aw     st_intersection       st_intersects        
## [43] st_is                 st_is_valid           st_join              
## [46] st_line_merge         st_m_range            st_make_valid        
## [49] st_nearest_points     st_node               st_normalize         
## [52] st_point_on_surface   st_polygonize         st_precision         
## [55] st_reverse            st_sample             st_segmentize        
## [58] st_set_precision      st_shift_longitude    st_simplify          
## [61] st_snap               st_sym_difference     st_transform         
## [64] st_triangulate        st_union              st_voronoi           
## [67] st_wrap_dateline      st_write              st_z_range           
## [70] st_zm                 transform            
## see '?methods' for accessing help and source code

4.3 Exploring sf

Let’s start exploring sf using some data included in the Rspatialworkshop package.

library(Rspatialworkshop)
data(bike_paths)
head(bike_paths[,c('ROUTE_CODE','ROUTE_NAME','geoms')])
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: -97.75811 ymin: 30.26214 xmax: -97.75198 ymax: 30.28477
## Geodetic CRS:  WGS 84
##   ROUTE_CODE   ROUTE_NAME                          geoms
## 1    43.0023 LAMAR BLVD N MULTILINESTRING ((-97.75198...
## 2    43.0023 LAMAR BLVD N MULTILINESTRING ((-97.75198...
## 3    43.0028 LAMAR BLVD S MULTILINESTRING ((-97.75711...
## 4    43.0028 LAMAR BLVD S MULTILINESTRING ((-97.75711...
## 5    43.0028 LAMAR BLVD S MULTILINESTRING ((-97.75775...
## 6    43.0028 LAMAR BLVD S MULTILINESTRING ((-97.75775...

Let’s break down what we just saw in bike paths, which is bike paths in Austin - from the first sf package vignette:

knitr::include_graphics("images/sf_structure.png")

We can see:

  • in green a simple feature: a single record, or data.frame row, consisting of attributes and geometry
  • in blue a single simple feature geometry (an object of class sfg)
  • in red a simple feature list-column (an object of class sfc, which is a column in the data.frame)

The sfc, our geometry list-column, is presented as well-known text, in the form of (for polygon data):

  • Multipolygon(polygon1, polygon2)

polygon1 might have 1 or more holes, and itself could be represented as (poly1, hole1, hole2).

Each polygon and its holes are held together by a set of parentheses, so:

  • Multipolygon(((list of coordinates))) indicates the exterior ring coordinates, going counter-clockwise in direction, without holes, of the first polygon.

Generic plotting works on sf objects - take a minute to play with plotting the bike paths data frame - why do we specify $geometry in the plot call?

plot(bike_paths$geoms, axes=T)

4.4 Converting other object types to sf

We can convert foreign objects (regular data frames with coordinates, sp objects, maps package objects) to sf objects easily.

Here we convert PNW states in older spatial format in the maps package to sf polygon objects, then read in a .csv of stream gage locations from the Rspatialworkshop package and convert to sf objects.

library(readr)
states <- sf::st_as_sf(maps::map("state",  region = c('oregon', 'washington', 'idaho'), plot = FALSE, fill = TRUE))
fpath <- system.file("extdata", "Gages_flowdata.csv", package="Rspatialworkshop")
gages <- read_csv(fpath,show_col_types = FALSE)
gages <- gages %>% 
  st_as_sf(coords = c("LON_SITE", "LAT_SITE"), crs=4269)
plot(states$geom, axes=TRUE)
plot(gages$geometry, add=TRUE, col='blue')
title(main='StreamGages and PNW \n State Boundaries')

4.5 Units in sf

The crs in sf encodes the units of measure in information relating to spatial units of features - this can be both handy and very confusing for those new to it. Consider the following:

(sum(st_length(bike_paths)))
## 3458778 [m]

We can set units if we do manipulations as well using the units package

units::set_units(sum(st_length(bike_paths)),km)
## 3458.778 [km]

If we need to use the value elsewhere get rid of units

as.numeric(sum(st_length(bike_paths)))
## [1] 3458778