Chapter 2 Reading in Spatial Data
- There are several ways we typically get spatial data into R:
- Load spatial files we have on our machine or from remote source
- Load spatial data that is part of an R package
- Grab data using API (often making use of particular R packages)
- Converting flat files with x,y data to spatial data
- Geocoding data (we saw example of this at beginning)
For reading and writing vector and raster data in R, the three primary packages you’ll use are:
sf
orrgdal
for vector formats such as ESRI Shapefiles, GeoJSON, and GPX - both packages use OGR, which is a library under the GDAL source tree,under the hoodraster
for raster formats such as GeoTIFF or ESRI or ASCII grid using GDAL under the hood
We can quickly discover supported I/O vector formats either via sf
or rgdal
:
library(knitr)
library(sf)
library(rgdal)
print(paste0('There are ',st_drivers("vector") %>% nrow(), ' vector drivers available using st_read or read_sf'))
## [1] "There are 82 vector drivers available using st_read or read_sf"
name | long_name | write | copy | isVector |
---|---|---|---|---|
AeronavFAA | Aeronav FAA | FALSE | FALSE | TRUE |
AmigoCloud | AmigoCloud | TRUE | FALSE | TRUE |
ARCGEN | Arc/Info Generate | FALSE | FALSE | TRUE |
AVCBin | Arc/Info Binary Coverage | FALSE | FALSE | TRUE |
AVCE00 | Arc/Info E00 (ASCII) Coverage | FALSE | FALSE | TRUE |
name | long_name | write | copy | is_raster | is_vector | vsi | |
---|---|---|---|---|---|---|---|
PCIDSK | PCIDSK | PCIDSK Database File | TRUE | FALSE | TRUE | TRUE | TRUE |
netCDF | netCDF | Network Common Data Format | TRUE | TRUE | TRUE | TRUE | FALSE |
PDS4 | PDS4 | NASA Planetary Data System 4 | TRUE | TRUE | TRUE | TRUE | TRUE |
JP2OpenJPEG | JP2OpenJPEG | JPEG-2000 driver based on OpenJPEG library | FALSE | TRUE | TRUE | TRUE | TRUE |
JPEG2000 | JPEG2000 | JPEG-2000 part 1 (ISO/IEC 15444-1), based on Jasper library | FALSE | TRUE | TRUE | TRUE | TRUE |
As well as I/O raster formats via sf
:
library(knitr)
print(paste0('There are ',st_drivers(what='raster') %>% nrow(), ' raster drivers available'))
## [1] "There are 144 raster drivers available"
name | long_name | write | copy | is_raster | is_vector | vsi | |
---|---|---|---|---|---|---|---|
VRT | VRT | Virtual Raster | TRUE | TRUE | TRUE | FALSE | TRUE |
DERIVED | DERIVED | Derived datasets using VRT pixel functions | FALSE | FALSE | TRUE | FALSE | FALSE |
GTiff | GTiff | GeoTIFF | TRUE | TRUE | TRUE | FALSE | TRUE |
NITF | NITF | National Imagery Transmission Format | TRUE | TRUE | TRUE | FALSE | TRUE |
RPFTOC | RPFTOC | Raster Product Format TOC format | FALSE | FALSE | TRUE | FALSE | TRUE |
2.1 Reading in vector data
sf
can be used to read numerous file types:
- Shapefiles
- Geodatabases
- Geopackages
- Geojson
- Spatial database files
2.1.1 Shapefiles
Typically working with vector GIS data we work with ESRI shapefiles or geodatabases - here we have an example of how one would read in a shapefile using sf
:
# download.file("ftp://ftp.gis.oregon.gov/adminbound/citylim_2017.zip","citylim_2017.zip")
# unzip("citylim_2017.zip", exdir = ".")
library(sf)
citylims <- st_read("citylim_2017.shp")
## Reading layer `citylim_2017' from data source `C:\Users\mweber\GitProjects\AWRA_2020_R_Spatial\citylim_2017.shp' using driver `ESRI Shapefile'
## Simple feature collection with 241 features and 14 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 234278.9 ymin: 94690.89 xmax: 2249460 ymax: 1644089
## projected CRS: NAD83 / Oregon GIC Lambert (ft)
2.1.2 st_read
versus read_sf
Above, I didn’t pass any parameters to st_read - typically I would pass the parameters quiet=TRUE
and stringsAsFactors=FALSE
- why would this be a good practice in general?
2.1.3 Answer
read_sf
is an sf
alternative to st_read (see this section 1.2.2). Try reading in citylims
data above using read_sf
and notice difference, and check out help(read_sf)
. read_sf
and write
sf` are simply aliases for st_read and st_write with modified default arguments. Big differences are:
- stringsAsFactors=FALSE
- quiet=TRUE
- as_tibble=TRUE
2.1.4 Geodatabases
We use st_read
or read_sf
similarly for reading in an ESRI file geodatabase feature:
# download.file("ftp://ftp.gis.oregon.gov/adminbound/OregonStateParks_20181010.zip", "OregonStateParks.zip")
# unzip("OregonStateParks.zip", exdir = ".")
library(ggplot2)
fgdb = "OregonStateParks_20181010.gdb"
# List all feature classes in a file geodatabase
st_layers(fgdb)
## Driver: OpenFileGDB
## Available layers:
## layer_name geometry_type features fields
## 1 LO_PARKS Multi Polygon 431 15
## Reading layer `LO_PARKS' from data source `C:\Users\mweber\GitProjects\AWRA_2020_R_Spatial\OregonStateParks_20181010.gdb' using driver `OpenFileGDB'
## Simple feature collection with 431 features and 15 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 222821.4 ymin: 88699.71 xmax: 2243413 ymax: 1655108
## projected CRS: NAD83(2011) / Oregon GIC Lambert (ft)
2.1.5 Geopackages
Another spatial file format is the geopackage. Let’s try a quick read and write of geopackage data. First we’ll read in a geopackage using data that comes with sf
using dplyr
syntax just to show something a bit different and use read_sf
as an alternative to st_read
. You may want to try writing the data back out as a geopackage
as well.
Quick question: What are a couple advantages of geopackages
over shapefiles
?
## Rows: 100
## Columns: 15
## $ AREA <dbl> 0.114, 0.061, 0.143, 0.070, 0.153, 0.097, 0.062, 0.091, 0...
## $ PERIMETER <dbl> 1.442, 1.231, 1.630, 2.968, 2.206, 1.670, 1.547, 1.284, 1...
## $ CNTY_ <dbl> 1825, 1827, 1828, 1831, 1832, 1833, 1834, 1835, 1836, 183...
## $ CNTY_ID <dbl> 1825, 1827, 1828, 1831, 1832, 1833, 1834, 1835, 1836, 183...
## $ NAME <chr> "Ashe", "Alleghany", "Surry", "Currituck", "Northampton",...
## $ FIPS <chr> "37009", "37005", "37171", "37053", "37131", "37091", "37...
## $ FIPSNO <dbl> 37009, 37005, 37171, 37053, 37131, 37091, 37029, 37073, 3...
## $ CRESS_ID <int> 5, 3, 86, 27, 66, 46, 15, 37, 93, 85, 17, 79, 39, 73, 91,...
## $ BIR74 <dbl> 1091, 487, 3188, 508, 1421, 1452, 286, 420, 968, 1612, 10...
## $ SID74 <dbl> 1, 0, 5, 1, 9, 7, 0, 0, 4, 1, 2, 16, 4, 4, 4, 18, 3, 4, 1...
## $ NWBIR74 <dbl> 10, 10, 208, 123, 1066, 954, 115, 254, 748, 160, 550, 124...
## $ BIR79 <dbl> 1364, 542, 3616, 830, 1606, 1838, 350, 594, 1190, 2038, 1...
## $ SID79 <dbl> 0, 3, 6, 2, 3, 5, 2, 2, 2, 5, 2, 5, 4, 4, 6, 17, 4, 7, 1,...
## $ NWBIR79 <dbl> 19, 12, 260, 145, 1197, 1237, 139, 371, 844, 176, 597, 13...
## $ geom <MULTIPOLYGON [°]> MULTIPOLYGON (((-81.47276 3..., MULTIPOLYGON...
2.1.6 Open spatial data sources
There is a wealth of open spatial data accessible online now via static URLs or APIs - a few examples include Data.gov, NASA SECAC Portal, Natural Earth, UNEP GEOdata, and countless others listed here at Free GIS Data
2.1.7 Spatial data from R packages
There are also a number of R packages written specifically to provide access to geospatial data - below are a few and we’ll step through some examples of pulling in data using some of these packages.
Package name | Description |
---|---|
USABoundaries | Provide historic and contemporary boundaries of the US |
tigris | Download and use US Census TIGER/Line Shapefiles in R |
tidycensus | Uses Census American Community API to return tidyverse and optionally sf ready data frames |
FedData | Functions for downloading geospatial data from several federal sources |
elevatr | Access elevation data from various APIs (by Jeff Hollister) |
getlandsat | Provides access to Landsat 8 data. |
osmdata | Download and import of OpenStreetMap data. |
raster | The getData() function downloads and imports administrative country, SRTM/ASTER elevation, WorldClim data. |
rnaturalearth | Functions to download Natural Earth vector and raster data, including world country borders. |
rnoaa | An R interface to National Oceanic and Atmospheric Administration (NOAA) climate data. |
rWBclimate | An access to the World Bank climate data. |
Below is an example of pulling in US states using the rnaturalearth
package - note that the default is to pull in data as sp
objects and we coerce to sf
. Also take a look at the chained operation using dplyr. Try changing the filter or a parameter in ggplot.
2.1.8 Read in raster data
Here we use the getData
function in the raster
package to download elevation into a RasterLayer
and grab administrative boundaries from a database of global administrative boundaries - warning: sometimes getData
function has trouble accessing the server and download can be a bit slow. Here we see as well how we can use vector spataio polygon data to crop raster data.
library(raster)
US <- getData("GADM",country="USA",level=2)
Benton <- US[US$NAME_1=='Oregon' & US$NAME_2=='Benton',]
elev <- getData('SRTM', lon=-123, lat=44)
elev <- crop(elev, Benton)
elev <- mask(elev, Benton)
plot(Benton, main="Elevation (m) in Benton County, Oregon", axes=T)
plot(elev, add=TRUE)
plot(Benton, add=TRUE)
2.1.9 Read in OpenStreetMap data
The osmdata package is a fantastic resource for leveraging the OpenStreetMap (OSM) database.
library(osmdata)
library(mapview)
footway <- opq(bbox = "corvallis oregon") %>%
add_osm_feature(key = "highway", value = "footway") %>%
osmdata_sf()
footway <- footway$osm_lines
mapview(footway$geometry)
library(osmdata)
library(mapview)
rstrnts <- opq(bbox = "corvallis oregon") %>%
add_osm_feature(key = "amenity", value = "restaurant") %>%
osmdata_sf()
rstrnts <- rstrnts$osm_points
mapview(rstrnts$geometry)
We often have flat files, locally on our machine or accessed elsewhere, that have coordinate information which we would like to make spatial:
library(devtools)
library(readr)
library(ggplot2)
# install_github("mhweber/awra2020spatial", force=TRUE)
library(awra2020spatial)
gages = read_csv(system.file("extdata/Gages_flowdata.csv", package = "awra2020spatial"))
gages_sf <- gages %>%
st_as_sf(coords = c("LON_SITE", "LAT_SITE"), crs = 4269, remove = FALSE) %>%
dplyr::select(STATION_NM,LON_SITE, LAT_SITE)
ggplot() + geom_sf(data=gages_sf)