04 - Spatial Data in R - sp
18 April 2017
So, to begin, what is R and why should we use R for spatial analysis? Let’s break that into two questions - first, what is R and why should we use it?
- A language and environment for statistical computing and graphics
- R is lightweight, free, open-source and cross-platform
- Works with contriburted packages - currently 10,462 -extensibility
- Automation and recording of workflow (reproducibility)
- Optimized work flow - data manipulation, analysis andvisualization all in one place
- R does not alter underlying data - manipulation and visualization in memory
- R is great for repetative graphics
Second, why use R for spatial, or GIS, workflows?
- Spatial and statistical analysis in one environment
- Leverage statistical power of R (i.e. modeling spatial data, data visualization, statistical exploration)
- Can handle vector and raster data, as well as work with spatial databases and pretty much any data format spatial data comes in
- R’s GIS capabilities growing rapidly right now - new packages added monthly - currently about 180 spatial packages
Some drawbacks for using R for GIS work
- R not as good for interactive use as desktop GIS applications like ArcGIS or QGIS (i.e. editing features, panning,zooming)
- Steep learning curve
- Up to you to find packages to do what you need - help not always great
R runs on contributed packages - it has core functionality, but all the spatial work we would do in R is contained in user-contributed packages. Primary ones you’ll want to familiarize yourself with are `sp’, ‘rgdal’, ‘sf’, ‘rgeos’, ‘raster’ - there are many, many more. A good source to learn about available R spatial packages is:
CRAN Task View: Analysis of Spatial Data
Lesson Goals
- Understanding of spatial data in R and
sp
(spatial) objects in R - Introduction to R packages for spatial analysis
- Learn to read vector spatial data into R
- Perform some simple exploratory spatial data analysis with vector data in R
Quick Links to Exercises
- Exercise 1: Getting to Know of Spatial Objects
- Exercise 2: Building and Manipulating Spatial Data in R
- Exercise 3: Reading and writing data and projections
Download and extract data for exercises to your computer
A Little R Background
Terminology: Working Directory
Working directory in R is the location on your computer R is working from. To determine your working directory, in console type:
Which should return something like:
To see what is in the directory:
To establish a different directory:
Terminology: data structures
R is an interpreted language (access through a command-line interpreter) with a number of data structures (vectors, matrices, arrays, data frames, lists) and extensible objects (regression models, time-series, geospatial coordinates) and supports procedural programming with functions.
To learn about objects, become friends with the built-in class
and str
functions. Let’s explore the built-in iris data set to start:
As we can see, iris
is a data frame and is used extensively for beginning tutorials on learning R. Data frames consist of rows of observations on columns of values for variables of interest - they are one of the fundamental and most important data structures in R.
Overview of Classes and Methods
- Class: object types
-
class()
: gives the class type -
typeof()
: information on how the object is stored -
str()
: how the object is structured
-
- Method: generic functions
print()
plot()
summary()
Exercise 1
Getting to Know of Spatial Objects
Handling of spatial data in R has been standardized in recent years through the base package sp
- uses ‘new-style’ S4 classes in R that use formal class definitions and are closer to object-oriented systems than standard S3 classes in R.
The best source to learn about sp
and fundamentals of spatial analysis in R is Roger Bivand’s book Applied Spatial Data Analysis in R
Although we’ll look at the new simple features object specification this morning as well, numerous packages are currently built using sp object structure so need to learn to navigate current R spatial ecosystem.
sp
provides definitions for basic spatial classes (points, lines, polygons, pixels, and grids).
To start with, it’s good to stop and ask yourself what it takes to define spatial objects. What would we need to define vector (point, line, polygon) spatial objects?
- A coordinate reference system
- A bounding box, or extent
- plot order
- data
- ?
sp
objects inherit from the basic spatial class, which has two ‘slots’ in R new-style class lingo. From the Bivand book above, here’s what this looks like (Blue at top of each box is the class name, items in white are the slots, arrows show inheritance between classes):
- Let’s explore this in R. We can use the
getClass()
command to view the subclasses of a spatial object:
Next we’ll delve a bit deeper into the spatial objects inhereting from the base spatial class and try creating some simple objects. Here’s a schematic of how spatial lines and polygons inherit from the base spatial class - again, from the Bivand book:
And to explore a bit in R:
Also, there are a number of spatial methods you can use with classes in sp
- here are some useful ones to familarize yourself with:
Method / Class | Description |
---|---|
bbox() | Returns the bounding box coordinates |
proj4string() | Sets or retrieves projection attributes using the CRS object |
CRS() | Creates an object of class of coordinate reference system arguments |
spplot() | Plots a separate map of all the attributes unless specified otherwise |
coordinates() | Returns a matrix with the spatial coordinates. For spatial polygons it returns the centroids. |
over(x, y) | Used for example to retrieve the polygon or grid indexes on a set of points |
spsample(x) | Sampling of spatial points within the spatial extent of objects |
As an example data set to try out some of these methods on some spatial data in sp
, we’ll load the nor2k
data in the rgdal
package which represents Norwegian peaks over 2000 meters:
Take a few minutes to examine the nor2k SpatialPointsDataFrame
and try using methods we’ve seen such as class()
, str()
, typeof()
, proj4string()
, etc. A hint - which we’ll use more - to access slots in a new style S4 object, use the @ symbol.
Exercise 2
Building and Manipulating Spatial Data in R
Let’s take a step back now. Basic data structures in R can represent spatial data - all we need is some vectors with location and attribute information
Add a legend
Add a polygon to our map…
So, is this sufficient for working with spatial data in R and doing spatial analysis? What are we missing?
Packages early on in R came at handling spatial data in their own way. The maps
package is great example - a database of locational information that is quite handy. The maps
package format was developed in S (R is implementation of S) - lines represented as a sequence of points separated by ‘NA’ values - think of as drawing with a pen, raising at NA, then lowering at a value. Bad for associating with data since objects are only distinguished by separation with NA values. Try the following code-
maps
package draws on a binary database - see Becker references in help(map) for more details. Creates a list of 4 vectors when you create a maps
object in R.
Explore the structure of map object a bit….
Spatial classes provided in sp
package have mostly standardized spatial data in R and provide a solid way to represent and work with spatial data in R.
Let’s create a basic sp
SpatialLines object from coordinates we were looking at in maps
package..
Bottom line segment of Baker county…compare with earlier map and see if you understand why…
The maptools
package provides convenience function for making spatial objects from map objects. Try the following code and see if you can follow each step…
Exercise 3
Reading and writing data and projections
Now let’s look at how to construct a spatial object in R from a data frame with coordinate information.
A very common task you might do in R in taking a spreadsheet of data with coordinate information and turning it into a spatial object to do further GIS operations on. Here, we’ve read a speadsheet in an R data frame. Data frames, as we saw earlier, consist of rows of observations on columns of values for variables of interest
As with anything in R, there are several ways to go about this, but the basics are we need to pull the coordinate columns of the data frame into a matrix which becomes the coordinates slot of a spatial object, and then give the SpatialPointsDataFrame
we create a coordinate reference system.
See how it looks
Summary method gives a description of the spatial object in R. Summary works on pretty much all objects in R - for spatial data, gives us basic information about the projection, coordinates, and data for an sp
object if it’s a spatial data frame object.
We can see the coordinate reference system information for our SpatialPointsDataFrame
as part of output of summary, and we can also use the proj4string
method to extract just this piece of information, or get the bounding box as well with bbox
:
Coordinate reference system, or CRS, information in sp
uses the proj4string
format. A very handy site to use to lookup any projection and get it’s proj4string
format is spatialreference.org. A very handy resource put together by Melanie Frazier for an R spatial workshop we did several years ago, is here: Overview of Coordinate Reference Systems (CRS) in R.
A brief digression on CRS and projections in R
Dealing with coordinate reference systems and projections is a big part of working with spatial data in R, and it’s really relatively straightforward once you get the hang of it. Here are some of the fundamentals:
- CRS can be geographic (lat/lon), projected, or NA in R
- Data with different CRS MUST be transformed to common CRS in R
- Projections in
sp
are provided in PROJ4 strings in the proj4string slot of an object - http://www.spatialreference.org/
- Useful
rgdal
package functions:- projInfo(type=’datum’)
- projInfo(type=’ellps’)
- projInfo(type=’proj’)
- For
sp
class objects:- To get the CRS: proj4string(x)
- To assign the CRS:
- Use either EPSG code or PROJ.4:
- proj4string(x) <- CRS(“+init=epsg:4269”)
- proj4string(x) <- CRS(“+proj=utm +zone=10 +datum=WGS84”)
- Use either EPSG code or PROJ.4:
- To transform CRS
- x <- spTransform(x, CRS(“+init=epsg:4238”))
- x <- spTransform(x, proj4string(y))
- For rasters (we’ll focus on rasters later, but mention projections here):
- To get the CRS: projection(x)
- To transform CRS: projectRaster(x)
We can use the generic plot function in R to produce a quick plot add axes as well-axes option puts box around region
And we can combine and use state borders from maps package in our map
We can also use subsetting with plotting with this stream gage data to symbolize our gages by state for instance - try the following lines, try different colors or border states
Now let’s load the Rdata object we downloaded at beginning of this session - Rdata files are just a handy way of saving and reloading your workspace - remember, R works with objects in memory, you can save them out in this format or share with others this way.
Let’s look at a SptialPolygonsDataframe
of HUCs and dig into slot structure for polygon data in sp
What are the following lines of code doing? - welcome to the wonderful world of slots in R
What are the slots within each element of the HUCs SpatialPolygonDataFrame object polygons slot?
What method do you use to list them?
How would we code a way to extract the HUCs polygon with the smallest area? Hint - apply family of functions and slots - try on your own and then take a look at the function that I included as part of HUCs.RData file.
Using the over
function, we can find out what HUC every stream gage is in quite easily:
There’s a fair bit to unpack there, so ask questions!
A method for getting total area of our HUCs might use the rgeos
package and the getArea
function. Below we load the rgeos
function, and in order to get area we have to have HUCs in a planar CRS. Let’s transform to Oregon Lambert, but let’ use the epsg code (which we can look up on spatialreference.org) rather than passing a projection string to spTransform
:
-
Good Intro to R Spatial Resources: