Accessing an example geoparquet file in an S3 bucket and pulling data in only within a bounding box and for certain criteria
geoparquet
R
arrow
Author
Marc Weber
Published
December 19, 2025
GeoParquet extends the Parquet file format and, as nicely described in this blog post by Kyle Barron, provides a powerful new way to store and share geospatial data in a cloud-optimized format. I've been using it for more and more of my spatial data. Below is a quick example that uses Overture Maps buildings data and applies spatial and attribute filters to subset the data before reading it in.
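A GeoParquet file is still an ordinary Parquet file. The geometry column is stored in a Parquet-friendly encoding (typically WKB), and a `geo` key in the file metadata describes the geometry columns and their CRS. Here is a minimal local sketch of that round trip using `sfarrow`. The three points and the temporary file are made up purely for illustration:

```r
library(arrow)
library(sf)
library(sfarrow)

# Three hypothetical points (WGS84 lon/lat), for illustration only
pts <- st_as_sf(
  data.frame(id = 1:3, x = c(-123.30, -123.26, -122.70), y = c(44.56, 44.57, 45.50)),
  coords = c("x", "y"),
  crs = 4326
)

# Write the sf object to a temporary GeoParquet file
gpq_file <- tempfile(fileext = ".parquet")
st_write_parquet(pts, gpq_file)

# Read it with plain arrow (no geometry handling): the file is regular Parquet,
# and the geometry description lives in the "geo" metadata key
raw_tbl <- read_parquet(gpq_file, as_data_frame = FALSE)
print(names(raw_tbl$metadata))

# Read it back as sf, with geometry rebuilt from that metadata
pts_back <- st_read_parquet(gpq_file)
print(pts_back)
```

`sfarrow` may print a warning that its geo metadata support is experimental. The warning itself does not stop the write.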
First open a connection to a cloud-hosted GeoParquet file
For this example we use Overture Maps buildings data from a public S3 bucket. We'll open a connection to the dataset, but we won't actually read any data in yet.
```r
library(arrow)
library(dplyr)
library(sf)
library(sfarrow)

# Connect to the Overture S3 bucket (anonymous access, us-west-2)
bucket <- s3_bucket("overturemaps-us-west-2", anonymous = TRUE, region = "us-west-2")
ds_path <- bucket$path("release/2025-12-17.0/theme=buildings/type=building")

# Open the dataset lazily: this reads the schema, not the data
buildings_ds <- open_dataset(ds_path, format = "parquet")

# Inspect the available columns before filtering
# (e.g. the bbox column with xmin, xmax, ymin, ymax used for spatial filtering)
print(buildings_ds$schema$names)
```
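Opening the dataset lazily means `dplyr` verbs only build a query. `arrow` can use that query to skip data it does not need, and nothing is pulled into memory until `collect()`. Below is a minimal, self-contained sketch of bounding-box plus attribute filtering against a small local dataset. Everything in it is hypothetical: the toy rows, the flat `xmin`/`xmax`/`ymin`/`ymax` columns, and the area-of-interest coordinates. In the Overture schema these bounds sit inside a `bbox` struct column instead.

```r
library(arrow)
library(dplyr)

# Hypothetical toy "buildings" with flat bounding-box columns
toy <- data.frame(
  id    = 1:4,
  class = c("residential", "commercial", "residential", "residential"),
  xmin  = c(-123.30, -123.28, -122.70, -123.26),
  xmax  = c(-123.29, -123.27, -122.69, -123.25),
  ymin  = c(44.55, 44.56, 45.50, 44.57),
  ymax  = c(44.56, 44.57, 45.51, 44.58)
)

# Write it as a Parquet dataset in a temp directory, then open it lazily
ds_dir <- tempfile("toy_buildings_")
write_dataset(toy, ds_dir, format = "parquet")
toy_ds <- open_dataset(ds_dir, format = "parquet")

# Hypothetical area of interest (lon/lat)
aoi_xmin <- -123.35
aoi_xmax <- -123.20
aoi_ymin <- 44.50
aoi_ymax <- 44.60

# Lazy query: keep features whose bbox overlaps the AOI and that match an attribute
query <- toy_ds %>%
  filter(
    xmax >= aoi_xmin, xmin <= aoi_xmax,
    ymax >= aoi_ymin, ymin <= aoi_ymax,
    class == "residential"
  )

# Only now is data actually read into R
result <- collect(query)
print(result)
```

The overlap test keeps any feature whose bounding box intersects the area of interest, so it works as a cheap pre-filter. An exact geometric test, for example with `sf::st_intersects()`, could follow once the much smaller subset is in memory.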