
Universal Spatial Join - Complete Implementation
Source:R/06-spatial-join.R
universal_spatial_join.RdComprehensive spatial join system that handles ALL spatial data combinations: Vector to Vector, Vector to Raster, Raster to Raster with full documentation, error handling, and extensive examples. This replaces all previous spatial join functions with a unified, robust system.
Usage
universal_spatial_join(
source_data,
target_data,
method = "auto",
scale_factor = NULL,
summary_function = "mean",
buffer_distance = NULL,
temporal_tolerance = NULL,
crs_target = NULL,
na_strategy = "remove",
chunk_size = 1e+06,
parallel = FALSE,
verbose = FALSE
)Arguments
- source_data
Source spatial data. Can be:
File path: "/path/to/data.tif" or "/path/to/data.shp"
Directory: "/path/to/spatial_files/" (processes all spatial files)
R object: SpatRaster, sf object, data.frame with coordinates
List: Multiple files, raster stack, or sf objects
- target_data
Target spatial data (same format options as source_data). Can be NULL for scaling operations with scale_factor.
- method
Spatial join method:
"auto": Automatically detect best method (default)
"extract": Extract raster values to vector features
"overlay": Spatial intersection/overlay of vectors
"resample": Resample raster to match target geometry
"zonal": Calculate zonal statistics (raster → vector)
"nearest": Nearest neighbor spatial join
"interpolate": Spatial interpolation (IDW, kriging)
"temporal": Time-aware spatial join
- scale_factor
Numeric (> 0 if provided). Scale factor for resolution changes:
NULL: Use target data resolution (default)> 1: Coarser resolution (e.g., 2 = half resolution)< 1: Finer resolution (e.g., 0.5 = double resolution)Custom: Any positive number for specific scaling
- summary_function
Character. Function for aggregating overlapping values:
"mean": Average values (default for continuous data)
"median": Median values (robust to outliers)
"max"/"min": Maximum/minimum values
"sum": Sum values (useful for counts, areas)
"sd": Standard deviation (measure variability)
"mode"/"majority": Most frequent value (categorical data)
- buffer_distance
Numeric (>= 0 if provided). Buffer distance in map units:
For point extraction: Buffer around points
For line extraction: Buffer along lines
For nearest neighbor: Search radius
Units: Same as source data CRS (meters, degrees, etc.)
- temporal_tolerance
Numeric (>= 0 if provided). Time tolerance for temporal joins (in days):
Maximum time difference for matching observations
Only used with method = "temporal"
Example: 7 = match within 7 days
- crs_target
Character or numeric. Target coordinate reference system:
EPSG code: 4326, 3857, etc.
PROJ string: "+proj=utm +zone=33 +datum=WGS84"
NULL: Use source data CRS (default)
- na_strategy
Character. Strategy for handling NA values:
"remove": Keep NAs as missing (default)
"nearest": Replace with nearest neighbor value
"interpolate": Spatial interpolation of NAs
"zero": Replace NAs with zero
- chunk_size
Numeric (> 0). Chunk size for processing large datasets:
Number of features/cells to process at once
Larger = faster but more memory
Smaller = slower but less memory
Default: 1,000,000
- parallel
Logical. Use parallel processing:
TRUE: Use multiple cores (faster for large data)
FALSE: Single core processing (default)
Requires 'parallel' package
- verbose
Logical. Print detailed progress messages:
TRUE: Show processing steps and diagnostics
FALSE: Silent processing (default)
Value
Spatial data object with joined attributes. Return type depends on operation:
- extract (vector → raster)
sf object with new columns containing extracted raster values. Original geometry preserved, new columns named "extracted_" followed by the raster layer name
- zonal (raster → vector)
sf object with new columns containing zonal statistics. Original geometry preserved, new columns named "zonal_" followed by the statistic name and raster layer name
- resample (raster → raster)
SpatRaster with resampled/processed data matching target resolution or scale factor
- overlay (vector → vector)
sf object with intersected/overlaid features combining attributes from both datasets
- nearest
sf object with attributes from nearest features joined
Returned objects include 'spatial_join_info' attribute containing:
method: Join method used
source_type, target_type: Data types processed
processing_time: Time taken (if verbose=TRUE)
timestamp: Processing timestamp
summary_function: Aggregation function used
Details
Quick Start Guide:
Most common use case - extract raster values to point locations:
result <- universal_spatial_join("my_points.csv", "my_raster.tif", method="extract")Supported Operations:
Data Type Combinations:
Vector → Raster: Extract raster values to points/polygons/lines
Raster → Vector: Calculate zonal statistics for polygons
Raster → Raster: Resample, overlay, mathematical operations
Vector → Vector: Spatial intersections, overlays, nearest neighbor
Input Format Support:
File paths: ".tif", ".shp", ".gpkg", ".geojson", ".nc"
Directories: Automatically processes all spatial files
R objects: SpatRaster, sf, data.frame with coordinates
Lists: Multiple files or raster stacks
Scaling Operations:
Up-scaling: Aggregate to coarser resolution (scale_factor > 1)
Down-scaling: Interpolate to finer resolution (scale_factor < 1)
Custom resolution: Match target raster geometry
Error Handling:
Auto CRS reprojection: Handles coordinate system mismatches
Geometry alignment: Auto-crops, extends, or resamples as needed
NA handling: Multiple strategies for missing data
Memory management: Chunked processing for large datasets
Method Selection Guide:
- extract
Use when you have point/polygon locations and want to get values from a raster
- zonal
Use when you have polygons and want statistics from raster data within each polygon
- resample
Use when you need to change raster resolution or align two rasters
- overlay
Use when joining two vector datasets based on spatial relationships
- nearest
Use when you want to find the closest features between two vector datasets
- auto
Let the function choose - works well for standard extract/resample operations
Common Error Solutions
- CRS Mismatch
"CRS mismatch detected" - Function automatically reprojects data, but manual CRS checking recommended for precision
- Memory Issues
"Large dataset processing" - Reduce chunk_size parameter (try 500000) or set parallel=FALSE
- No Spatial Overlap
"No spatial overlap found" - Check that source and target data cover the same geographic area
- File Not Found
"File does not exist" - Verify file paths and ensure files exist at specified locations
- Missing Bands
"Required bands not found" - For raster operations, ensure expected spectral bands are present
- Invalid Geometries
"Geometry errors" - Function attempts to fix automatically, but check input data quality
Performance Tips
For large datasets (>1M cells): set chunk_size=500000 and parallel=TRUE
Use method="resample" with scale_factor > 1 to reduce data size before complex operations
For time series analysis: consider temporal_tolerance to balance accuracy vs processing speed
When processing multiple datasets: ensure consistent CRS to avoid reprojection overhead
For point extraction: use smaller buffer_distance when possible to reduce processing time
See also
raster_to_raster_opsfor specialized raster operationsmultiscale_operationsfor multi-scale analysisprocess_vector_datafor vector data preprocessing
Examples
if (FALSE) { # \dontrun{
# These examples require satellite imagery files (Landsat/Sentinel data etc.)
# =================================================================
# MOST COMMON USE CASE: Extract raster values to CSV points
# =================================================================
# Your typical workflow: CSV file with coordinates + raster file
results <- universal_spatial_join(
source_data = "my_field_sites.csv", # CSV with lon, lat columns
target_data = "satellite_image.tif", # Any raster file
method = "extract", # Extract raster values to points
buffer_distance = 100, # 100m buffer around each point
summary_function = "mean", # Average within buffer
verbose = TRUE # See what's happening
)
# Check results - original data + new columns with raster values
head(results)
# site_id lon lat geometry extracted_satellite_image
# 1 1 -83.12345 40.12345 POINT (-83.1 40.1) 0.752
# 2 2 -83.23456 40.23456 POINT (-83.2 40.2) 0.681
# 3 3 -83.34567 40.34567 POINT (-83.3 40.3) 0.594
# Access the extracted values
results$extracted_satellite_image
# =================================================================
# ZONAL STATISTICS: Calculate statistics by polygon areas
# =================================================================
# Calculate average precipitation by watershed
watershed_precip <- universal_spatial_join(
source_data = "precipitation_raster.tif", # Raster data
target_data = "watershed_boundaries.shp", # Polygon boundaries
method = "zonal", # Calculate zonal statistics
summary_function = "mean", # Average precipitation per watershed
verbose = TRUE
)
# Result: polygons with precipitation statistics
head(watershed_precip)
# watershed_id geometry zonal_mean_precipitation_raster
# 1 1 POLYGON ((-84.2 40.1, ...)) 42.3
# 2 2 POLYGON ((-84.5 40.3, ...)) 38.7
# =================================================================
# RESAMPLE RASTER: Change resolution or align rasters
# =================================================================
# Aggregate 30m Landsat to 250m MODIS resolution
landsat_resampled <- universal_spatial_join(
source_data = "landsat_30m.tif", # High resolution input
target_data = "modis_250m.tif", # Target resolution template
method = "resample", # Resample operation
summary_function = "mean", # Average when aggregating
verbose = TRUE
)
# Check new resolution
terra::res(landsat_resampled)
# [1] 250 250
# Scale by factor instead of template
coarser_raster <- universal_spatial_join(
source_data = "fine_resolution.tif",
target_data = NULL, # No template needed
method = "resample",
scale_factor = 5, # 5x coarser resolution
summary_function = "mean"
)
# =================================================================
# VECTOR OVERLAY: Join two vector datasets
# =================================================================
# Find which counties contain each field site
sites_with_counties <- universal_spatial_join(
source_data = "field_sites.shp", # Point data
target_data = "county_boundaries.shp", # Polygon data
method = "overlay", # Spatial intersection
verbose = TRUE
)
# Result: points with county attributes added
head(sites_with_counties)
# site_id geometry county_name state_name
# 1 1 POINT (-83.1 40.1) Franklin Ohio
# 2 2 POINT (-83.2 40.2) Delaware Ohio
# =================================================================
# AUTO-DETECTION: Let function choose best method
# =================================================================
# Function automatically detects: points + raster = extract method
auto_result <- universal_spatial_join(
source_data = my_points, # Any point data
target_data = my_raster, # Any raster data
method = "auto", # Automatically choose method
verbose = TRUE # See what method was chosen
)
# Output: "Auto-detected method: extract for vector to raster"
# =================================================================
# ERROR HANDLING EXAMPLES
# =================================================================
# Function handles common issues automatically
robust_result <- universal_spatial_join(
source_data = "points_wgs84.csv", # WGS84 coordinate system
target_data = "raster_utm.tif", # UTM coordinate system
method = "extract",
na_strategy = "nearest", # Handle missing values
verbose = TRUE # See CRS handling messages
)
# Output: "CRS mismatch detected. Reprojecting to match raster CRS..."
} # }