Learn how to handle spatial data analysis using R.

Categories: Spatial, R
Author: Isaac Ajao
Published: December 18, 2024
In this blog, I will demonstrate how to perform spatial data analysis using R, focusing on techniques for exploring and modeling spatial relationships in your data. You’ll learn how to handle spatial data, visualize spatial patterns, and apply spatial econometric models to gain deeper insights into spatial dependencies.
Case study: Reading Culture Among Higher Education Students in Southwestern Nigeria.
Introduction
Spatial econometrics is a branch of econometrics that deals with the modeling and analysis of spatially dependent data. In traditional econometrics, observations are often assumed to be independent of each other, but in spatial econometrics, it is recognized that observations from nearby locations can influence each other, creating spatial dependence.
This analysis incorporates spatial relationships into models through spatial weights matrices, which define the structure of these dependencies. Two common forms of spatial dependence are spatial lag (where the value of a variable in one location depends on the values of the same variable in nearby locations) and spatial error (where errors in one location are correlated with errors in nearby locations).
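In standard matrix notation, with $W$ the spatial weights matrix, these two forms are:

$$y = \rho W y + X\beta + \varepsilon \qquad \text{(spatial lag)}$$

$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon \qquad \text{(spatial error)}$$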
Spatial econometrics is especially useful in fields like regional economics, real estate, environmental studies, and geography, where spatial factors significantly impact the relationships being studied. It allows for more accurate modeling and understanding of phenomena such as housing prices, land use, and even the spread of diseases, taking into account the spatial proximity of observations.
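Load the necessary libraries

The packages below cover spatial data handling (sf, sp), spatial weights and dependence tests (spdep), spatial regression (spatialreg, splm), and regression tables (stargazer):

```{r}
library(sf)         # simple features: read and manipulate vector data
library(spdep)      # spatial weights and dependence tests
library(splm)       # spatial panel models
library(spatialreg) # spatial regression models
library(sp)         # legacy spatial classes
library(stargazer)  # regression tables
```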
Loading these packages prints a couple of startup notes (a pointer to the larger spData datasets, and spatialreg masking some option helpers from spdep):

```
To access larger datasets in this package, install the spDataLarge
package with: install.packages('spDataLarge',
repos='https://nowosad.github.io/drat/', type='source')

The following objects are masked from 'package:spdep':

    get.ClusterOption, get.coresOption, get.mcOption,
    get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
    set.coresOption, set.mcOption, set.VerboseOption,
    set.ZeroPolicyOption
```
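Reading the shapefile containing the data

```{r}
# Read the shapefile containing the reading-culture data (quietly)
reading <- st_read("reading_cultre.shp", quiet = TRUE)
summary(reading)
```

The summary covers the 38 districts: administrative identifiers, the reading-culture variables (reading_hr, books_read, cgpa, love_readi, met_standa, finis_book), the centroid coordinates (long, lat), and the MULTIPOLYGON geometry.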
```
      ID_0         ISO               NAME_0               ID_1      
 Min.   :163   Length:38          Length:38          Min.   : 1.00  
 1st Qu.:163   Class :character   Class :character   1st Qu.:10.25  
 Median :163   Mode  :character   Mode  :character   Median :19.50  
 Mean   :163                                         Mean   :19.50  
 3rd Qu.:163                                         3rd Qu.:28.75  
 Max.   :163                                         Max.   :38.00  
    NAME_1             TYPE_1           ENGTYPE_1          NL_NAME_1        
 Length:38          Length:38          Length:38          Length:38         
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  VARNAME_1           reading_hr      books_read          cgpa      
 Length:38          Min.   :0.000   Min.   : 0.000   Min.   :0.000  
 Class :character   1st Qu.:2.250   1st Qu.: 3.000   1st Qu.:2.962  
 Mode  :character   Median :2.735   Median : 4.000   Median :3.085  
                    Mean   :2.753   Mean   : 4.395   Mean   :2.904  
                    3rd Qu.:3.060   3rd Qu.: 5.000   3rd Qu.:3.413  
                    Max.   :6.000   Max.   :12.000   Max.   :3.930  
   love_readi      met_standa      finis_book         long       
 Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   : 3.474  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 5.672  
 Median :2.000   Median :2.000   Median :2.000   Median : 7.314  
 Mean   :1.974   Mean   :1.947   Mean   :1.895   Mean   : 7.556  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.: 8.705  
 Max.   :3.000   Max.   :3.000   Max.   :3.000   Max.   :14.450  
      lat                  geometry 
 Min.   : 4.772   MULTIPOLYGON :38  
 1st Qu.: 6.525   epsg:4326    : 0  
 Median : 8.089   +proj=long...: 0  
 Mean   : 8.597                     
 3rd Qu.:10.686                     
 Max.   :13.110                     
```
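Six maps in one frame

To compare the spatial patterns side by side, we re-read the shapefile (this time without quiet = TRUE, so sf prints the layer details) and build six quantile choropleths with tmap:

```{r}
library(tmap)
reading_sf <- st_read("reading_cultre.shp")
```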
```
Reading layer `reading_cultre' from data source 
  `C:\Users\user\Desktop\softdataconsult.github.io\blog\posts\spatial\reading_cultre.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 38 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668431 ymin: 4.270418 xmax: 14.67642 ymax: 13.89201
Geodetic CRS:  WGS 84
```
```{r}
# Build one quantile choropleth per variable
tm1 <- tm_shape(reading_sf) + tm_borders() +
  tm_fill("reading_hr", title = "Reading Hour", style = "quantile")
tm2 <- tm_shape(reading_sf) + tm_borders() +
  tm_fill("books_read", title = "Number of Books Read", style = "quantile")
tm3 <- tm_shape(reading_sf) + tm_borders() +
  tm_fill("cgpa", title = "CGPA", style = "quantile")
tm4 <- tm_shape(reading_sf) + tm_borders() +
  tm_fill("love_readi", title = "Love Reading", style = "quantile")
tm5 <- tm_shape(reading_sf) + tm_borders() +
  tm_fill("met_standa", title = "At least 1 hr/day", style = "quantile")
tm6 <- tm_shape(reading_sf) + tm_borders() +
  tm_fill("finis_book", title = "Finished last book", style = "quantile")

# Display the maps in a 2x3 grid (tmap_arrange has no `layout` argument,
# so the grid is set with nrow/ncol)
tmap_arrange(tm1, tm2, tm3, tm4, tm5, tm6, nrow = 2, ncol = 3)
```
tmap may warn that the legend labels were too wide and were resized to 0.55; increasing legend.width in tm_layout makes the legend, and therefore the labels, larger.
```{r}
# Read the coordinates dataset from CSV for the remaining analysis
reading_data <- read.csv("reading_culture_coordinates.csv")
```
```{r}
# Define the regression equation once so we don't have to retype it
reg.eq1 <- reading_hr ~ books_read + cgpa + love_readi + met_standa + finis_book
reg1 <- lm(reg.eq1, data = reading)
```
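Spatial weights matrix W

The post builds W from a random binary matrix. (The original source comments call this "queen contiguity", but rbinom() only draws random 0/1 links; true queen contiguity would come from spdep::poly2nb(reading) followed by nb2listw().) mat2listw() also warns because no style argument is supplied:

```{r}
# Create a binary spatial weights matrix; note this is random (rbinom),
# not contiguity-based, so treat it as illustrative only
listw <- mat2listw(matrix(rbinom(nrow(reading_data)^2, 1, 0.2),
                          nrow(reading_data)))
listw
```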
```
Warning in mat2listw(matrix(rbinom(nrow(reading_data)^2, 1, 0.2),
nrow(reading_data))): style is M (missing); style should be set to a valid
value

Characteristics of weights list object:
Neighbour list object:
Number of regions: 38 
Number of nonzero links: 255 
Percentage nonzero weights: 17.65928 
Average number of links: 6.710526 
Non-symmetric neighbours list

Weights style: M 
Weights constants summary:
   n   nn  S0  S1   S2
M 38 1444 255 310 7264
```
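lm.morantest

Moran's I on the OLS residuals tests whether they are spatially autocorrelated under this weights matrix:

```{r}
lm.morantest(reg1, listw)
```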
```
	Global Moran I for regression residuals

data:  
model: lm(formula = reg.eq1, data = reading)
weights: listw

Moran I statistic standard deviate = 2.4478, p-value = 0.007186
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
     0.097317729     -0.034432679      0.002896984 
```

With p = 0.007, the test rejects the null of no spatial autocorrelation: the OLS residuals show significant positive spatial dependence.
Let's run the four simplest specifications: OLS, SLX, the spatial lag model (Lag Y), and the spatial error model (Lag Error).
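OLS model

The OLS fit below re-estimates the baseline model, repeats the Moran test, and runs the Lagrange multiplier (Rao's score) diagnostics that help choose between the lag and error specifications:

```{r}
reg1 <- lm(reg.eq1, data = reading)
lm.morantest(reg1, listw)
# Newer spdep prefers lm.RStests(); lm.LMtests() still works but warns
lm.LMtests(reg1, listw, test = c("LMerr", "LMlag", "RLMerr", "RLMlag", "SARMA"))
```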
```
	Global Moran I for regression residuals

data:  
model: lm(formula = reg.eq1, data = reading)
weights: listw

Moran I statistic standard deviate = 2.4478, p-value = 0.007186
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
     0.097317729     -0.034432679      0.002896984 

Please update scripts to use lm.RStests in place of lm.LMtests
Warning in lm.RStests(model = model, listw = listw, zero.policy = zero.policy, :
Spatial weights matrix not row standardized

	Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = reg.eq1, data = reading)
test weights: listw

RSerr = 1.9866, df = 1, p-value = 0.1587

	Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = reg.eq1, data = reading)
test weights: listw

RSlag = 0.047814, df = 1, p-value = 0.8269

	Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = reg.eq1, data = reading)
test weights: listw

adjRSerr = 2.5486, df = 1, p-value = 0.1104

	Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = reg.eq1, data = reading)
test weights: listw

adjRSlag = 0.6098, df = 1, p-value = 0.4349

	Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = reg.eq1, data = reading)
test weights: listw

SARMA = 2.5964, df = 2, p-value = 0.273
```

Despite the significant Moran's I, none of the score tests (RSerr, RSlag, their robust versions, or SARMA) reaches the 5% level, so under this (random) weights matrix the diagnostics favor neither a lag nor an error specification over plain OLS.
SLX (Spatially Lagged X: OLS augmented with spatial lags of the explanatory variables)
```{r}
# Load the required package
library(spatialreg)

# Convert the coordinates data to an 'sf' point object
# (crs = 4326 matches the shapefile's WGS 84 geodetic CRS)
reading_sf_data <- st_as_sf(reading_data, coords = c("long", "lat"), crs = 4326)

# Re-create the binary spatial weights matrix (random links, as above)
listw <- mat2listw(matrix(rbinom(nrow(reading_data)^2, 1, 0.2),
                          nrow(reading_data)))
```
```
Warning in mat2listw(matrix(rbinom(nrow(reading_data)^2, 1, 0.2),
nrow(reading_data))): style is M (missing); style should be set to a valid
value
```
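The post stops here without fitting the SLX model itself. As a minimal sketch (my addition, not part of the original analysis, and assuming reading_data contains the same regressors as the shapefile), spatialreg's lmSLX() estimates OLS augmented with the spatial lags WX of the regressors, using the formula and weights defined above:

```{r}
# Sketch only: fit the SLX specification y = X*beta + WX*theta + e
# using the (illustrative, random) weights matrix created above
slx <- lmSLX(reg.eq1, data = reading_data, listw = listw)
summary(slx)

# Total effect of each regressor = direct effect (beta) + spillover (theta)
impacts(slx)
```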