Spatial Statistics

Outline for this part:
- Graphs & spatial weight matrices
- Spatial autocorrelation
- Standard geographic regression models

import numpy as np
from pysal.lib import weights
import pandas as pd
import shapely as sp
import contextily as ctx
import geopandas as gpd
import networkx as nx
import matplotlib.pyplot as plt
import esda
from splot.esda import plot_moran, lisa_cluster
from pysal.model import spreg

constituencies_shapes = gpd.read_file("./data/constituencies_shape/circonscriptions_legislatives_030522.shp")
constituencies_shapes = constituencies_shapes[~constituencies_shapes["id_circo"].str.startswith("97")]
constituencies_shapes.explore()

constituencies_x = pd.read_csv("./data/constituencies_x.csv")
constituencies_x["id_circo"] = constituencies_x["id_circo"].astype(str).str[0:2] + constituencies_x["id_circo"].astype(str).str[3:5]
constituencies_y = pd.read_csv("./data/constituencies_y.csv")

constituencies = constituencies_shapes \
    .merge(constituencies_x, on = "id_circo", how = "left") \
    .merge(constituencies_y, on = "id_circo", how = "left")

constituencies

	id_circo	dep	libelle	geometry	Nom de la circonscription	mean_age	unemployement_rate	D1_diff	D9_diff	rpt_D9_D1_diff	vote_share_rn
0	3803	38	Isère - 3e circonscription	MULTIPOLYGON (((5.70136 45.18796, 5.70136 45.1...	Isère - 3e circonscription	38.1	7.2	10530	36370	3.5	20.41
1	3801	38	Isère - 1re circonscription	MULTIPOLYGON (((5.74302 45.16638, 5.74259 45.1...	Isère - 1re circonscription	39.8	5.4	12080	51150	4.2	14.39
2	3810	38	Isère - 10e circonscription	MULTIPOLYGON (((5.67138 45.47904, 5.66725 45.4...	Isère - 10e circonscription	37.8	6.1	12240	36130	3	38.43
3	3804	38	Isère - 4e circonscription	MULTIPOLYGON (((5.80147 44.70678, 5.77463 44.6...	Isère - 4e circonscription	41.9	4.5	13160	39110	3	28.41
4	3802	38	Isère - 2e circonscription	MULTIPOLYGON (((5.87334 45.09739, 5.88031 45.0...	Isère - 2e circonscription	39.0	5.8	11760	37610	3.2	27.67
...	...	...	...	...	...	...	...	...	...	...	...
534	0502	05	Hautes-Alpes - 2e circonscription	MULTIPOLYGON (((6.23277 44.46288, 6.24026 44.4...	Hautes-Alpes - 2e circonscription	44.6	3.8	11980	35190	2.9	29.44
535	0401	04	Alpes-de-Haute-Provence - 1re circonscription	MULTIPOLYGON (((6.84988 43.9141, 6.83595 43.91...	Alpes de Haute-Provence - 1re circonscription	45.9	6.0	11310	34120	3	38.50
536	4204	42	Loire - 4e circonscription	MULTIPOLYGON (((4.54255 45.24218, 4.53626 45.2...	Loire - 4e circonscription	41.9	4.9	12140	34470	2.8	38.62
537	0703	07	Ardèche - 3e circonscription	POLYGON ((4.05709 44.36415, 4.05709 44.36415, ...	Ardèche - 3e circonscription	46.7	7.3	10880	33580	3.1	32.02
538	0501	05	Hautes-Alpes - 1re circonscription	MULTIPOLYGON (((6.26146 44.5461, 6.26146 44.54...	Hautes-Alpes - 1re circonscription	44.5	5.4	11770	35350	3	32.56

539 rows × 11 columns

constituencies.set_index('id_circo')

	dep	libelle	geometry	Nom de la circonscription	mean_age	unemployement_rate	D1_diff	D9_diff	rpt_D9_D1_diff	vote_share_rn
id_circo
3803	38	Isère - 3e circonscription	MULTIPOLYGON (((5.70136 45.18796, 5.70136 45.1...	Isère - 3e circonscription	38.1	7.2	10530	36370	3.5	20.41
3801	38	Isère - 1re circonscription	MULTIPOLYGON (((5.74302 45.16638, 5.74259 45.1...	Isère - 1re circonscription	39.8	5.4	12080	51150	4.2	14.39
3810	38	Isère - 10e circonscription	MULTIPOLYGON (((5.67138 45.47904, 5.66725 45.4...	Isère - 10e circonscription	37.8	6.1	12240	36130	3	38.43
3804	38	Isère - 4e circonscription	MULTIPOLYGON (((5.80147 44.70678, 5.77463 44.6...	Isère - 4e circonscription	41.9	4.5	13160	39110	3	28.41
3802	38	Isère - 2e circonscription	MULTIPOLYGON (((5.87334 45.09739, 5.88031 45.0...	Isère - 2e circonscription	39.0	5.8	11760	37610	3.2	27.67
...	...	...	...	...	...	...	...	...	...	...
0502	05	Hautes-Alpes - 2e circonscription	MULTIPOLYGON (((6.23277 44.46288, 6.24026 44.4...	Hautes-Alpes - 2e circonscription	44.6	3.8	11980	35190	2.9	29.44
0401	04	Alpes-de-Haute-Provence - 1re circonscription	MULTIPOLYGON (((6.84988 43.9141, 6.83595 43.91...	Alpes de Haute-Provence - 1re circonscription	45.9	6.0	11310	34120	3	38.50
4204	42	Loire - 4e circonscription	MULTIPOLYGON (((4.54255 45.24218, 4.53626 45.2...	Loire - 4e circonscription	41.9	4.9	12140	34470	2.8	38.62
0703	07	Ardèche - 3e circonscription	POLYGON ((4.05709 44.36415, 4.05709 44.36415, ...	Ardèche - 3e circonscription	46.7	7.3	10880	33580	3.1	32.02
0501	05	Hautes-Alpes - 1re circonscription	MULTIPOLYGON (((6.26146 44.5461, 6.26146 44.54...	Hautes-Alpes - 1re circonscription	44.5	5.4	11770	35350	3	32.56

539 rows × 10 columns

paris = constituencies.query("dep == '75'").to_crs(epsg = "2154")
paris.explore()

Graphs & spatial weight matrices

Mathematical representation of geometries

One may want to encode some relationship between two geographic units as a mathematical object, e.g. a number:

Are unit A and B neighbours?
How “far” is unit A from unit B?
How “strong” is the relationship between unit A and unit B?

Geometries are not useful for this, but graphs are.

A graph is a data structure made of nodes and a set of connections between them, called edges.

In our case, the nodes are the geometric objects and the edges encode the relationships between them (no edge meaning no relationship).

Are two geometries neighbours?

One possible definition is that A and B are neighbours if they share at least one vertex or one edge. This is known as queen contiguity.
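To make the definition concrete, here is a minimal sketch on a hypothetical 3×3 grid of square cells (not the constituency data). The helper `grid_neighbours` is an illustrative name, not a libpysal function. It contrasts the "vertex or edge" rule with the stricter "edge only" rule, usually called rook contiguity.

```python
from itertools import product

def grid_neighbours(n_rows, n_cols, rule="queen"):
    """Neighbours of each cell (row, col) of a regular grid.

    rule="rook":  cells sharing an edge.
    rule="queen": cells sharing an edge or a corner (vertex).
    """
    if rule not in ("queen", "rook"):
        raise ValueError("rule must be 'queen' or 'rook'")
    neighbours = {}
    for r, c in product(range(n_rows), range(n_cols)):
        found = []
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) == (0, 0):
                continue  # a cell is not its own neighbour
            if rule == "rook" and dr != 0 and dc != 0:
                continue  # diagonal cells only share a corner
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                found.append((rr, cc))
        neighbours[(r, c)] = found
    return neighbours

queen = grid_neighbours(3, 3, rule="queen")
rook = grid_neighbours(3, 3, rule="rook")
print(len(queen[(1, 1)]), len(rook[(1, 1)]))  # centre cell: 8 queen, 4 rook
print(len(queen[(0, 0)]), len(rook[(0, 0)]))  # corner cell: 3 queen, 2 rook
```

On irregular polygons such as constituencies the two rules differ only where polygons touch at a single point.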

W = weights.contiguity.Queen.from_dataframe(paris, ids = "id_circo")

centroids = np.column_stack((paris.centroid.x, paris.centroid.y))
graph = W.to_networkx()
positions = dict(zip(graph.nodes, centroids))

# draw the contiguity graph on top of the constituency boundaries
ax = paris.plot(linewidth=1, edgecolor="grey", facecolor="white")
nx.draw(graph, positions, ax=ax, node_size=5, node_color="r")
plt.show()

W.neighbors

{'7506': ['7515', '7508', '7516', '7505', '7507'],
 '7505': ['7501', '7517', '7516', '7506', '7518', '7507'],
 '7502': ['7511', '7501', '7512', '7514', '7509', '7504', '7507'],
 '7501': ['7502', '7504', '7503', '7505', '7518', '7507'],
 '7504': ['7502', '7501', '7503', '7514'],
 '7517': ['7516', '7505', '7518'],
 '7512': ['7502', '7511', '7514', '7510', '7513'],
 '7515': ['7508', '7516', '7506'],
 '7511': ['7502', '7512', '7509', '7510', '7513'],
 '7513': ['7511', '7510', '7512', '7514'],
 '7508': ['7509', '7507', '7506', '7515'],
 '7507': ['7502', '7501', '7505', '7508', '7509', '7506'],
 '7503': ['7501', '7504', '7518'],
 '7516': ['7505', '7517', '7506', '7515'],
 '7518': ['7501', '7517', '7505', '7503'],
 '7514': ['7502', '7504', '7513', '7512'],
 '7510': ['7511', '7509', '7513', '7512'],
 '7509': ['7502', '7511', '7508', '7510', '7507']}

W.weights

{'7506': [1.0, 1.0, 1.0, 1.0, 1.0],
 '7505': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
 '7502': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
 '7501': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
 '7504': [1.0, 1.0, 1.0, 1.0],
 '7517': [1.0, 1.0, 1.0],
 '7512': [1.0, 1.0, 1.0, 1.0, 1.0],
 '7515': [1.0, 1.0, 1.0],
 '7511': [1.0, 1.0, 1.0, 1.0, 1.0],
 '7513': [1.0, 1.0, 1.0, 1.0],
 '7508': [1.0, 1.0, 1.0, 1.0],
 '7507': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
 '7503': [1.0, 1.0, 1.0],
 '7516': [1.0, 1.0, 1.0, 1.0],
 '7518': [1.0, 1.0, 1.0, 1.0],
 '7514': [1.0, 1.0, 1.0, 1.0],
 '7510': [1.0, 1.0, 1.0, 1.0],
 '7509': [1.0, 1.0, 1.0, 1.0, 1.0]}

Adjacency matrix

pd.DataFrame(*W.full()).astype(int)

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
7506	0	1	0	0	0	0	0	1	0	0	1	1	0	1	0	0	0	0
7505	1	0	0	1	0	1	0	0	0	0	0	1	0	1	1	0	0	0
7502	0	0	0	1	1	0	1	0	1	0	0	1	0	0	0	1	0	1
7501	0	1	1	0	1	0	0	0	0	0	0	1	1	0	1	0	0	0
7504	0	0	1	1	0	0	0	0	0	0	0	0	1	0	0	1	0	0
7517	0	1	0	0	0	0	0	0	0	0	0	0	0	1	1	0	0	0
7512	0	0	1	0	0	0	0	0	1	1	0	0	0	0	0	1	1	0
7515	1	0	0	0	0	0	0	0	0	0	1	0	0	1	0	0	0	0
7511	0	0	1	0	0	0	1	0	0	1	0	0	0	0	0	0	1	1
7513	0	0	0	0	0	0	1	0	1	0	0	0	0	0	0	1	1	0
7508	1	0	0	0	0	0	0	1	0	0	0	1	0	0	0	0	0	1
7507	1	1	1	1	0	0	0	0	0	0	1	0	0	0	0	0	0	1
7503	0	0	0	1	1	0	0	0	0	0	0	0	0	0	1	0	0	0
7516	1	1	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0
7518	0	1	0	1	0	1	0	0	0	0	0	0	1	0	0	0	0	0
7514	0	0	1	0	1	0	1	0	0	1	0	0	0	0	0	0	0	0
7510	0	0	0	0	0	0	1	0	1	1	0	0	0	0	0	0	0	1
7509	0	0	1	0	0	0	0	0	1	0	1	1	0	0	0	0	1	0

Note that we went from geometries to a matrix, which we can work with using the usual linear algebra tools.
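As a small illustration of what "linear algebra tools" can mean here, the sketch below builds a binary adjacency matrix from neighbour lists and reads a few properties off it. The four units A to D are hypothetical; only the dictionary format mirrors `W.neighbors`.

```python
import numpy as np

# Hypothetical neighbour lists for four units, in the same format as W.neighbors
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

ids = list(neighbours)
index = {unit: k for k, unit in enumerate(ids)}

# Binary adjacency matrix: adjacency[i, j] = 1 if unit j is a neighbour of unit i
adjacency = np.zeros((len(ids), len(ids)), dtype=int)
for unit, others in neighbours.items():
    for other in others:
        adjacency[index[unit], index[other]] = 1

print(adjacency)
print((adjacency == adjacency.T).all())   # contiguity is symmetric
print(adjacency.sum(axis=1))              # row sums = number of neighbours

# Entry (i, j) of adjacency @ adjacency counts the paths of length 2 from i to j
two_steps = adjacency @ adjacency
print(two_steps[index["A"], index["D"]])  # A reaches D in two steps, via C
```

The row sums are what `W.cardinalities` reports for the real weights object.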

W.cardinalities

{'7506': 5,
 '7505': 6,
 '7502': 7,
 '7501': 6,
 '7504': 4,
 '7517': 3,
 '7512': 5,
 '7515': 3,
 '7511': 5,
 '7513': 4,
 '7508': 4,
 '7507': 6,
 '7503': 3,
 '7516': 4,
 '7518': 4,
 '7514': 4,
 '7510': 4,
 '7509': 5}

W.transform = "R"
pd.DataFrame(*W.full())

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
7506	0.000000	0.200000	0.000000	0.000000	0.000000	0.000000	0.000000	0.20	0.000000	0.00	0.200000	0.200000	0.000000	0.200000	0.000000	0.000000	0.00	0.000000
7505	0.166667	0.000000	0.000000	0.166667	0.000000	0.166667	0.000000	0.00	0.000000	0.00	0.000000	0.166667	0.000000	0.166667	0.166667	0.000000	0.00	0.000000
7502	0.000000	0.000000	0.000000	0.142857	0.142857	0.000000	0.142857	0.00	0.142857	0.00	0.000000	0.142857	0.000000	0.000000	0.000000	0.142857	0.00	0.142857
7501	0.000000	0.166667	0.166667	0.000000	0.166667	0.000000	0.000000	0.00	0.000000	0.00	0.000000	0.166667	0.166667	0.000000	0.166667	0.000000	0.00	0.000000
7504	0.000000	0.000000	0.250000	0.250000	0.000000	0.000000	0.000000	0.00	0.000000	0.00	0.000000	0.000000	0.250000	0.000000	0.000000	0.250000	0.00	0.000000
7517	0.000000	0.333333	0.000000	0.000000	0.000000	0.000000	0.000000	0.00	0.000000	0.00	0.000000	0.000000	0.000000	0.333333	0.333333	0.000000	0.00	0.000000
7512	0.000000	0.000000	0.200000	0.000000	0.000000	0.000000	0.000000	0.00	0.200000	0.20	0.000000	0.000000	0.000000	0.000000	0.000000	0.200000	0.20	0.000000
7515	0.333333	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.00	0.000000	0.00	0.333333	0.000000	0.000000	0.333333	0.000000	0.000000	0.00	0.000000
7511	0.000000	0.000000	0.200000	0.000000	0.000000	0.000000	0.200000	0.00	0.000000	0.20	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.20	0.200000
7513	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.250000	0.00	0.250000	0.00	0.000000	0.000000	0.000000	0.000000	0.000000	0.250000	0.25	0.000000
7508	0.250000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.25	0.000000	0.00	0.000000	0.250000	0.000000	0.000000	0.000000	0.000000	0.00	0.250000
7507	0.166667	0.166667	0.166667	0.166667	0.000000	0.000000	0.000000	0.00	0.000000	0.00	0.166667	0.000000	0.000000	0.000000	0.000000	0.000000	0.00	0.166667
7503	0.000000	0.000000	0.000000	0.333333	0.333333	0.000000	0.000000	0.00	0.000000	0.00	0.000000	0.000000	0.000000	0.000000	0.333333	0.000000	0.00	0.000000
7516	0.250000	0.250000	0.000000	0.000000	0.000000	0.250000	0.000000	0.25	0.000000	0.00	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.00	0.000000
7518	0.000000	0.250000	0.000000	0.250000	0.000000	0.250000	0.000000	0.00	0.000000	0.00	0.000000	0.000000	0.250000	0.000000	0.000000	0.000000	0.00	0.000000
7514	0.000000	0.000000	0.250000	0.000000	0.250000	0.000000	0.250000	0.00	0.000000	0.25	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.00	0.000000
7510	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.250000	0.00	0.250000	0.25	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.00	0.250000
7509	0.000000	0.000000	0.200000	0.000000	0.000000	0.000000	0.000000	0.00	0.200000	0.00	0.200000	0.200000	0.000000	0.000000	0.000000	0.000000	0.20	0.000000

pd.Series(W.cardinalities).plot.hist(color="k")

Another possible definition of weights uses kernels: the weight decreases smoothly with the distance between two units.

W = weights.distance.Kernel.from_dataframe(paris, function = "gaussian")

pd.DataFrame(*W.full())

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
52	0.398942	0.338931	0.000000	0.000000	0.000000	0.267739	0.000000	0.380092	0.000000	0.000000	0.241971	0.366320	0.000000	0.316847	0.000000	0.000000	0.000000	0.255290
53	0.338931	0.398942	0.291999	0.328731	0.000000	0.348119	0.000000	0.283863	0.000000	0.000000	0.000000	0.347477	0.264008	0.326715	0.332084	0.000000	0.000000	0.000000
55	0.000000	0.291999	0.398942	0.345396	0.000000	0.000000	0.350514	0.000000	0.362621	0.267876	0.000000	0.312283	0.000000	0.000000	0.000000	0.000000	0.292319	0.272336
56	0.000000	0.328731	0.345396	0.398942	0.298303	0.265964	0.291857	0.000000	0.249691	0.000000	0.000000	0.265977	0.332224	0.000000	0.332484	0.000000	0.000000	0.000000
57	0.000000	0.000000	0.000000	0.298303	0.398942	0.000000	0.272444	0.000000	0.000000	0.000000	0.000000	0.000000	0.288869	0.000000	0.000000	0.302015	0.000000	0.000000
58	0.267739	0.348119	0.000000	0.265964	0.000000	0.398942	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.287942	0.365686	0.362977	0.000000	0.000000	0.000000
59	0.000000	0.000000	0.350514	0.291857	0.272444	0.000000	0.398942	0.000000	0.344688	0.367152	0.000000	0.000000	0.000000	0.000000	0.000000	0.263290	0.270262	0.000000
60	0.380092	0.283863	0.000000	0.000000	0.000000	0.000000	0.000000	0.398942	0.000000	0.000000	0.265076	0.308999	0.000000	0.317139	0.000000	0.000000	0.000000	0.000000
61	0.000000	0.000000	0.362621	0.249691	0.000000	0.000000	0.344688	0.000000	0.398942	0.300098	0.000000	0.269137	0.000000	0.000000	0.000000	0.000000	0.371497	0.310528
74	0.000000	0.000000	0.267876	0.000000	0.000000	0.000000	0.367152	0.000000	0.300098	0.398942	0.000000	0.000000	0.000000	0.000000	0.000000	0.291013	0.246770	0.000000
289	0.241971	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.265076	0.000000	0.000000	0.398942	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
290	0.366320	0.347477	0.312283	0.265977	0.000000	0.000000	0.000000	0.308999	0.269137	0.000000	0.000000	0.398942	0.000000	0.250599	0.000000	0.000000	0.000000	0.324895
291	0.000000	0.264008	0.000000	0.332224	0.288869	0.287942	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.398942	0.000000	0.371598	0.000000	0.000000	0.000000
292	0.316847	0.326715	0.000000	0.000000	0.000000	0.365686	0.000000	0.317139	0.000000	0.000000	0.000000	0.250599	0.000000	0.398942	0.280075	0.000000	0.000000	0.000000
293	0.000000	0.332084	0.000000	0.332484	0.000000	0.362977	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.371598	0.280075	0.398942	0.000000	0.000000	0.000000
294	0.000000	0.000000	0.000000	0.000000	0.302015	0.000000	0.263290	0.000000	0.000000	0.291013	0.000000	0.000000	0.000000	0.000000	0.000000	0.398942	0.000000	0.000000
295	0.000000	0.000000	0.292319	0.000000	0.000000	0.000000	0.270262	0.000000	0.371497	0.246770	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.398942	0.337942
296	0.255290	0.000000	0.272336	0.000000	0.000000	0.000000	0.000000	0.000000	0.310528	0.000000	0.000000	0.324895	0.000000	0.000000	0.000000	0.000000	0.337942	0.398942

full_matrix, ids = W.full()
paris.assign(weight_0 = full_matrix[0]).plot("weight_0", cmap="Reds")

W = weights.distance.Kernel.from_dataframe(paris, function = "triangular", k=15)

full_matrix, ids = W.full()
paris.assign(weight_0 = full_matrix[0]).plot("weight_0", cmap="Reds")
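The following sketch shows how such kernel weights can be computed by hand. It assumes the usual textbook forms of the two kernels, with z the distance divided by a bandwidth h: Gaussian K(z) = (2π)^(-1/2) exp(-z²/2) and triangular K(z) = 1 - |z|, and a weight of zero beyond the bandwidth. The points, the bandwidth and the function names are hypothetical, and the way libpysal chooses the bandwidth from the `k` nearest neighbours is not reproduced.

```python
import numpy as np

def gaussian_kernel(z):
    """Gaussian kernel: (2*pi)**-0.5 * exp(-z**2 / 2)."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def triangular_kernel(z):
    """Triangular kernel: 1 - |z|."""
    return 1 - np.abs(z)

def kernel_weights(points, bandwidth, kernel):
    """Dense kernel weight matrix; pairs further apart than the bandwidth get 0."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))   # pairwise Euclidean distances
    z = dist / bandwidth                     # distances scaled by the bandwidth
    return np.where(z <= 1, kernel(z), 0.0)

# Hypothetical centroids (in metres) and bandwidth, for illustration only
points = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 2000.0], [5000.0, 5000.0]])
w_gauss = kernel_weights(points, bandwidth=2500.0, kernel=gaussian_kernel)
w_tri = kernel_weights(points, bandwidth=2500.0, kernel=triangular_kernel)

print(w_gauss.round(6))
print(w_tri.round(6))
```

Under this form the Gaussian weight of a unit with itself is 1/sqrt(2π) ≈ 0.398942, which is the value on the diagonal of the Gaussian matrix shown earlier.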

Type	Description	Use Case
Queen Contiguity	Neighbors share edge OR vertex	Polygons (regions)
Rook Contiguity	Neighbors share edge only	Grid data
K-Nearest Neighbors	K closest observations	Point data
Distance Band	All within threshold distance	Point data
Kernel	Distance-weighted	Smooth spatial effects
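The two point-based rules in the table can be sketched in a few lines of NumPy. This is only an illustration on hypothetical coordinates, with hypothetical function names; libpysal provides its own implementations of both rules.

```python
import numpy as np

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def knn_neighbours(points, k):
    """Indices of the k closest other points, for each point."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)          # exclude the point itself
    return {i: np.argsort(row)[:k].tolist() for i, row in enumerate(dist)}

def distance_band_neighbours(points, threshold):
    """Indices of all other points within `threshold`, for each point."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)
    return {i: np.flatnonzero(row <= threshold).tolist() for i, row in enumerate(dist)}

# Hypothetical point coordinates, for illustration only
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
print(knn_neighbours(points, k=2))
print(distance_band_neighbours(points, threshold=2.5))
```

Note the difference in behaviour for the isolated point 3: with k-nearest neighbours it always gets k neighbours (and the relation is not symmetric), whereas with a distance band it ends up with none.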

Practice

Take the whole constituencies GeoDataFrame:
- build the contiguity.Queen weights matrix
- What is the distribution of the number of neighbours?
- Which constituencies have the most neighbours?
- What are the neighbours of constituency 7513?

# TODO

W = weights.contiguity.Queen.from_dataframe(constituencies, ids = "id_circo")

pd.Series(W.cardinalities).plot.hist(color="k")  # distribution of the number of neighbours

centroids = np.column_stack((constituencies.centroid.x, constituencies.centroid.y))
graph = W.to_networkx()
positions = dict(zip(graph.nodes, centroids))

# draw the contiguity graph on top of the constituency boundaries
ax = constituencies.plot(linewidth=1, edgecolor="grey", facecolor="white")
nx.draw(graph, positions, ax=ax, node_size=5, node_color="r")
plt.show()

# neighbours of constituency 7513
W.neighbors.get('7513')
# constituencies with 9 neighbours
{z for z in W.cardinalities if W.cardinalities[z] == 9}

/opt/python/lib/python3.13/site-packages/libpysal/weights/contiguity.py:347: UserWarning: The weights matrix is not fully connected: 
 There are 11 disconnected components.
 There are 5 islands with ids: 4405, 7902, 0602, 0605, 1701.
  W.__init__(self, neighbors, ids=ids, **kw)
/tmp/ipykernel_26467/196323581.py:7: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  centroids = np.column_stack((constituencies.centroid.x, constituencies.centroid.y))

{'0205', '5704', '5802', '6201', '7602', '8004'}

Spatial autocorrelation

constituencies.plot("vote_share_rn", cmap="Greys")

Global autocorrelation

How does the vote share of a given constituency relate to the vote shares of its neighbouring constituencies?

Are “high”-vote constituencies close to other “high”-vote constituencies, and “low” to “low”?
Or are “high” close to “low” and “low” close to “high”?
Or is the pattern purely random?

W = weights.contiguity.Queen.from_dataframe(constituencies, ids = "id_circo")
W.transform = "R" # row standardization: rows sum to 1

('WARNING: ', '4405', ' is an island (no neighbors)')
('WARNING: ', '7902', ' is an island (no neighbors)')
('WARNING: ', '0602', ' is an island (no neighbors)')
('WARNING: ', '0605', ' is an island (no neighbors)')
('WARNING: ', '1701', ' is an island (no neighbors)')

/opt/python/lib/python3.13/site-packages/libpysal/weights/contiguity.py:347: UserWarning: The weights matrix is not fully connected: 
 There are 11 disconnected components.
 There are 5 islands with ids: 4405, 7902, 0602, 0605, 1701.
  W.__init__(self, neighbors, ids=ids, **kw)

Spatial lag operator (analogous to time lag operator in time series):

\(X_{lag} = WX\)

\((X_{lag})_i = \sum_{j} w_{ij} x_j\)

If W is row-standardized, each element of the lag is a weighted average of the neighbours’ values, with the spatial weights as averaging weights.
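A minimal NumPy sketch of the lag operator, on a hypothetical chain of four units (the matrix and the values are made up for illustration):

```python
import numpy as np

# Hypothetical binary contiguity matrix for four units on a line: 0 - 1 - 2 - 3
W_binary = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row standardization: divide each row by its sum so that rows sum to 1
W_row = W_binary / W_binary.sum(axis=1, keepdims=True)

x = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical attribute values
x_lag = W_row @ x                        # spatial lag: average over neighbours

print(x_lag)  # [20. 20. 30. 30.]
```

A unit without neighbours (an island) has a row sum of zero, so this naive division would fail for it; such units need separate handling.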

We can now reframe the spatial autocorrelation as:

\(Corr(X, X_{lag})\)

constituencies["lagged_vote_share_rn"] = weights.spatial_lag.lag_spatial(W, constituencies["vote_share_rn"])

constituencies[["vote_share_rn", "lagged_vote_share_rn"]]

	vote_share_rn	lagged_vote_share_rn
0	20.41	25.694000
1	14.39	25.033333
2	38.43	37.555000
3	28.41	28.304000
4	27.67	22.557500
...	...	...
534	29.44	34.086667
535	38.50	39.245000
536	38.62	35.733750
537	32.02	36.588000
538	32.56	30.520000

539 rows × 2 columns

constituencies.plot("lagged_vote_share_rn", cmap="Greys")

# centred (demeaned) vote share; despite the column name, it is not divided by the standard deviation
constituencies["vote_share_rn_standardized"] = constituencies["vote_share_rn"] - constituencies["vote_share_rn"].mean()
constituencies["lagged_vote_share_rn_standardized"] = weights.lag_spatial(W, constituencies["vote_share_rn_standardized"])

fig, ax = plt.subplots(figsize=(9, 9))
ax.scatter(constituencies["vote_share_rn_standardized"], constituencies["lagged_vote_share_rn_standardized"])

constituencies[["vote_share_rn", "lagged_vote_share_rn"]].corr()

	vote_share_rn	lagged_vote_share_rn
vote_share_rn	1.00000	0.83766
lagged_vote_share_rn	0.83766	1.00000

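A closely related statistic, whose expected value and variance under the null hypothesis of no spatial autocorrelation are known, is Moran’s I (with \(z_i = Y_i - \bar Y\)):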

\(I = \frac{n \sum_i \sum_j w_{ij}(Y_i - \bar Y)(Y_j - \bar Y)} {(\sum_{i \neq j} w_{ij}) \sum_i (Y_i - \bar Y)^2} = \frac{n}{\sum_{i \neq j} w_{ij}} \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}\)
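The formula translates almost literally into NumPy. The sketch below is a hand-rolled version for a dense weight matrix, on a hypothetical chain of four units; `morans_i` is an illustrative name, not the esda function.

```python
import numpy as np

def morans_i(y, W):
    """Global Moran's I for values y and a dense weight matrix W (zero diagonal)."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()                 # deviations from the mean
    s0 = W.sum()                     # sum of all weights
    return (len(y) / s0) * (z @ W @ z) / (z @ z)

# Hypothetical example: four units on a line (0 - 1 - 2 - 3), binary weights
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

print(morans_i([1, 2, 3, 4], W))   # smooth gradient: positive (1/3)
print(morans_i([1, 4, 1, 4], W))   # alternating values: negative (-1)
```

When W is row-standardized and every unit has at least one neighbour, the sum of the weights equals n, so the first factor is 1 and I is the slope of the regression of the spatial lag of z on z.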

moran = esda.moran.Moran(constituencies["vote_share_rn"], W)
moran.I

np.float64(0.7794561095576024)

moran.p_sim

np.float64(0.001)
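The value 0.001 is a pseudo p-value obtained by permutation: the observed values are shuffled across locations many times, and the observed statistic is compared with the statistics of the shuffled maps. The sketch below illustrates the idea under the assumption of 999 permutations and of the common convention p = (number of simulated values at least as extreme + 1) / (permutations + 1); the function names and data are hypothetical and details may differ from what esda does internally.

```python
import numpy as np

def morans_i(y, W):
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_p_value(y, W, permutations=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = morans_i(y, W)
    # Shuffle the values across locations to break any spatial structure
    simulated = np.array([morans_i(rng.permutation(y), W) for _ in range(permutations)])
    n_extreme = int((simulated >= observed).sum())
    return observed, (n_extreme + 1) / (permutations + 1)

# Hypothetical example: 30 units on a line, each linked to its immediate neighbours
n = 30
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0

y = np.arange(n, dtype=float)      # perfectly smooth gradient
observed, p_sim = permutation_p_value(y, W)
print(observed, p_sim)
```

With this convention, 0.001 is the smallest value attainable with 999 permutations: it means that no shuffled map produced a statistic as large as the observed one.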

plot_moran(moran)

(<Figure size 960x384 with 2 Axes>,
 array([<Axes: title={'center': 'Reference Distribution'}, xlabel='Moran I: 0.78', ylabel='Density'>,
        <Axes: title={'center': 'Moran Scatterplot (0.78)'}, xlabel='Attribute', ylabel='Spatial Lag'>],
       dtype=object))

Practice

Plot a choropleth map of D9_diff (9th decile of income); cast D9_diff to float first
Compute its spatially lagged value
Plot D9_diff against its spatially lagged value: what is the relationship between the two?
Compute Moran’s I

moran = esda.moran.Moran(constituencies["D9_diff"].astype(float), W)
plot_moran(moran)

(<Figure size 960x384 with 2 Axes>,
 array([<Axes: title={'center': 'Reference Distribution'}, xlabel='Moran I: 0.75', ylabel='Density'>,
        <Axes: title={'center': 'Moran Scatterplot (0.75)'}, xlabel='Attribute', ylabel='Spatial Lag'>],
       dtype=object))

constituencies.plot("D9_diff", cmap="Reds")

Local autocorrelation

One Local Indicator of Spatial Association (LISA) is the local Moran’s I. It is built like the global Moran’s I, except that we do not sum over the i’s, so each unit i gets its own value:

\(I_i = \frac{n}{\sum_{k} z_{k}^2} z_{i} \sum_{j} w_{ij} z_{j}\)
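A hand-rolled sketch of this formula, on a hypothetical chain of five units with row-standardized weights (`local_morans_i` is an illustrative name, not the esda class):

```python
import numpy as np

def local_morans_i(y, W):
    """Local Moran's I_i = n * z_i * sum_j w_ij z_j / sum_k z_k**2."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return len(y) * z * (W @ z) / (z @ z)

# Hypothetical example: five units on a line, row-standardized weights
W = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

y = np.array([1.0, 2.0, 2.0, 8.0, 9.0])   # low values on the left, high on the right
local_i = local_morans_i(y, W)
z = y - y.mean()
global_i = (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

print(local_i.round(3))
print(local_i.mean(), global_i)   # equal here: rows of W sum to 1, no islands
```

The equality between the mean of the local values and the global I relies on every row of W summing to 1. With islands, or if a library uses a slightly different scaling constant, the two numbers are close but need not coincide.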

lisa = esda.moran.Moran_Local(constituencies["vote_share_rn"], W)
lisa.Is

/opt/python/lib/python3.13/site-packages/esda/moran.py:1350: RuntimeWarning: invalid value encountered in divide
  self.z_sim = (self.Is - self.EI_sim) / self.seI_sim

array([ 5.79424752e-01,  9.88925307e-01,  3.48507625e-01,  9.35026207e-02,
        3.12888167e-01,  8.24897842e-01, -1.25263747e-01,  9.06920602e-01,
        9.26783838e-01,  1.38754803e+00,  1.40285754e+00, -1.05035562e-02,
        8.19739723e-01,  6.11477304e-02,  2.77357751e-02,  1.79380083e+00,
        7.42958401e-02,  7.13910752e-01, -1.98169549e-02,  1.37322264e-01,
        1.63967005e-01, -2.47265333e-02, -0.00000000e+00,  3.23873650e-01,
        2.48517940e+00,  2.75701188e+00,  1.70449698e+00, -1.58206529e-01,
       -8.14702695e-02,  1.18986244e-01,  2.53174694e-01, -1.02672730e-01,
       -1.54155086e-02,  2.58171656e-02,  2.68546977e-02,  3.43235069e-01,
        2.47691054e-01,  2.96179373e-01,  1.36454129e+00,  1.44775693e+00,
        9.42053418e-02,  4.02394253e-02,  1.97047060e+00,  1.13667614e-01,
        2.52146643e+00,  3.56473156e+00,  1.14930115e+00,  1.17335681e+00,
        3.13413875e+00,  2.83922544e+00,  9.60659579e-02,  2.58408250e+00,
        5.30615784e+00,  5.48890188e+00,  3.65903257e-01,  4.55052128e+00,
        4.87068166e+00,  3.74737822e+00,  5.16031545e+00,  4.19224185e+00,
        4.62637617e+00,  4.52929475e+00,  1.59978600e+00,  1.17804244e+00,
        2.91669832e+00,  5.38406562e-01,  5.00634563e-01,  3.15589980e+00,
        1.69755895e+00,  8.55677834e-01,  5.93958713e-01,  1.76636973e+00,
        3.29483049e+00,  3.02526901e+00,  3.93050870e+00, -5.20289444e-01,
        2.25870942e+00,  1.08669322e+00, -4.06522442e-02,  3.09635879e+00,
        3.29975029e-02,  1.70319198e-01,  1.57884791e+00,  1.18592404e+00,
        3.45042594e-01,  2.51850089e-01,  7.01140706e-01,  1.27575648e-02,
        4.56957189e-01,  8.21536916e-01,  2.18248678e-03,  3.42894955e-01,
        1.68311913e+00,  2.39812304e+00,  1.49995325e+00, -4.02316305e-02,
        2.22271167e+00,  9.89781427e-01,  2.90597784e-01,  1.48844443e+00,
        2.06554395e-01,  2.58297934e-01, -3.04278903e-01,  9.05335362e-01,
       -1.62553252e-01,  1.30026941e+00, -4.15726652e-01, -2.29787771e-01,
        7.00528600e-01,  1.10907471e-01,  2.73189899e+00,  3.25895727e+00,
        1.45217883e+00,  2.49283314e+00,  2.81186968e+00,  3.30151349e+00,
        1.74288049e+00,  2.11219195e-01,  1.14398828e+00,  8.28083086e-01,
        9.40890363e-01,  1.44428172e-01, -9.57537177e-02,  7.79064501e-01,
        5.11745666e-01, -7.69226002e-02,  3.16159333e-01,  2.94033218e-02,
        9.67503051e-01,  5.05823108e-01,  2.83858330e-01,  5.29909648e-01,
       -2.06094261e-02,  1.82628202e+00, -4.24077887e-02, -6.55306653e-02,
        8.50001656e-01,  1.33244086e+00,  1.18647672e+00,  2.17784018e-01,
       -6.12800832e-02, -1.46183761e-01,  2.97139022e+00,  3.15803899e+00,
        2.84935962e+00,  4.09514123e-01,  5.10286548e-01,  1.93541422e+00,
        1.98553165e-01,  4.20891368e-02,  5.45127823e-02,  1.92332055e-01,
        1.21613170e-01,  1.05657155e-01,  4.20208673e-01,  2.81538142e-01,
        5.12524981e-01, -2.46203720e-02,  4.15022366e-01, -1.08526558e-01,
        3.06288642e-01,  3.92813147e-01,  4.78513586e-01,  6.90631438e-02,
       -4.79518589e-02,  3.60153107e-01,  1.82705316e-01,  3.61782559e-01,
        6.15019900e-01,  3.70142998e-01,  9.23386958e-02,  9.11971151e-01,
        8.70520843e-01, -3.24646581e-01,  4.73437257e-01,  1.46883204e+00,
        2.61843871e+00,  2.10688366e-01, -8.26927295e-02,  1.47534520e-02,
        4.56830275e-02,  1.04303287e-01, -1.82945744e-02,  6.31108292e-02,
       -4.68355690e-02, -1.33216478e-02,  1.77183392e+00,  1.74745389e+00,
        1.29032517e+00, -4.28227904e-03, -1.82165813e-02,  1.91001878e-01,
        3.97848456e-01,  3.08249939e+00, -1.45798882e-01,  6.11845203e-01,
        4.50029155e-01,  9.09703204e-02,  2.84924795e-01,  8.03747041e-02,
        3.20169942e+00,  2.12213150e+00,  7.85002532e-01,  1.35700462e+00,
        1.89441254e-01,  1.48867165e+00,  6.70957869e-01,  1.24973197e+00,
        1.60965093e-02,  3.42593815e-01, -2.61759140e-01,  1.34615420e+00,
        9.91455859e-01,  1.02690628e+00,  5.48627965e-01,  4.42149186e-02,
       -5.32605896e-03,  2.94146049e+00,  2.10785040e+00,  1.07367973e+00,
        5.49774940e-01,  6.54111344e-01,  4.17327226e-01,  3.63750394e-02,
        1.79106348e+00,  1.38454125e-01,  5.12039567e-01,  2.24170593e-02,
        4.70411928e-01,  1.79617381e+00,  2.74923609e+00, -1.69456124e-01,
        2.06975639e-03,  5.11287271e-01,  0.00000000e+00,  4.00348526e-01,
        7.60382047e-01,  9.16355674e-01,  3.02051483e-01,  1.49239988e+00,
        2.64043159e+00, -1.08398642e-01, -2.29259962e-02,  6.91443225e-01,
        8.94204180e-01,  5.96233170e-02,  4.39191849e-04,  2.09330337e+00,
        8.62538231e-01,  7.52011765e-01,  7.60384441e-01,  2.32662712e+00,
        1.30730050e+00,  1.74965880e+00,  3.07252352e+00,  5.78889488e-01,
        1.19555329e-01,  2.31697555e-01,  1.75435856e-02, -2.40264396e-02,
        8.69690967e-02,  3.95275720e-01, -2.77904546e-02,  7.04733298e-02,
        0.00000000e+00, -1.08617828e-01,  5.03257817e-01,  3.07911405e+00,
        7.49271292e-02,  0.00000000e+00,  1.50366792e+00,  3.49201588e+00,
        5.01264555e-01,  1.20186471e-01,  1.82390683e+00,  3.58278633e-01,
        8.31512038e-01, -4.10591594e-02,  1.12098582e+00,  1.21503582e+00,
       -2.61845309e-02,  2.40728786e-01,  8.53187774e-02,  3.17734797e+00,
        5.65084490e-02,  6.42112323e-02,  1.95820083e-01,  6.29224598e-01,
        3.09752510e-01,  3.90199438e+00,  5.11510030e+00,  4.43362601e+00,
        4.81736065e+00,  5.25387720e+00,  3.47789569e+00,  4.17980233e+00,
        4.35436558e+00,  9.75683211e-02,  2.84194535e-01,  1.85856610e+00,
        3.04383619e+00,  2.29180748e+00,  1.51425873e+00,  1.96138499e+00,
        9.18175801e-01,  3.70361284e+00,  2.20047080e+00, -3.44602925e-01,
        3.51415175e+00,  1.25683251e+00,  2.12801095e+00,  1.48043114e+00,
        8.38722592e-02,  2.18100265e-01,  1.83910413e-02,  4.82253023e-01,
        4.49639564e-01,  2.18792083e-01, -6.10555829e-02,  8.42442858e-03,
        4.78527898e-01,  7.98747751e-01,  2.33055823e-02,  3.45017707e-01,
        1.70883960e-01, -1.51897743e-02, -0.00000000e+00,  2.19816571e-01,
        1.13635832e+00,  2.05987422e+00,  1.13695150e-02,  3.62864992e-02,
        5.47745892e-01,  4.74759047e-01,  4.10741648e-01, -3.01369000e-02,
       -8.09462004e-03,  6.13663897e-01, -4.94498333e-02,  7.68270774e-01,
       -1.15930142e-01,  1.07157130e+00,  3.18578686e-01,  2.01039048e-02,
        1.96887363e-01,  3.33692194e-01,  1.43430061e+00,  1.02137669e+00,
        1.51967396e+00,  6.12561440e-01,  4.57202396e-01,  1.02198893e-02,
        8.32518451e-02,  9.17135242e-01,  2.38647724e+00,  5.54203393e-02,
        4.35788239e-02,  6.58410559e-02,  8.47811802e-01,  5.44884396e-01,
        7.05546329e-01,  2.86429117e-01, -4.27529600e-02,  1.17828835e+00,
        2.11222866e-01,  1.53207322e-01, -1.16329928e-02,  5.20705470e-03,
        2.86304601e-01,  1.83217718e-01,  2.76379832e-02, -5.26464959e-02,
       -2.07755015e-02,  7.91113078e-02,  4.85859488e-01,  1.21554960e-01,
        3.21105300e-02,  5.05050623e-02,  1.94524383e-01,  2.22110235e-01,
       -2.47814451e-01, -2.54067786e-02,  2.16612670e-01,  2.91660760e-01,
       -3.03985600e-02,  4.25433990e-01,  1.30570010e-01, -1.68536243e-02,
        3.76561615e-02,  1.83803519e-02,  1.23295473e-02, -1.17269388e-02,
        4.59173805e-03, -2.14560109e-02,  1.59544306e-02,  4.13152118e-03,
        1.01189647e-03,  6.15413191e-01,  2.74615128e-01,  1.70053419e+00,
        2.37449605e-02, -4.02985956e-02,  1.74369137e+00,  5.03138732e-01,
        1.58639175e+00, -1.50879442e-02, -5.15752311e-04,  5.30372146e-03,
       -1.16813444e-01,  6.13136914e-01,  6.92836924e-01,  5.73806851e-01,
       -1.06127496e-01,  1.34865636e-01,  2.69740094e+00,  2.33479995e-01,
       -9.22112576e-03,  1.26329075e-03, -2.52585528e-01,  2.52405101e-01,
        2.14718661e-03,  3.54029743e-01,  6.19330983e-02,  8.24750244e-01,
        2.62205341e-01,  3.63996200e-01,  2.63784660e-01,  8.12508261e-01,
        1.55333662e-01,  6.14627535e-01,  3.75097917e-01,  2.73676585e-01,
        1.08440210e+00,  1.89722181e-01, -6.10932741e-02,  2.53458154e-02,
        4.19547823e-02,  1.98813344e+00,  8.82451051e-01,  5.07341619e-01,
        6.02967390e-02,  8.30283428e-01,  1.20173172e-01,  2.69457937e-01,
       -3.98251622e-02,  1.29369744e-02,  6.01611091e-01, -2.12235934e-02,
        9.97359636e-02,  3.44458059e-02, -2.09903378e-01,  1.70403964e+00,
        1.36669557e+00,  7.10199153e-01, -3.50648693e-02, -1.76394409e-01,
        9.67987453e-03, -2.87070713e-08,  2.28975795e-01,  1.84228683e-01,
        2.34032954e-01,  2.45268843e-01,  2.84829380e-01,  4.84907331e-01,
        1.55813743e-01,  7.64179470e-01,  3.09957578e-01,  1.28960924e+00,
        7.96548779e-01,  1.57967088e+00,  3.25739812e-01,  4.98904341e-01,
        3.73921757e-01,  3.06511387e-02,  1.21472305e+00,  1.55789710e+00,
        6.60601726e-01,  4.64702169e-01, -1.49765826e-01,  8.67988573e-02,
        1.99835938e-02, -2.73940159e-03,  3.46559423e-02, -1.72224335e-02,
        5.48203157e-02,  3.37493794e-01,  9.90672759e-02,  6.45834912e-01,
        7.73266648e-01,  4.64775424e-01, -1.13854918e-02,  3.60500213e-01,
        1.90271251e-01, -7.57584033e-02,  2.91739444e-01, -4.85033794e-02,
        3.41601264e-01, -9.27450200e-03,  1.97961112e+00, -4.46914851e-03,
        1.77118107e-02, -9.86883804e-03,  1.16671751e-01, -4.92884640e-02,
        9.94937072e-02,  1.27462575e+00, -3.62936487e-03,  1.24311111e-01,
        2.47265519e-01, -1.04231823e-02,  2.82315051e+00,  6.14006753e-01,
        9.42654177e-01,  1.17474992e+00, -1.89648058e-02,  1.21860624e-01,
        3.67344069e-01,  2.14970044e-01,  1.19577932e+00,  1.28914042e+00,
        1.49288061e+00,  9.14251674e-01,  1.79476656e-01,  3.29946767e-01,
        9.48627308e-02,  8.55177993e-02,  2.30666555e-01, -5.30234222e-02,
        1.46999038e-01,  4.58705007e-01,  3.52191601e-01,  3.41343557e-02,
       -2.88194629e-01,  3.46731968e-01, -4.67076659e-02,  4.52700954e-01,
        2.47949894e-01,  1.63147628e-02, -8.94117501e-03])

plt.hist(lisa.Is)

(array([165., 189.,  68.,  47.,  17.,  22.,  13.,   6.,   7.,   5.]),
 array([-0.52028944,  0.08062969,  0.68154882,  1.28246795,  1.88338709,
         2.48430622,  3.08522535,  3.68614448,  4.28706361,  4.88798275,
         5.48890188]),
 <BarContainer object of 10 artists>)

lisa.Is.mean()

np.float64(0.7707928329691234)

constituencies.assign(i=lisa.Is).plot(column="i", cmap = "Reds")

lisa_cluster(lisa, constituencies, p=1)

lisa_cluster(lisa, constituencies, p=0.05)
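The four cluster types shown on these maps come from the quadrants of the Moran scatterplot: the sign of a unit's centred value and the sign of its spatial lag. With p=1 every unit is coloured by its quadrant, while a threshold such as p=0.05 is meant to keep only the units whose local statistic is significant at that level. The quadrant logic itself can be sketched as follows, on hypothetical data and with an illustrative function name:

```python
import numpy as np

def moran_quadrants(y, W):
    """Label each unit HH, LH, LL or HL from the signs of z_i and its spatial lag."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    lag = W @ z
    labels = np.empty(len(y), dtype=object)
    labels[(z > 0) & (lag > 0)] = "HH"    # high value, high neighbours
    labels[(z <= 0) & (lag > 0)] = "LH"   # low value, high neighbours
    labels[(z <= 0) & (lag <= 0)] = "LL"  # low value, low neighbours
    labels[(z > 0) & (lag <= 0)] = "HL"   # high value, low neighbours
    return labels

# Hypothetical example: five units on a line, row-standardized weights
W = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

y = np.array([1.0, 2.0, 2.0, 8.0, 9.0])
print(moran_quadrants(y, W).tolist())   # ['LL', 'LL', 'LH', 'HH', 'HH']
```

HH and LL units contribute positively to spatial autocorrelation; LH and HL units are local outliers and contribute negatively.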

Practice

Compute local Moran’s I for D1_diff
Plot the Is, and the LL/HL/LH/HH clusters

Standard geographic regression models

If the data-generating process is explicitly spatial, we want to include geography in the analysis
If an omitted variable is spatially structured, the resulting errors will be spatially correlated
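
The second point can be illustrated with a small simulation, sketched under assumptions that are not in the notebook: units sit on a ring (`W_ring`), and a smooth variable `s` affects the outcome but is left out of the regression. The helper `morans_i` is a hypothetical name for a plain NumPy implementation of global Moran's I.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
idx = np.arange(n)

# Ring of n units: each unit has two neighbors with weight 0.5 (row-standardized)
W_ring = np.zeros((n, n))
W_ring[idx, (idx - 1) % n] = 0.5
W_ring[idx, (idx + 1) % n] = 0.5

def morans_i(v, W):
    """Global Moran's I of vector v under weights W."""
    z = v - v.mean()
    return (len(v) / W.sum()) * (z @ W @ z) / (z @ z)

x = rng.normal(size=n)                   # observed regressor, no spatial structure
s = 3.0 * np.sin(2 * np.pi * idx / n)    # smooth spatial variable
y = 1.0 + 2.0 * x + s + rng.normal(size=n)

# Regression that omits s
X_short = np.column_stack([np.ones(n), x])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
resid_short = y - X_short @ b_short

# Regression that includes s
X_full = np.column_stack([np.ones(n), x, s])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
resid_full = y - X_full @ b_full

print("Moran's I, s omitted :", morans_i(resid_short, W_ring))
print("Moran's I, s included:", morans_i(resid_full, W_ring))
```

In this setup the residuals inherit the spatial pattern of `s` when it is omitted, and the autocorrelation largely disappears once it is included.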

Let’s see if we can predict/explain vote_share_rn with our explanatory variables

constituencies["D1_diff"] = constituencies["D1_diff"].astype(float)
constituencies["D9_diff"] = constituencies["D9_diff"].astype(float)

ols_model = spreg.OLS(
    constituencies[['vote_share_rn']].values,
    constituencies[['unemployement_rate', 'mean_age', 'D1_diff', 'D9_diff']].values,
    # Dependent variable name
    name_y="vote_share_rn",
    # Independent variable name
    name_x=['unemployement_rate', 'mean_age', 'D1_diff', 'D9_diff'],
)

print(ols_model.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
------------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :        None
Dependent Variable  :vote_share_rn                Number of Observations:         539
Mean dependent var  :     31.6400                Number of Variables   :           5
S.D. dependent var  :     10.7351                Degrees of Freedom    :         534
R-squared           :      0.4494
Adjusted R-squared  :      0.4452
Sum squared residual:     34140.1                F-statistic           :    108.9422
Sigma-square        :      63.933                Prob(F-statistic)     :   7.867e-68
S.E. of regression  :       7.996                Log likelihood        :   -1882.832
Sigma-square ML     :      63.340                Akaike info criterion :    3775.664
S.E of regression ML:      7.9586                Schwarz criterion     :    3797.113

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       -59.14834        11.61176        -5.09383         0.00000
  unemployement_rate         3.02469         0.53091         5.69717         0.00000
            mean_age         1.36535         0.12838        10.63541         0.00000
             D1_diff         0.00313         0.00050         6.22522         0.00000
             D9_diff        -0.00054         0.00004       -14.03249         0.00000
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER          86.220

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          7.609           0.0223

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                4         47.549           0.0000
Koenker-Bassett test              4         64.950           0.0000
================================ END OF REPORT =====================================

constituencies.assign(predy=ols_model.predy).plot(column="predy", cmap = "Reds")

constituencies.assign(u=ols_model.u).plot(column="u", cmap = "seismic", legend = True)

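# Store the OLS residuals and their spatial lag (W-weighted combination of neighbors' residuals)
constituencies['u'] = ols_model.u
# constituencies.boxplot(column = ['u'], by = "dep", figsize = (20, 5))
constituencies["lagged_u"] = weights.lag_spatial(W, constituencies["u"])

# Moran scatterplot of the residuals: an upward slope suggests spatially autocorrelated errors
fig, ax = plt.subplots(figsize=(9, 9))
ax.scatter(constituencies["u"], constituencies["lagged_u"])
ax.set_xlabel("OLS residual (u)")
ax.set_ylabel("Spatial lag of residual (Wu)")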

SLX model

Add spatially lagged explanatory variables as additional exogenous regressors, i.e. neighbors' values \((WX)_i = \sum_j w_{ij} X_j\) influence \(Y_i\)
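
A minimal NumPy sketch of the idea, on simulated data rather than the constituencies: the ring-shaped `W_ring`, the coefficients 1, 2 and 3, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
idx = np.arange(n)

# Row-standardized ring weights: each unit averages its two neighbors
W_ring = np.zeros((n, n))
W_ring[idx, (idx - 1) % n] = 0.5
W_ring[idx, (idx + 1) % n] = 0.5

x = rng.normal(size=n)
wx = W_ring @ x   # spatial lag of x: mean of x over each unit's neighbors

# Simulated outcome: own x has effect 2, neighbors' average x has effect 3
y = 1.0 + 2.0 * x + 3.0 * wx + rng.normal(scale=0.1, size=n)

# SLX is just OLS on [1, x, Wx]
design = np.column_stack([np.ones(n), x, wx])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)   # intercept, coefficient on x, coefficient on Wx
```

Because \(WX\) is built from exogenous variables only, no special estimator is needed; the spatial part lives entirely in the design matrix.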

constituencies["lagged_unemployement_rate"] = weights.lag_spatial(W, constituencies["unemployement_rate"])
constituencies["lagged_mean_age"] = weights.lag_spatial(W, constituencies["mean_age"])
constituencies["lagged_D1_diff"] = weights.lag_spatial(W, constituencies["D1_diff"])
constituencies["lagged_D9_diff"] = weights.lag_spatial(W, constituencies["D9_diff"])

variables = ['unemployement_rate', 'mean_age', 'D1_diff', 'D9_diff']
lagged_variables = ["lagged_" + x for x in variables]
all_variables = variables + lagged_variables

slx_model = spreg.OLS(
    constituencies[['vote_share_rn']].values,
    constituencies[all_variables].values,
    # Dependent variable name
    name_y="vote_share_rn",
    # Independent variable name
    name_x=all_variables
)

print(slx_model.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
------------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :        None
Dependent Variable  :vote_share_rn                Number of Observations:         539
Mean dependent var  :     31.6400                Number of Variables   :           9
S.D. dependent var  :     10.7351                Degrees of Freedom    :         530
R-squared           :      0.4732
Adjusted R-squared  :      0.4653
Sum squared residual:     32661.4                F-statistic           :     59.5100
Sigma-square        :      61.625                Prob(F-statistic)     :   5.854e-69
S.E. of regression  :       7.850                Log likelihood        :   -1870.899
Sigma-square ML     :      60.596                Akaike info criterion :    3759.798
S.E of regression ML:      7.7844                Schwarz criterion     :    3798.406

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       -59.35288        11.79147        -5.03354         0.00000
  unemployement_rate         2.16722         0.58391         3.71157         0.00023
            mean_age         1.30285         0.19924         6.53918         0.00000
             D1_diff         0.00310         0.00062         5.02964         0.00000
             D9_diff        -0.00046         0.00007        -6.09763         0.00000
lagged_unemployement_rate         1.74391         0.50921         3.42472         0.00066
     lagged_mean_age         0.05791         0.19780         0.29279         0.76980
      lagged_D1_diff        -0.00026         0.00060        -0.44086         0.65949
      lagged_D9_diff        -0.00011         0.00009        -1.33309         0.18307
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER         122.403

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          3.904           0.1420

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                8         83.573           0.0000
Koenker-Bassett test              8        104.473           0.0000
================================ END OF REPORT =====================================

slx_model_2 = spreg.OLS(
    constituencies[['vote_share_rn']].values,
    constituencies[variables].values,
    name_y="vote_share_rn",
    name_x=variables,
    w = W,
    slx_lags = 1
    )

print(slx_model_2.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES WITH SPATIALLY LAGGED X (SLX)
------------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :vote_share_rn                Number of Observations:         539
Mean dependent var  :     31.6400                Number of Variables   :           9
S.D. dependent var  :     10.7351                Degrees of Freedom    :         530
R-squared           :      0.4732
Adjusted R-squared  :      0.4653
Sum squared residual:     32661.4                F-statistic           :     59.5100
Sigma-square        :      61.625                Prob(F-statistic)     :   5.854e-69
S.E. of regression  :       7.850                Log likelihood        :   -1870.899
Sigma-square ML     :      60.596                Akaike info criterion :    3759.798
S.E of regression ML:      7.7844                Schwarz criterion     :    3798.406

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       -59.35288        11.79147        -5.03354         0.00000
  unemployement_rate         2.16722         0.58391         3.71157         0.00023
            mean_age         1.30285         0.19924         6.53918         0.00000
             D1_diff         0.00310         0.00062         5.02964         0.00000
             D9_diff        -0.00046         0.00007        -6.09763         0.00000
W_unemployement_rate         1.74391         0.50921         3.42472         0.00066
          W_mean_age         0.05791         0.19780         0.29279         0.76980
           W_D1_diff        -0.00026         0.00060        -0.44086         0.65949
           W_D9_diff        -0.00011         0.00009        -1.33309         0.18307
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER         122.403

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          3.904           0.1420

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                8         83.573           0.0000
Koenker-Bassett test              8        104.473           0.0000
================================ END OF REPORT =====================================

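Be careful when computing marginal effects, since three different quantities can be meant:

the effect of increasing \(x_i\) on \(y_i\) (direct effect)
the effect of increasing all neighboring \(x_j\) on \(y_i\) (indirect, or spillover, effect)
the effect of increasing both \(x_i\) and \(x_j\) on \(y_i\) (total effect)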
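
A short worked example of these three quantities, using the coefficients printed in the SLX report above. It assumes \(W\) is row-standardized (not shown in this section), so that a one-unit increase in every neighbor's \(x_j\) raises \((WX)_i\) by exactly one unit; with other weights the neighbor effect would scale with the row sums of \(W\).

```python
# Coefficients copied from the SLX summary (own variables and their W_ lags)
beta = {
    "unemployement_rate": 2.16722,
    "mean_age": 1.30285,
    "D1_diff": 0.00310,
    "D9_diff": -0.00046,
}
gamma = {
    "unemployement_rate": 1.74391,
    "mean_age": 0.05791,
    "D1_diff": -0.00026,
    "D9_diff": -0.00011,
}

# Assumption: row-standardized W, so the neighbor effect is gamma itself
effects = {
    name: {
        "own": beta[name],                     # increase x_i only
        "neighbors": gamma[name],              # increase all neighboring x_j
        "both": beta[name] + gamma[name],      # increase x_i and all x_j
    }
    for name in beta
}

for name, e in effects.items():
    print(f"{name:>20}  own={e['own']:.5f}  neighbors={e['neighbors']:.5f}  both={e['both']:.5f}")
```

For unemployment, the neighbor effect is almost as large as the own effect, so reading only the coefficient on `unemployement_rate` would understate the association by a wide margin.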

Spatial Error model (SEM)

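Allow for spatial autocorrelation in the errors: neighbors' errors \((Wu)_i\) influence \(u_i\) (and thus \(Y_i\))

\(u_i = \lambda \sum_j w_{ij} u_j + \epsilon_i\)

The errors are then correlated across units and, in general, heteroskedastic, so OLS is no longer efficient and its usual standard errors are unreliable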
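
To see why, note that the error equation can be solved as \(u = (I - \lambda W)^{-1}\epsilon\), so \(\mathrm{Cov}(u) = \sigma^2 (I - \lambda W)^{-1}(I - \lambda W')^{-1}\). The sketch below evaluates this on a hypothetical line of five units with \(\lambda = 0.8\) and \(\sigma^2 = 1\); these values are illustrative, not estimates from the data.

```python
import numpy as np

n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
W_line = A / A.sum(axis=1, keepdims=True)  # row-standardized; end units have one neighbor

lam = 0.8  # assumed spatial error parameter, for illustration only

# u = (I - lam W)^{-1} eps, with Var(eps) = I
M = np.linalg.inv(np.eye(n) - lam * W_line)
cov_u = M @ M.T

np.set_printoptions(precision=2, suppress=True)
print(cov_u)            # off-diagonal terms: errors are correlated across units
print(np.diag(cov_u))   # unequal diagonal terms: error variances differ across units
```

Even though \(\epsilon\) is i.i.d., the implied covariance of \(u\) is neither diagonal nor constant on the diagonal, which is the situation the heteroskedasticity-robust `GM_Error_Het` estimator is meant to handle.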

sem_model = spreg.GM_Error_Het(
    constituencies[['vote_share_rn']].values,
    constituencies[variables].values,
    name_y="vote_share_rn",
    name_x=variables,
    w = W,
    )

print(sem_model.summary)

GM_Error_Het
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES (HET)
------------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :vote_share_rn                Number of Observations:         539
Mean dependent var  :     31.6400                Number of Variables   :           5
S.D. dependent var  :     10.7351                Degrees of Freedom    :         534
Pseudo R-squared    :      0.4055
N. of iterations    :           1                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT        -7.98323         8.47178        -0.94233         0.34602
  unemployement_rate         0.30196         0.38959         0.77507         0.43830
            mean_age         0.83133         0.12537         6.63103         0.00000
             D1_diff         0.00194         0.00043         4.48594         0.00001
             D9_diff        -0.00052         0.00007        -7.33055         0.00000
              lambda         0.82500         0.02364        34.89286         0.00000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================

Spatial Lag Model (or Spatial Autoregressive Model)

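Neighbors' outcomes \((WY)_i = \sum_j w_{ij} Y_j\) influence \(Y_i\)

\(WY\) is endogenous (it depends on the errors through \(Y\)), which violates the exogeneity condition, so OLS is inconsistent. GM_Lag instead uses spatial two-stage least squares, instrumenting \(WY\) with \(WX\)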
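
A simulation sketch of the endogeneity problem and of the instrumental-variables fix, written in plain NumPy rather than with `spreg`. The ring weights `W_ring`, the true values \(\rho = 0.5\) and \(\beta = 2\), and the choice of \(WX\) as the only instrument are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
idx = np.arange(n)

# Row-standardized ring weights
W_ring = np.zeros((n, n))
W_ring[idx, (idx - 1) % n] = 0.5
W_ring[idx, (idx + 1) % n] = 0.5

rho_true, beta_true = 0.5, 2.0
x = rng.normal(size=n)
eps = rng.normal(size=n)

# Reduced form of Y = rho W Y + 1 + beta x + eps
y = np.linalg.solve(np.eye(n) - rho_true * W_ring, 1.0 + beta_true * x + eps)

wy = W_ring @ y
Z = np.column_stack([np.ones(n), x, wy])           # regressors; wy is endogenous
H = np.column_stack([np.ones(n), x, W_ring @ x])   # instruments; Wx stands in for wy

# Naive OLS of y on [1, x, Wy], shown for comparison
ols, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Two-stage least squares: project Z on H, then solve the IV normal equations
Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
tsls = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)

print("OLS  (const, beta, rho):", ols)
print("2SLS (const, beta, rho):", tsls)
```

The OLS estimate of \(\rho\) is inconsistent in general, although the size of the bias depends on the design; the 2SLS estimate should land near the true values in this setup.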

slm_model = spreg.GM_Lag(
    constituencies[['vote_share_rn']].values,
    constituencies[variables].values,
    name_y="vote_share_rn",
    name_x=variables,
    w = W,
    )

print(slm_model.summary)

GM_Lag
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
------------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :vote_share_rn                Number of Observations:         539
Mean dependent var  :     31.6400                Number of Variables   :           6
S.D. dependent var  :     10.7351                Degrees of Freedom    :         533
Pseudo R-squared    :      0.5887
Spatial Pseudo R-squared:  0.4566

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       -50.10444        10.56509        -4.74245         0.00000
  unemployement_rate         2.43502         0.50534         4.81855         0.00000
            mean_age         1.13751         0.13736         8.28134         0.00000
             D1_diff         0.00267         0.00046         5.75318         0.00000
             D9_diff        -0.00045         0.00004       -10.19554         0.00000
     W_vote_share_rn         0.19117         0.06749         2.83249         0.00462
------------------------------------------------------------------------------------
Instrumented: W_vote_share_rn
Instruments: W_D1_diff, W_D9_diff, W_mean_age, W_unemployement_rate

DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST                              DF         VALUE           PROB
Anselin-Kelejian Test             1         33.244           0.0000

SPATIAL LAG MODEL IMPACTS
Impacts computed using the 'simple' method.
            Variable         Direct        Indirect          Total
  unemployement_rate         2.4350          0.5755          3.0106
            mean_age         1.1375          0.2689          1.4064
             D1_diff         0.0027          0.0006          0.0033
             D9_diff        -0.0005         -0.0001         -0.0006
================================ END OF REPORT =====================================
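
The impacts table in the report above appears consistent with a simple multiplier: direct effect \(\beta\), total effect \(\beta / (1 - \rho)\), and indirect effect as the difference. The formula is inferred from the printed numbers rather than taken from the notebook, so treat it as an assumption; the check below uses the rounded coefficients from the report, so agreement holds only up to rounding.

```python
# Coefficients copied from the GM_Lag summary
rho = 0.19117  # coefficient on W_vote_share_rn
beta = {
    "unemployement_rate": 2.43502,
    "mean_age": 1.13751,
    "D1_diff": 0.00267,
    "D9_diff": -0.00045,
}

impacts = {}
for name, b in beta.items():
    direct = b
    total = b / (1.0 - rho)      # assumed 'simple' spatial multiplier
    indirect = total - direct
    impacts[name] = (direct, indirect, total)
    print(f"{name:>20}  direct={direct:.4f}  indirect={indirect:.4f}  total={total:.4f}")
```

Unlike in the SLX model, the spillover here comes from feedback through neighbors' outcomes, so every variable gets the same multiplier \(1 / (1 - \rho)\).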

Model	Formula	When to Use
OLS	\(Y = X\beta + \epsilon\)	Baseline, no spatial effects
SLX	\(Y = X\beta + WX\gamma + \epsilon\)	Neighbor characteristics affect outcome
SEM	\(Y = X\beta + u\), \(u = \lambda Wu + \epsilon\)	Spatial error correlation (omitted variables)
SAR/Lag	\(Y = \rho WY + X\beta + \epsilon\)	Outcomes depend on neighbor outcomes

Where:

\(W\) = spatial weights matrix
\(WX\) = spatially lagged explanatory variables
\(WY\) = spatially lagged dependent variable
\(\lambda\) = spatial error parameter
\(\rho\) = spatial lag parameter