
Basic spatial operations in Geo Dataframes¶
We will review some important operations for geodataframes.
linkWorldMap="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/WORLD/worldMaps.gpkg"
import geopandas as gpd
gpd.list_layers(linkWorldMap)
Let's open all the layers (this takes a minute):
countries=gpd.read_file(linkWorldMap,layer='countries')
rivers=gpd.read_file(linkWorldMap,layer='rivers')
cities=gpd.read_file(linkWorldMap,layer='cities')
Now, let's see some important spatial operations:
countries.head()
# as DF
countries.iloc[50:,]
# as DF
countries.loc[50:,'geometry']
But as a GeDF, you can also filter using a coordinate point via cx. Let me get the bounding box of the map:
countries.total_bounds
As you are getting [minx, miny, maxx, maxy] let me select a valid coordinate, i.e. (0,0)
countries.cx[:0,:0]
#then
countries.cx[:0,:0].plot()
Notice cx would be cleaner if spatial element is a point.
Clipping¶
Let me keep one country:
brazil=countries[countries.COUNTRY=='Brazil']
#see
brazil
Pay attention to this GDF:
cities
The GDF has a column 'COUNTRY' too.
Now, check the rivers GDF:
rivers
As you see, this GDF has no Country. But since it has geometry, you can keep the rivers, or their sections, that serve a country:
riversBrazil_clipped = gpd.clip(gdf=rivers,
mask=brazil)
Then, you can plot the clipped version:
base = brazil.plot(facecolor="greenyellow", edgecolor='black', linewidth=0.4,figsize=(5,5))
riversBrazil_clipped.plot(edgecolor='blue', linewidth=0.5,
ax=base)
The geometry types are not modified:
set(brazil.geom_type), set(riversBrazil_clipped.geom_type)
LinkBrazil="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/BRAZIL/brazil_5880.gpkg"
## we have
gpd.list_layers(LinkBrazil)
Let me open municipalities:
brazil_municipalities=gpd.read_file(LinkBrazil,layer='municipalities')
brazil_municipalities.plot(facecolor='lightgrey', edgecolor='black',linewidth=0.2)
#see
brazil_municipalities.head()
## we have
len(brazil_municipalities.ADM2_PT)
# higher level count
len(set(brazil_municipalities.ADM1_PT))
Then, this is Minas Gerais:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].plot()
Let's see the options to combine:
Unary UNION¶
We can combine all these polygons into one:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].union_all()
Let's save that result:
MinasGerais_union=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].union_all()
# what do we have?
type(MinasGerais_union)
You can turn that shapely object into a GeoDF like this:
gpd.GeoDataFrame(index=[0],data={'ADM':'Minas Gerais'},
crs=brazil_municipalities.crs,
geometry=[MinasGerais_union])
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve().plot()
Let me save the result, and see the type :
MinasGerais_dissolve=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve()
# we got?
type(MinasGerais_dissolve)
You got a GEOdf this time:
## see
MinasGerais_dissolve
# keeping what is relevant
MinasGerais_dissolve.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)
# then
MinasGerais_dissolve
b. Dissolve for groups¶
Using dissolve() with no arguments returns the union of the polygons, BUT also you get a GEOdf. However, if you have a column that represents a grouping (as we do), you can dissolve by that column:
# dissolving
brazil_municipalities.dissolve(by='ADM1_PT').plot(facecolor='yellow', edgecolor='black',linewidth=0.2)
Again, let me save this result:
Brazil_adm1_diss=brazil_municipalities.dissolve(by='ADM1_PT')
We know we have a GeoDF; let's see contents:
Brazil_adm1_diss
Again, we can drop columns that do not bring important information:
Brazil_adm1_diss.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)
Brazil_adm1_diss.reset_index(inplace=True)
Brazil_adm1_diss.info()
c. Dissolve and aggregate¶
In pandas, you can aggregate data using some statistics. Let me open the map with indicators we created in a previous session:
linkInd="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/WORLD/worldindicators.json"
indicators=gpd.read_file(linkInd)
indicators.head()
You can compute the mean of the countries by region, using a DF approach like this:
indicators.groupby('region').agg({'fragility':'mean'})
The RESULT is just data, you see no spatial information. It got lost.
The appropriate operation to conserve spatial information is Dissolve:
indicatorsByRegion=indicators.dissolve(
by="region",
aggfunc={
"fragility": ["mean"]},as_index=False, grid_size=0.1
)
## see the spatial info
indicatorsByRegion
Without renaming, you can request a choropleth:
# !pip install mapclassify
indicatorsByRegion.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
legend=True,
legend_kwds={"title": "Fragility",'loc': 'lower left'},
edgecolor='black',linewidth=0.2,
figsize=(15, 10))
brazil_5880=gpd.read_file(LinkBrazil,layer='country')
brazil_5880.centroid
# then
brazil_5880.centroid.x.values[0],brazil_5880.centroid.y.values[0]
Let me open the airports:
airports_5880=gpd.read_file(LinkBrazil,layer='airports')
airports_5880.head()
Now, let me plot some airports, using the centroid:
# coordinates
centroidX,centroidY=brazil_5880.centroid.x.values[0],brazil_5880.centroid.y.values[0]
# subsets of medium airports
Brazil_AirTopLeft=airports_5880[airports_5880.kind=='medium_airport'].cx[:centroidX,centroidY:]
Brazil_AirTopRight=airports_5880[airports_5880.kind=='medium_airport'].cx[centroidX:,centroidY:]
Brazil_AirBottomLeft=airports_5880[airports_5880.kind=='medium_airport'].cx[:centroidX,:centroidY]
Brazil_AirBottomRight=airports_5880[airports_5880.kind=='medium_airport'].cx[centroidX:,:centroidY]
Let me plot the subsets:
base=Brazil_AirTopLeft.plot(facecolor='grey', alpha=0.4)
Brazil_AirTopRight.plot(ax=base,facecolor='orange', alpha=0.4)
Brazil_AirBottomLeft.plot(ax=base,facecolor='green', alpha=0.4)
Brazil_AirBottomRight.plot(ax=base,facecolor='red', alpha=0.4)
Notice we have simple points in each subset:
Brazil_AirBottomLeft
In this situation, you can NOT make a convex hull:
Brazil_AirBottomLeft.convex_hull.plot()
You first need to dissolve, and then you create a hull, an envelope of convex angles:
Brazil_AirBottomLeft.dissolve().convex_hull.plot()
As we saw, the convex hull is a polygon:
Brazil_AirBottomLeft.dissolve().convex_hull
What if we have the polygons that had not been previously combined?
brazil_municipalities.cx[:centroidX,:centroidY].convex_hull.plot(edgecolor='red')
# notice
brazil_municipalities.cx[:centroidX,:centroidY].dissolve().convex_hull.plot(edgecolor='red')
That is, you get a convex hull for each geometry.
We can also use union before creating a convex hull:
# just the union
large_airport=airports_5880[airports_5880.kind=='large_airport']
large_airport.union_all()
# hull of the union
large_airport.union_all().convex_hull
Let's turn the GS into a GDF:
LargeAirport_hull= gpd.GeoDataFrame(index=[0],
crs=large_airport.crs,
geometry=[large_airport.union_all().convex_hull])
LargeAirport_hull['name']='large airports hull' # optional
# then
LargeAirport_hull
Let's use the GDF in plotting:
base=brazil_5880.plot(facecolor='yellow')
large_airport.plot(ax=base)
LargeAirport_hull.plot(ax=base,facecolor='green',
edgecolor='white',alpha=0.4,
hatch='X')
# the north
MunisN_brazil=brazil_municipalities.cx[:,centroidY:]
# the south
MunisS_brazil=brazil_municipalities.cx[:,:centroidY]
# the west
MunisW_brazil=brazil_municipalities.cx[:centroidX,:]
# the east
MunisE_brazil=brazil_municipalities.cx[centroidX:,:]
Let me plot:
base=MunisN_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisS_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)
base=MunisE_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisW_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)
Intersection¶
We keep what is common between GeoDFs:
munisNS_brazil=MunisN_brazil.overlay(MunisS_brazil, how="intersection",keep_geom_type=True)
munisNS_brazil.plot()
# keeping the overlay
munisWE_brazil=MunisW_brazil.overlay(MunisE_brazil, how="intersection",keep_geom_type=True)
munisWE_brazil.plot(edgecolor='white',linewidth=0.1)
Union¶
Let me unite any polygons that touche each other. First, take a look at each one:
munisNS_brazil.info()
munisWE_brazil.info()
Let me subset and show you:
keep=['ADM0_EN_1','ADM1_PT_1','ADM2_PT_1','geometry']
munisNS_brazil=munisNS_brazil.loc[:,keep]
munisWE_brazil=munisWE_brazil.loc[:,keep]
# now
munisNS_brazil.overlay(munisWE_brazil,how="union",keep_geom_type=True)
As you see, geometries are fine, but not attributes. It is strictly NOT appending the GeoDFs:
# appending
import pandas as pd
pd.concat([munisNS_brazil,munisWE_brazil],ignore_index=True)
These are different operations. UNION will combine any geometries that have overlaping portions in common.
munisNS_brazil.overlay(munisWE_brazil, how="union",keep_geom_type=True)
Let me create an object to save the previous result:
muniMidBrazil=munisNS_brazil.overlay(munisWE_brazil, how="union",keep_geom_type=True).dissolve()
muniMidBrazil
# some cleaning
muniMidBrazil['zone']='middles'
muniMidBrazil=muniMidBrazil.loc[:,['ADM0_EN_1_1','zone','geometry']]
muniMidBrazil
Difference¶
Here, you keep what belongs to the GeoDF to left that is not in the GeoDF to the right:
# with the municipalities
brazil_municipalities.overlay(muniMidBrazil, how='difference').plot()
Symmetric Difference¶
This is the opposite to intersection, you keep what is not in the intersection:
MunisN_brazil.overlay(MunisS_brazil, how="symmetric_difference",keep_geom_type=True).plot()
MunisW_brazil.overlay(muniMidBrazil, how="symmetric_difference",keep_geom_type=False).plot()
# non valid
MunisS_brazil[~MunisS_brazil.is_valid]
# see the invalid:
MunisS_brazil[~MunisS_brazil.is_valid].plot()
It is difficult to see what is wrong. Let's get some information:
# what is wrong?
from shapely.validation import explain_validity, make_valid
explain_validity(MunisS_brazil[~MunisS_brazil.is_valid].geometry)
# varieties?
MunisS_brazil['validity']=[x.split('[')[0] for x in MunisS_brazil.geometry.apply(lambda x: explain_validity(x))]
MunisS_brazil['validity'].value_counts()
# solving the issue:
MunisS_brazil.drop(columns=['validity'],inplace=True)
MunisS_brazil_valid=MunisS_brazil.copy()
MunisS_brazil_valid['geometry'] = [make_valid(row) if not row.is_valid else row for row in MunisS_brazil_valid['geometry'] ]
#any invalid?
MunisS_brazil_valid[~MunisS_brazil_valid.is_valid]
The solution we got may help for some advanced techniques, but may also give us some extra trouble. Notice that once geopandas solved the problem, you have created collections:
pd.Series([type(x) for x in MunisS_brazil_valid.geometry]).value_counts()
The Buffer¶
The buffer will create polygon that follows the same shape of the original vector (line, polygon, point).
# this is the original
riversBrazil_clipped.plot()
But, verify crs as we are going to use distances:
riversBrazil_clipped.crs
Let's reproject:
riversBrazil_5880=riversBrazil_clipped.copy()
riversBrazil_5880 = riversBrazil_5880.to_crs('EPSG:5880')
Now I can use the rivers to create a buffer of 50000 meters:
# 50000 at each side (radius)
riversBrazil_5880.buffer(50000).plot(facecolor='yellow', edgecolor='black',linewidth=0.2)
Then
base=riversBrazil_5880.buffer(50000).plot(facecolor='orange', linewidth=0.2)
riversBrazil_5880.plot( edgecolor='grey',ax=base)
The buffering process helps cleaning simple invalidities:
MunisS_brazil_valid=MunisS_brazil.copy()
MunisS_brazil_valid['geometry'] = MunisS_brazil_valid['geometry'].buffer(0)
#any invalid?
MunisS_brazil_valid[~MunisS_brazil_valid.is_valid]
Comparing with the previous result, this time you got no collections:
# then:
pd.Series([type(x) for x in MunisS_brazil_valid.geometry]).value_counts()
This 'buffer trick' may not always work:
# previously
indicators.dissolve(
by="region",
aggfunc={
"fragility": ["mean"]},as_index=False
).plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
legend=True,
legend_kwds={"title": "Fragility",'loc': 'lower left'},
edgecolor='black',linewidth=0.2,
figsize=(15, 10))
Notice that AFRICA has some lines that should have dissappeared after dissolving, but you can still see some lines.
We could try the buffer trick:
indicators_valid=indicators.copy()
indicators_valid['geometry'] = indicators_valid['geometry'].buffer(0)
dissolved_gdf=indicators_valid.dissolve(
by="region",
aggfunc={
"fragility": ["mean"]},as_index=False, grid_size=0.1
)
dissolved_gdf.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
legend=True,
legend_kwds={"title": "Fragility",'loc': 'lower left'},
edgecolor='black',linewidth=0.2,
figsize=(15, 10))
It did not work either. We may use a new functionality in GeoPandas's dissolve:
dissolved_gdf=indicators.dissolve(
by="region",
aggfunc={
"fragility": ["mean"]},as_index=False, grid_size=0.1
)
dissolved_gdf.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
legend=True,
legend_kwds={"title": "Fragility",'loc': 'lower left'},
edgecolor='black',linewidth=0.2,
figsize=(15, 10))
Now, you do see a clean map.