No description has been provided for this image
Open In Colab

Basic spatial operations in Geo Dataframes¶

We will review some important operations for geodataframes.

Let's remember the contents of the world map from last session:

In [ ]:
linkWorldMap="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/WORLD/worldMaps.gpkg"

import geopandas as gpd
gpd.list_layers(linkWorldMap)

Let's open all the layers (this takes a minute):

In [ ]:
countries=gpd.read_file(linkWorldMap,layer='countries')
rivers=gpd.read_file(linkWorldMap,layer='rivers')
cities=gpd.read_file(linkWorldMap,layer='cities')

Now, let's see some important spatial operations:

  1. Subsetting
  2. Combining GeoDF rows
  3. The convex hull
  4. Spatial Overlays
  5. Checking Validity

Subsetting¶

Filtering¶

You can keep some elements by subsetting by filtering, as we used to do in common pandas data frames.

In [ ]:
countries.head()
In [ ]:
# as DF
countries.iloc[50:,]
In [ ]:
# as DF
countries.loc[50:,'geometry']

But as a GeDF, you can also filter using a coordinate point via cx. Let me get the bounding box of the map:

In [ ]:
countries.total_bounds

As you are getting [minx, miny, maxx, maxy] let me select a valid coordinate, i.e. (0,0)

In [ ]:
countries.cx[:0,:0]
In [ ]:
#then
countries.cx[:0,:0].plot()

Notice cx would be cleaner if spatial element is a point.

Clipping¶

Let me keep one country:

In [ ]:
brazil=countries[countries.COUNTRY=='Brazil']
#see
brazil

Pay attention to this GDF:

In [ ]:
cities

The GDF has a column 'COUNTRY' too.

Now, check the rivers GDF:

In [ ]:
rivers

As you see, this GDF has no Country. But since it has geometry, you can keep the rivers, or their sections, that serve a country:

In [ ]:
riversBrazil_clipped = gpd.clip(gdf=rivers,
                               mask=brazil)

Then, you can plot the clipped version:

In [ ]:
base = brazil.plot(facecolor="greenyellow", edgecolor='black', linewidth=0.4,figsize=(5,5))
riversBrazil_clipped.plot(edgecolor='blue', linewidth=0.5,
                    ax=base)

The geometry types are not modified:

In [ ]:
set(brazil.geom_type), set(riversBrazil_clipped.geom_type)

TOC


Combining GeoDF rows¶

Let me bring the projected data from Brazil.

In [ ]:
LinkBrazil="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/BRAZIL/brazil_5880.gpkg"
## we have
gpd.list_layers(LinkBrazil)

Let me open municipalities:

In [ ]:
brazil_municipalities=gpd.read_file(LinkBrazil,layer='municipalities')
brazil_municipalities.plot(facecolor='lightgrey', edgecolor='black',linewidth=0.2)
In [ ]:
#see
brazil_municipalities.head()
In [ ]:
## we have
len(brazil_municipalities.ADM2_PT)
In [ ]:
# higher level count
len(set(brazil_municipalities.ADM1_PT))

Then, this is Minas Gerais:

In [ ]:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].plot()

Let's see the options to combine:

Unary UNION¶

We can combine all these polygons into one:

In [ ]:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].union_all()

Let's save that result:

In [ ]:
MinasGerais_union=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].union_all()
In [ ]:
# what do we have?
type(MinasGerais_union)

You can turn that shapely object into a GeoDF like this:

In [ ]:
gpd.GeoDataFrame(index=[0],data={'ADM':'Minas Gerais'},
                 crs=brazil_municipalities.crs,
                 geometry=[MinasGerais_union])

Dissolve¶

a. Dissolve as Union¶

Using dissolve is an alternative to UNION:

In [ ]:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve().plot()

Let me save the result, and see the type :

In [ ]:
MinasGerais_dissolve=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve()

# we got?
type(MinasGerais_dissolve)

You got a GEOdf this time:

In [ ]:
## see
MinasGerais_dissolve
In [ ]:
# keeping what is relevant
MinasGerais_dissolve.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)

# then
MinasGerais_dissolve

b. Dissolve for groups¶

Using dissolve() with no arguments returns the union of the polygons, BUT also you get a GEOdf. However, if you have a column that represents a grouping (as we do), you can dissolve by that column:

In [ ]:
# dissolving
brazil_municipalities.dissolve(by='ADM1_PT').plot(facecolor='yellow', edgecolor='black',linewidth=0.2)

Again, let me save this result:

In [ ]:
Brazil_adm1_diss=brazil_municipalities.dissolve(by='ADM1_PT')

We know we have a GeoDF; let's see contents:

In [ ]:
Brazil_adm1_diss

Again, we can drop columns that do not bring important information:

In [ ]:
Brazil_adm1_diss.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)
Brazil_adm1_diss.reset_index(inplace=True)
Brazil_adm1_diss.info()

c. Dissolve and aggregate¶

In pandas, you can aggregate data using some statistics. Let me open the map with indicators we created in a previous session:

In [ ]:
linkInd="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/WORLD/worldindicators.json"
indicators=gpd.read_file(linkInd)
indicators.head()

You can compute the mean of the countries by region, using a DF approach like this:

In [ ]:
indicators.groupby('region').agg({'fragility':'mean'})

The RESULT is just data, you see no spatial information. It got lost.

The appropriate operation to conserve spatial information is Dissolve:

In [ ]:
indicatorsByRegion=indicators.dissolve(
     by="region",
     aggfunc={
         "fragility": ["mean"]},as_index=False, grid_size=0.1
 )

## see the spatial info
indicatorsByRegion

Without renaming, you can request a choropleth:

In [ ]:
# !pip install mapclassify
In [ ]:
indicatorsByRegion.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
                        legend=True,
                        legend_kwds={"title": "Fragility",'loc': 'lower left'},
                        edgecolor='black',linewidth=0.2,
                        figsize=(15, 10))

TOC


The convex hull¶

Some time you may have the need to create a polygon that serves as an envelope to a set of points.

For this example, let me compute the centroid coordinates:

In [ ]:
brazil_5880=gpd.read_file(LinkBrazil,layer='country')
brazil_5880.centroid
In [ ]:
# then
brazil_5880.centroid.x.values[0],brazil_5880.centroid.y.values[0]

Let me open the airports:

In [ ]:
airports_5880=gpd.read_file(LinkBrazil,layer='airports')
airports_5880.head()

Now, let me plot some airports, using the centroid:

In [ ]:
# coordinates
centroidX,centroidY=brazil_5880.centroid.x.values[0],brazil_5880.centroid.y.values[0]

# subsets of medium airports
Brazil_AirTopLeft=airports_5880[airports_5880.kind=='medium_airport'].cx[:centroidX,centroidY:]
Brazil_AirTopRight=airports_5880[airports_5880.kind=='medium_airport'].cx[centroidX:,centroidY:]
Brazil_AirBottomLeft=airports_5880[airports_5880.kind=='medium_airport'].cx[:centroidX,:centroidY]
Brazil_AirBottomRight=airports_5880[airports_5880.kind=='medium_airport'].cx[centroidX:,:centroidY]

Let me plot the subsets:

In [ ]:
base=Brazil_AirTopLeft.plot(facecolor='grey', alpha=0.4)
Brazil_AirTopRight.plot(ax=base,facecolor='orange', alpha=0.4)
Brazil_AirBottomLeft.plot(ax=base,facecolor='green', alpha=0.4)
Brazil_AirBottomRight.plot(ax=base,facecolor='red', alpha=0.4)

Notice we have simple points in each subset:

In [ ]:
Brazil_AirBottomLeft

In this situation, you can NOT make a convex hull:

In [ ]:
Brazil_AirBottomLeft.convex_hull.plot()

You first need to dissolve, and then you create a hull, an envelope of convex angles:

In [ ]:
Brazil_AirBottomLeft.dissolve().convex_hull.plot()

As we saw, the convex hull is a polygon:

In [ ]:
Brazil_AirBottomLeft.dissolve().convex_hull

What if we have the polygons that had not been previously combined?

In [ ]:
brazil_municipalities.cx[:centroidX,:centroidY].convex_hull.plot(edgecolor='red')
In [ ]:
# notice
brazil_municipalities.cx[:centroidX,:centroidY].dissolve().convex_hull.plot(edgecolor='red')

That is, you get a convex hull for each geometry.

We can also use union before creating a convex hull:

In [ ]:
# just the union
large_airport=airports_5880[airports_5880.kind=='large_airport']
large_airport.union_all()
In [ ]:
# hull of the union
large_airport.union_all().convex_hull

Let's turn the GS into a GDF:

In [ ]:
LargeAirport_hull= gpd.GeoDataFrame(index=[0],
                                    crs=large_airport.crs,
                                    geometry=[large_airport.union_all().convex_hull])
LargeAirport_hull['name']='large airports hull' # optional

# then

LargeAirport_hull

Let's use the GDF in plotting:

In [ ]:
base=brazil_5880.plot(facecolor='yellow')
large_airport.plot(ax=base)
LargeAirport_hull.plot(ax=base,facecolor='green',
                       edgecolor='white',alpha=0.4,
                       hatch='X')

TOC


Spatial Overlay¶

We might need to create or find some geometries from the geometries we already have. Using a set theory approach, we will se the use of intersection, union, difference, and symmetric difference.

Let me create this GeoDFs:

In [ ]:
# the north
MunisN_brazil=brazil_municipalities.cx[:,centroidY:]
# the south
MunisS_brazil=brazil_municipalities.cx[:,:centroidY]
# the west
MunisW_brazil=brazil_municipalities.cx[:centroidX,:]
# the east
MunisE_brazil=brazil_municipalities.cx[centroidX:,:]

Let me plot:

In [ ]:
base=MunisN_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisS_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)
In [ ]:
base=MunisE_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisW_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)

Intersection¶

We keep what is common between GeoDFs:

In [ ]:
munisNS_brazil=MunisN_brazil.overlay(MunisS_brazil, how="intersection",keep_geom_type=True)
munisNS_brazil.plot()
In [ ]:
# keeping the overlay
munisWE_brazil=MunisW_brazil.overlay(MunisE_brazil, how="intersection",keep_geom_type=True)
munisWE_brazil.plot(edgecolor='white',linewidth=0.1)

Union¶

Let me unite any polygons that touche each other. First, take a look at each one:

In [ ]:
munisNS_brazil.info()
In [ ]:
munisWE_brazil.info()

Let me subset and show you:

In [ ]:
keep=['ADM0_EN_1','ADM1_PT_1','ADM2_PT_1','geometry']
munisNS_brazil=munisNS_brazil.loc[:,keep]
munisWE_brazil=munisWE_brazil.loc[:,keep]
In [ ]:
# now
munisNS_brazil.overlay(munisWE_brazil,how="union",keep_geom_type=True)

As you see, geometries are fine, but not attributes. It is strictly NOT appending the GeoDFs:

In [ ]:
# appending
import pandas as pd

pd.concat([munisNS_brazil,munisWE_brazil],ignore_index=True)

These are different operations. UNION will combine any geometries that have overlaping portions in common.

In [ ]:
munisNS_brazil.overlay(munisWE_brazil, how="union",keep_geom_type=True)

Let me create an object to save the previous result:

In [ ]:
muniMidBrazil=munisNS_brazil.overlay(munisWE_brazil, how="union",keep_geom_type=True).dissolve()
muniMidBrazil
In [ ]:
# some cleaning

muniMidBrazil['zone']='middles'
muniMidBrazil=muniMidBrazil.loc[:,['ADM0_EN_1_1','zone','geometry']]
muniMidBrazil

Difference¶

Here, you keep what belongs to the GeoDF to left that is not in the GeoDF to the right:

In [ ]:
# with the municipalities
brazil_municipalities.overlay(muniMidBrazil, how='difference').plot()

Symmetric Difference¶

This is the opposite to intersection, you keep what is not in the intersection:

In [ ]:
MunisN_brazil.overlay(MunisS_brazil, how="symmetric_difference",keep_geom_type=True).plot()
In [ ]:
MunisW_brazil.overlay(muniMidBrazil, how="symmetric_difference",keep_geom_type=False).plot()

TOC


Validity of Geometry¶

Geometries are created in a way that some issues may appear, especially in (multi) polygons. Let's check if our recent maps on states and municipalities are valid:

In [ ]:
# non valid
MunisS_brazil[~MunisS_brazil.is_valid]
In [ ]:
# see the invalid:
MunisS_brazil[~MunisS_brazil.is_valid].plot()

It is difficult to see what is wrong. Let's get some information:

In [ ]:
# what is wrong?

from shapely.validation import explain_validity, make_valid

explain_validity(MunisS_brazil[~MunisS_brazil.is_valid].geometry)
In [ ]:
# varieties?
MunisS_brazil['validity']=[x.split('[')[0] for x in MunisS_brazil.geometry.apply(lambda x: explain_validity(x))]
MunisS_brazil['validity'].value_counts()
In [ ]:
# solving the issue:
MunisS_brazil.drop(columns=['validity'],inplace=True)

MunisS_brazil_valid=MunisS_brazil.copy()

MunisS_brazil_valid['geometry'] = [make_valid(row)  if not row.is_valid else row for row in MunisS_brazil_valid['geometry'] ]
#any invalid?
MunisS_brazil_valid[~MunisS_brazil_valid.is_valid]

The solution we got may help for some advanced techniques, but may also give us some extra trouble. Notice that once geopandas solved the problem, you have created collections:

In [ ]:
pd.Series([type(x) for x in MunisS_brazil_valid.geometry]).value_counts()

The Buffer¶

The buffer will create polygon that follows the same shape of the original vector (line, polygon, point).

In [ ]:
# this is the original
riversBrazil_clipped.plot()

But, verify crs as we are going to use distances:

In [ ]:
riversBrazil_clipped.crs

Let's reproject:

In [ ]:
riversBrazil_5880=riversBrazil_clipped.copy()
riversBrazil_5880 = riversBrazil_5880.to_crs('EPSG:5880')

Now I can use the rivers to create a buffer of 50000 meters:

In [ ]:
# 50000 at each side (radius)
riversBrazil_5880.buffer(50000).plot(facecolor='yellow', edgecolor='black',linewidth=0.2)

Then

In [ ]:
base=riversBrazil_5880.buffer(50000).plot(facecolor='orange', linewidth=0.2)
riversBrazil_5880.plot( edgecolor='grey',ax=base)

The buffering process helps cleaning simple invalidities:

In [ ]:
MunisS_brazil_valid=MunisS_brazil.copy()

MunisS_brazil_valid['geometry'] = MunisS_brazil_valid['geometry'].buffer(0)

#any invalid?
MunisS_brazil_valid[~MunisS_brazil_valid.is_valid]

Comparing with the previous result, this time you got no collections:

In [ ]:
# then:
pd.Series([type(x) for x in MunisS_brazil_valid.geometry]).value_counts()

This 'buffer trick' may not always work:

In [ ]:
# previously
indicators.dissolve(
     by="region",
     aggfunc={
         "fragility": ["mean"]},as_index=False
 ).plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
                        legend=True,
                        legend_kwds={"title": "Fragility",'loc': 'lower left'},
                        edgecolor='black',linewidth=0.2,
                        figsize=(15, 10))

Notice that AFRICA has some lines that should have dissappeared after dissolving, but you can still see some lines.

We could try the buffer trick:

In [ ]:
indicators_valid=indicators.copy()
indicators_valid['geometry'] = indicators_valid['geometry'].buffer(0)
dissolved_gdf=indicators_valid.dissolve(
     by="region",
     aggfunc={
         "fragility": ["mean"]},as_index=False, grid_size=0.1
 )




dissolved_gdf.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
                        legend=True,
                        legend_kwds={"title": "Fragility",'loc': 'lower left'},
                        edgecolor='black',linewidth=0.2,
                        figsize=(15, 10))

It did not work either. We may use a new functionality in GeoPandas's dissolve:

In [ ]:
dissolved_gdf=indicators.dissolve(
     by="region",
     aggfunc={
         "fragility": ["mean"]},as_index=False, grid_size=0.1
 )




dissolved_gdf.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd_r',
                        legend=True,
                        legend_kwds={"title": "Fragility",'loc': 'lower left'},
                        edgecolor='black',linewidth=0.2,
                        figsize=(15, 10))

Now, you do see a clean map.