Finding out if a certain point is located inside or outside of an area, or finding out if a line intersects with another line or polygon are fundamental geospatial operations that are often used e.g. to select data based on location. Such spatial queries are one of the typical first steps of the workflow when doing spatial analysis. Performing a spatial join (will be introduced later) between two spatial datasets is one of the most typical applications where Point in Polygon query is used.
How to check if point is inside a polygon?
Computationally, detecting if a point is inside a polygon is a complicated matter. Luckily, we can use ready-made function for conducting the Point in Polygon query. We can take advantage of Shapely’s binary predicates that can evaluate the topolocical relationships between geographical objects, such as the PIP as we’re interested here.
There are basically two ways of conducting Point in Polygon queries in Shapely:
using a function called .within() that checks if a point is within a polygon
using a function called .contains() that checks if a polygon contains a point
Notice: even though we are talking here about Point in Polygon operation, it is also possible to check if a LineString or Polygon is inside another Polygon.
Let’s first create a Polygon using a list of coordinate-tuples and a couple of Point objects
from shapely.geometry import Point, Polygon# Create Point objectsp1 = Point(24.952242, 60.1696017)p2 = Point(24.976567, 60.1612500)# Create a Polygoncoords = [(24.950899, 60.169158), (24.953492, 60.169158), (24.953510, 60.170104), (24.950958, 60.169990)]poly = Polygon(coords)# Let's check what we haveprint(p1)print(p2)print(poly)
POINT (24.952242 60.1696017)
POINT (24.976567 60.16125)
POLYGON ((24.950899 60.169158, 24.953492 60.169158, 24.95351 60.170104, 24.950958 60.16999, 24.950899 60.169158))
Let’s check if those points are within the polygon
# Check if p1 is within the polygon using the within functionprint(p1.within(poly))# Check if p2 is within the polygonprint(p2.within(poly))
True
False
Ok, so we can see that the first point seems to be inside that polygon and the other one doesn’t.
In fact, the first point is close to the center of the polygon as we can see:
# Our pointprint(p1)# The centroidprint(poly.centroid)
POINT (24.952242 60.1696017)
POINT (24.952242428492358 60.16960179038188)
It is also possible to do PIP other way around, i.e. to check if polygon contains a point:
# Does polygon contain p1? print(poly.contains(p1))# Does polygon contain p2? print(poly.contains(p2))
True
False
Thus, both ways of checking the spatial relationship results in the same way.
Which one should you use then? Well, it depends:
if you have many points and just one polygon and you try to find out which one of them is inside the polygon:
you need to iterate over the points and check one at a time if it is within() the polygon specified
if you have many polygons and just one point and you want to find out which polygon contains the point
you need to iterate over the polygons until you find a polygon that contains() the point specified (assuming there are no overlapping polygons)
Intersect
Another typical geospatial operation is to see if a geometry intersect or touches another one. The difference between these two is that:
if objects intersect, the boundary and interior of an object needs to intersect in any way with those of the other.
If an object touches the other one, it is only necessary to have (at least) a single point of their boundaries in common but their interiors shoud NOT intersect.
Let’s try these out.
Let’s create two LineStrings
from shapely.geometry import LineString, MultiLineString# Create two linesline_a = LineString([(0, 0), (1, 1)])line_b = LineString([(1, 1), (0, 2)])# Let's see if they intersectdisplay(line_a.intersects(line_b))
True
Do they also touch each other?
display(line_a.touches(line_b))
True
Indeed, they do and we can see this by plotting the features together
# Create a MultiLineStringmulti_line = MultiLineString([line_a, line_b])display(multi_line)
Thus, the line_b continues from the same node ( (1,1) ) where line_a ends.
However, if the lines overlap fully, they don’t touch due to the spatial relationship rule, as we can see:
Check if line_a touches itself:
# Does the line touch with itself?print(line_a.touches(line_a))
False
It does not. However, it does intersect - a curious thing to note!
# Does the line intersect with itself?print(line_a.intersects(line_a))
True
Point in Polygon using Geopandas
Next we will do a practical example where we check which of Estonian Category III protected species sightings from a prepared monitoring GeoPackage file, category_3_species_porijogi.gpkg, are located in the Idaoja sub-catchment of the Porijogi river, by cross-checking with the polygons from a GeoJSON-file. The polygons are the modelled sub-catchments of the Porijogi river.
However, reading a layer from a GeoPackage file needs an additional information of the layer name, because GeoPackage is basically an embedded database format, building on top of SQLite.
Let’s start by reading the addresses from the GeoPackage layer file.
import geopandas as gpd# protected species under class 3 monitoring sightingsspecies_fp ="../files/data/L3/category_3_species_porijogi.gpkg"species_data = gpd.read_file(species_fp, layer='category_3_species_porijogi', driver='GPKG')display(species_data.head(5))
OBJECTID
LIIK
NIMI
EXT_SYST_I
KKR_KOOD
PRIV_TYYP
STAATUS
IMPORT
LAADIMISKP
geometry
0
148179
taimed III
soo-neiuvaip
652542557
KLO9320309
Avalik
kontrollitud
0
2018-10-29
POINT (646978.483 6444887.321)
1
148180
taimed III
soo-neiuvaip
1989720139
KLO9320255
Avalik
kontrollitud
0
2018-10-29
POINT (646730.472 6459776.774)
2
162026
loomad III
valge-toonekurg
-665748946
KLO9108356
Peidetud
arhiveeritud
0
2019-09-26
POINT (653008.611 6467205.284)
3
144301
taimed võõrliik
Sosnovski karuputk
-297982508
VLL1000576
Peidetud
arhiveeritud
0
2018-10-29
POINT (638031.354 6444230.237)
4
144305
taimed võõrliik
Sosnovski karuputk
1137537662
VLL1000598
Peidetud
arhiveeritud
0
2018-10-29
POINT (669298.005 6443792.562)
Reading GeoJSON-files in Geopandas
It is possible to read the data from GeoJSON-file in the same manner as a Shapefile.
Nice, now we can see that we have the sub-diveded catchments for the Porijogi river.
Warning
ATTENTION: The GeoJSON specification requires that coordinates are given in WGS84 geographic coordinate reference system. The species GPKG data are in the Estonian National Grid EPSG:3301!
So we need to reproject the catchments GeoJSON data to the same coordinate reference system!
In order to be geographically correct, we need to reproject the data to the locally more accurate projected coordinate reference system, EPSG:3301.
# Reproject to EPSG:3301# here we intentially overwrite the original GeoDataFrame with the reprojected onepolys = polys.to_crs(3301)# attention to the coordinates in the geometry columndisplay(polys.head(5))
OBJECTID
NAME_1
AREA_1
Shape_Leng
Shape_Area
ID
geometry
0
8
Idaoja
3823.427995
35446.162219
3.823428e+07
1
MULTIPOLYGON (((660834.858 6455555.914, 660851...
1
9
Keskjooks
5087.809731
42814.174755
5.087810e+07
2
MULTIPOLYGON (((666339.502 6455972.600, 666384...
2
10
Peeda
5634.162684
47792.268153
5.634163e+07
3
MULTIPOLYGON (((659914.002 6456514.131, 659817...
3
11
Sipe
890.280919
16449.028656
8.902809e+06
4
MULTIPOLYGON (((665928.914 6460634.243, 665985...
4
12
Tatra
3306.643841
31108.960376
3.306644e+07
5
MULTIPOLYGON (((658678.470 6457825.152, 658579...
We are interested in the sub-catchment that is called Idaoja. Let’s sub-select this single catchment and see where our data is located, and how they are relating with a plot of these two datasets on top of each other on a map.
import matplotlib.pyplot as pltplt.style.use('ggplot')plt.rcParams['figure.figsize'] = (7, 8)subcatch = polys.loc[polys['NAME_1']=='Idaoja']subcatch.reset_index(drop=True, inplace=True)# create a plot basis, with the ax objectfig, ax = plt.subplots()# use the ax object as shared basis for the plot# plot all the sub-catchmentspolys.plot(ax=ax, facecolor='gray')# plot the Idaoja subcatchment in redsubcatch.plot(ax=ax, facecolor='red')# plot the species point data on top of the sub-catchmentsspecies_data.plot(ax=ax, color='blue', markersize=5)plt.tight_layout()
Okey, so we can see that, indeed, certain points are within the selected red Polygon.
Let’s find out which one of them are located within the Polygon. Hence, we are conducting a Point in Polygon query.
Let’s check which Points are within the subcatch Polygon. Notice, that here we check if the Points are within the geometry of the subcatch GeoDataFrame. Hence, we use the loc[0, 'geometry'] to parse the actual Polygon geometry object from the GeoDataFrame.
# test the whole species point geodataframe against the single subcatchment polygon geometrypip_mask = species_data.within(subcatch.loc[0, 'geometry'])display(pip_mask)
As we can see, we now have an array of boolean values for each row, where the result is True if Point was inside the Polygon, and False if it was not.
We can now use this mask array to select the Points that are inside the Polygon. Selecting data with this kind of mask array (of boolean values) is easy by passing the array inside the loc indexing function of Pandas.