I have written a function that checks whether a point is within a polygon and, if it is, returns "True,{branch_name}", where branch_name is the name of the polygon. When I try to extend this function to a whole column, I keep getting the error "GeometryTypeError: Unknown geometry type: featurecollection".
The function I have written is:
def latlon_check(intsct, area, branch):  # intsct = point, area = polygon, branch = branch name
    check = intsct.within(shape(area))
    if check == True:
        within.append(f"True,{branch}")
    else:
        within.append(f"False")
    return within
intsct is in df1 - a dataframe with a few hundred rows in it
area and branch are in df2 - a data frame with 10 rows
Note: The function works fine when inputting single values as arguments.
I want to make a new column in df1 where every row will either say "False" or "True,{branch_name}" showing which branch the point is in.
I am using:
df1['within'] = df1['intsct'].apply(latlon_check, args = (df2['area'],df2['branch']))
and get the error:
GeometryTypeError: Unknown geometry type: featurecollection
I have tried rewriting the column as a string and then converting it back to 'geometry', but I still got the same error. I would appreciate any help!
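A few points:
- shape(): it's not clear what this is. Presumably it is shapely.geometry.shape, which builds a geometry from a GeoJSON-like mapping. When it is handed a whole GeoSeries, that resolves to a FeatureCollection rather than a single geometry, which would explain the error. I have removed it, as a polygon is already being passed in the context I have simulated.
- df2: its columns are passed as whole series by the way you are calling apply(). Hence I have refactored latlon_check() to use series rather than singletons.
- within.append(): this is a method that adds items to a list, but there is no list. I have simplified it to return the desired string.

import geopandas as gpd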
# simulate data that matches the question
# (gpd.datasets was removed in GeoPandas 1.0, so these two lines need an older release)
df1 = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
df2 = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
df1 = df1.rename(columns={"geometry":"intsct"})
df2 = df2.sample(10, random_state=42).rename(columns={"geometry":"area","name":"branch"}).loc[:,["branch","area"]]
def latlon_check(intsct, area, branch):  # intsct = point, area = polygon series, branch = branch name series
    # return the first branch whose polygon contains the point
    for i, area_ in area.items():
        if intsct.within(area_):
            return f"True,{branch.loc[i]}"
    return "False"
df1["within"] = df1['intsct'].apply(latlon_check, args = (df2['area'],df2['branch']))
| | name | intsct | within |
|---|---|---|---|
| 0 | Vatican City | POINT (12.4533865 41.9032822) | False |
| 1 | San Marino | POINT (12.4417702 43.9360958) | False |
| 2 | Vaduz | POINT (9.5166695 47.1337238) | False |
| 3 | Lobamba | POINT (31.1999971 -26.4666675) | False |
| 4 | Luxembourg | POINT (6.1300028 49.6116604) | False |
| 22 | Podgorica | POINT (19.2663069 42.4659725) | True,Montenegro |
| 57 | Port-au-Prince | POINT (-72.3379804 18.5429705) | True,Haiti |
| 84 | Riga | POINT (24.0999654 56.9500238) | True,Latvia |
| 114 | Sucre | POINT (-65.2595156 -19.0409708) | True,Bolivia |
| 119 | Yerevan | POINT (44.5116055 40.1830966) | True,Armenia |
| 122 | La Paz | POINT (-68.151931 -16.4960278) | True,Bolivia |
| 198 | Ürümqi | POINT (87.5730598 43.8069581) | True,China |
| 199 | Chengdu | POINT (104.0680736 30.6719459) | True,China |
| 214 | Taipei | POINT (121.5683333 25.0358333) | True,Taiwan |
| 227 | Beijing | POINT (116.39420089260611 39.901720309862675) | True,China |
| 232 | Shanghai | POINT (121.4345588 31.2183983) | True,China |
| 242 | Hong Kong | POINT (114.1830635 22.3069268) | True,China |
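
If the row-by-row loop becomes slow on larger inputs, the same lookup could probably be done with a spatial join instead. The following is only a minimal sketch, not the approach used above: the points and branches frames and the branch names "north" and "south" are made-up stand-ins for df1 and df2, and it assumes a GeoPandas version where sjoin() accepts the predicate argument (older releases call it op).

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-ins for df1 (points) and df2 (branch polygons)
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(1, 1), Point(5, 5), Point(20, 20)],
    crs="EPSG:4326",
)
branches = gpd.GeoDataFrame(
    {"branch": ["north", "south"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(4, 4), (6, 4), (6, 6), (4, 6)]),
    ],
    crs="EPSG:4326",
)

# A left join keeps every point; points outside all polygons get NaN in "branch"
joined = gpd.sjoin(points, branches, how="left", predicate="within")

# If polygons overlap, a point can match more than once; keep the first match per point
joined = joined[~joined.index.duplicated(keep="first")]

# Build the same "True,{branch}" / "False" strings as latlon_check()
points["within"] = joined["branch"].apply(
    lambda b: f"True,{b}" if isinstance(b, str) else "False"
)
print(points[["name", "within"]])
```

Unlike the loop, this keeps the geometry columns under their default name, since sjoin() works on the active geometry column of each GeoDataFrame.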