
(GeometryTypeError: Unknown geometry type: featurecollection) How to extend a function to all values in a column dependent on another dataframe?


I have written a function that checks whether a point is within a polygon and, if it is, returns "True,{branch_name}", where branch_name is the name of the polygon. When I try to extend this function to the whole column, I keep encountering the error "GeometryTypeError: Unknown geometry type: featurecollection".

The function I have written is:

def latlon_check(intsct, area, branch): #intsct = point, area = polygon, branch= branch name
    check = intsct.within(shape(area))
    if check == True:
        within.append(f"True,{branch}")
    else:
        within.append(f"False")
    return within

intsct is in df1 - a dataframe with a few hundred rows in it

area and branch are in df2 - a data frame with 10 rows

Note: The function works fine when inputting single values as arguments.

I want to make a new column in df1 where every row will either say "False" or "True,{branch_name}" showing which branch the point is in.

I call it with:

df1['within'] = df1['intsct'].apply(latlon_check, args = (df2['area'],df2['branch']))

and get the error:

GeometryTypeError: Unknown geometry type: featurecollection

I have tried rewriting the column as a string and then converting it back to 'geometry', but I still get the same error. I would appreciate any help!


Solution

  • A few points

    • I have used the geopandas sample datasets to simulate the data you describe, renaming the columns of the two data frames so they are consistent with your description.
    • Your sample code calls shape(), but it is not clear where this comes from. It is most likely shapely.geometry.shape, which builds a geometry from a GeoJSON-like mapping or from an object exposing __geo_interface__. I have removed it, since the area column I have simulated already holds polygons.
    • The core issue: the way you call apply() passes the whole area and branch series from df2 to every call, not single values. Hence I have refactored latlon_check() to loop over the series rather than expect singletons.
    • within.append() adds an item to a list, but no list named within is defined. I have simplified the function to return the desired string directly.
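
The points above can be checked with a small sketch. It assumes that shape in the question is shapely.geometry.shape (the question does not say so), and the areas and points series are hypothetical stand-ins for df2['area'] and df1['intsct']. It shows that apply() hands the whole series to every call, and that a GeoSeries describes itself as a FeatureCollection, which shape() does not accept.

```python
import geopandas as gpd
from shapely.geometry import Point, box, shape

# Hypothetical stand-ins for df2['area'] and df1['intsct'].
areas = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 0, 3, 1)])
points = gpd.GeoSeries([Point(0.5, 0.5), Point(5, 5)])

# apply() forwards args unchanged, so every call receives the full series,
# not one polygon per point.
seen = points.apply(lambda point, area: type(area).__name__, args=(areas,))
print(seen.tolist())

# A GeoSeries exposes itself through __geo_interface__ as a feature collection.
geo_type = areas.__geo_interface__["type"]
print(geo_type)

# shape() only understands single geometries, so the series is rejected.
try:
    shape(areas)
    error_message = None
except Exception as exc:  # the exact exception class depends on the shapely version
    error_message = str(exc)
print(error_message)
```

This is why the function works for single values but fails once the two df2 columns are passed through args.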

    MWE

    import geopandas as gpd
    
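    # simulate data that matches question
    # (gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0, so this needs an older version)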
    df1 = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
    df2 = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
    df1 = df1.rename(columns={"geometry":"intsct"})
    df2 = df2.sample(10, random_state=42).rename(columns={"geometry":"area","name":"branch"}).loc[:,["branch","area"]]
    
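    def latlon_check(intsct, area, branch):
        # intsct = a single point; area = polygon series; branch = branch name series
        for i, area_ in area.items():
            # return the branch of the first polygon that contains the point
            if intsct.within(area_):
                return f"True,{branch.loc[i]}"
        return "False"  # the point is not inside any polygon
    
    # apply() passes each point in turn, plus both df2 series via args
    df1["within"] = df1["intsct"].apply(latlon_check, args=(df2["area"], df2["branch"]))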
    
    

    sample output

         name            intsct                                         within
    0    Vatican City    POINT (12.4533865 41.9032822)                  False
    1    San Marino      POINT (12.4417702 43.9360958)                  False
    2    Vaduz           POINT (9.5166695 47.1337238)                   False
    3    Lobamba         POINT (31.1999971 -26.4666675)                 False
    4    Luxembourg      POINT (6.1300028 49.6116604)                   False
    22   Podgorica       POINT (19.2663069 42.4659725)                  True,Montenegro
    57   Port-au-Prince  POINT (-72.3379804 18.5429705)                 True,Haiti
    84   Riga            POINT (24.0999654 56.9500238)                  True,Latvia
    114  Sucre           POINT (-65.2595156 -19.0409708)                True,Bolivia
    119  Yerevan         POINT (44.5116055 40.1830966)                  True,Armenia
    122  La Paz          POINT (-68.151931 -16.4960278)                 True,Bolivia
    198  Ürümqi          POINT (87.5730598 43.8069581)                  True,China
    199  Chengdu         POINT (104.0680736 30.6719459)                 True,China
    214  Taipei          POINT (121.5683333 25.0358333)                 True,Taiwan
    227  Beijing         POINT (116.39420089260611 39.901720309862675)  True,China
    232  Shanghai        POINT (121.4345588 31.2183983)                 True,China
    242  Hong Kong       POINT (114.1830635 22.3069268)                 True,China
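
For larger inputs, a spatial join is a possible alternative to the row-wise loop. The sketch below is not part of the original answer. It assumes geopandas 0.10 or later (for the predicate argument of sjoin), and the points and branches frames are hypothetical stand-ins for df1 and df2 that keep the default geometry column instead of the renamed intsct and area columns.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical stand-in for df1: a few points.
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(9.0, 9.0)],
)
# Hypothetical stand-in for df2: non-overlapping branch polygons.
branches = gpd.GeoDataFrame(
    {"branch": ["North", "South"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
)

# A left join keeps every point; points outside all polygons get a missing branch.
joined = gpd.sjoin(points, branches, how="left", predicate="within").sort_index()

# Build the same "True,{branch}" / "False" strings as latlon_check().
joined["within"] = [
    "False" if pd.isna(b) else f"True,{b}" for b in joined["branch"]
]
print(joined[["name", "within"]])
```

If the polygons overlap, a point inside several of them yields one row per match, whereas the loop in the answer returns only the first match.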