python dataframe gis geopandas

How to calculate values from a different data frame based on boundaries


I have a GeoDataFrame containing the following columns:

  1. number of mobile subscriptions
  2. longitude (X)
  3. latitude (Y)

and another GeoDataFrame called "boundaries", which contains the boundary geometries.

I want to add a column to the boundaries GeoDataFrame that holds the sum of mobile subscriptions for all points whose latitude and longitude fall within each boundary.

I really hope someone can help me with this issue. I appreciate your kind assistance.

I have tried merging the two data frames, but I have no idea how to aggregate the data by boundary.


Solution

  • This answer computes the total number of subscriptions within each boundary:

    import geopandas as gpd
    import pandas as pd
    
    # creating a dummy boundary geodataframe
    df = pd.DataFrame({'name': ['first boundary', 'second boundary'],
                        'area': ['POLYGON ((-10 -3, -10 3, 3 3, 3 -10, -10 -3))', 'POLYGON ((-20 -21, -12 -17, 2 -15, 5 -20, -20 -21))']})
    
    boundaries = gpd.GeoDataFrame(df[['name']], geometry=gpd.GeoSeries.from_wkt(df.area, crs = 'epsg:4326'))
    
    # creating a dummy geodataframe with some points (you can change it to your coordinates)
    points = pd.DataFrame({'num_sub': [1, 2, 3, 4, 5],
                           'coordenates': ['POINT(-7 1)', 'POINT(1 -2)', 'POINT(-17 -20)', 'POINT(0 -18)', 'POINT(-5 0)']})
    
    subs_coordenates = gpd.GeoDataFrame(points[['num_sub']], geometry=gpd.GeoSeries.from_wkt(points.coordenates, crs = 'epsg:4326'))
    
    # summing num_sub over the points that fall within each area and storing the result in a num_subs column
    boundaries['num_subs'] = boundaries.geometry.apply(lambda poly: subs_coordenates.loc[subs_coordenates.geometry.within(poly), 'num_sub'].sum())
    

    If you have the X and Y coordinates in different columns (named X and Y in this example), you can do as follows:

    points = pd.DataFrame({'num_sub': [1, 2, 3, 4, 5],
                           'X': [-7, 1, -17, 0, -5],
                           'Y': [1, -2, -20, -18, 0]})
    
    # creating the GeoDataFrame directly from the X and Y columns
    subs_coordenates = gpd.GeoDataFrame(points[['num_sub']], geometry=gpd.points_from_xy(points.X, points.Y), crs='epsg:4326')
    
    # summing num_sub over the points that fall within each area and storing the result in a num_subs column
    boundaries['num_subs'] = boundaries.geometry.apply(lambda poly: subs_coordenates.loc[subs_coordenates.geometry.within(poly), 'num_sub'].sum())
    

    Hope it works for you.
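
For larger datasets, the same per-boundary totals can probably be obtained with a spatial join followed by a group-by, rather than testing every polygon against every point. The following is a minimal sketch, not part of the original answer. It assumes GeoPandas 0.10 or later (where `sjoin` accepts `predicate`; older releases call this parameter `op`) and rebuilds the same dummy data; the names `subs`, `joined`, and `totals` are illustrative.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Dummy boundaries with the same shapes as in the answer above
boundaries = gpd.GeoDataFrame(
    {"name": ["first boundary", "second boundary"]},
    geometry=[
        Polygon([(-10, -3), (-10, 3), (3, 3), (3, -10)]),
        Polygon([(-20, -21), (-12, -17), (2, -15), (5, -20)]),
    ],
    crs="EPSG:4326",
)

# Dummy subscription points with separate X (longitude) and Y (latitude) columns
points = pd.DataFrame({"num_sub": [1, 2, 3, 4, 5],
                       "X": [-7, 1, -17, 0, -5],
                       "Y": [1, -2, -20, -18, 0]})
subs = gpd.GeoDataFrame(points[["num_sub"]],
                        geometry=gpd.points_from_xy(points.X, points.Y),
                        crs="EPSG:4326")

# Attach to each point the index of the boundary it lies within;
# points outside every boundary are dropped by the inner join
joined = gpd.sjoin(subs, boundaries, how="inner", predicate="within")

# Sum the subscriptions per boundary index, then align the totals back
# to the boundaries frame (boundaries without any points get 0)
totals = joined.groupby("index_right")["num_sub"].sum()
boundaries["num_subs"] = totals.reindex(boundaries.index, fill_value=0)

print(boundaries[["name", "num_subs"]])
```

With the dummy data, this should give 8 for the first boundary and 7 for the second. Note that `within` does not match points lying exactly on a boundary edge; `predicate="intersects"` would include them.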