Search code examples
pythonpandasjupyter-notebookdata-analysis

Counting the number of rows having values greater than particular row after grouping in a dataframe


I have a dataframe containing the population of important cities. Dataframe image

It contains cities from different countries. out of all cities in a particular country there is only one city which is consiedered as primary(mentioned in the column 'capital').I need to find out the number of cities which is having population greater than that of the primary city.Kindly provide one solution?


Solution

  • Use:

    #test primary
    m = df['capital'].eq('primary')
    
    #get dict for primary population
    d = df[m].set_index('country')['population'].to_dict()
    
    #filter not primary
    df1 = df[~m].copy()
    
    #compare non primary population with primary by map and count with sum
    out = df1['population'].gt(df1['country'].map(d)).sum()
    
    #filter compare rows if necessary
    df2 = df1[df1['population'].gt(df1['country'].map(d))]