
Calculate the median age for each region from a frequency table with Python


I have a dataframe that is similar to:

[Image: screenshot of the dataframe]

I would like to calculate the median age for each city, but because the data is a frequency table (counts per age rather than one row per person) I'm finding it tricky. Is there a function in pandas or another library that would help me achieve this?


Solution

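  • Maybe this works for you. The idea is to expand each row back into a list of individual ages (each age repeated as many times as its count) and take the median of that list: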

    import numpy as np
    import pandas as pd
    
    # create dataframe
    df = pd.DataFrame(
        [
            ['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23], 
            ['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68]
        ],
        columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8]
    ).set_index('City')
    print(df)
    
    # calculate median for freq table
    medians = []
    for city, row in df.iterrows():
        # column labels are the ages, cell values are the counts:
        # repeat each age by its count to rebuild the raw list of ages
        ages = np.repeat(row.index.to_numpy(dtype=float), row.to_numpy(dtype=int))
        medians.append(np.median(ages))
    df_m = pd.DataFrame({'City': df.index, 'Median': medians})
    print(df_m)
    

    Input:

               0   1   2   3   4    5     6    7   8
    City                                            
    Alabama   34  67  89  89  67  545  4546    3  23
    Georgia  345  65  67  32  23  567    87  647  68
    

    Output:

          City  Median
    0  Alabama     6.0
    1  Georgia     5.0
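
If the counts are large, expanding every row into a full list of ages can use a lot of memory. As an alternative sketch (not part of the original answer), the median can be read directly from the cumulative counts. The helper name `freq_median` is hypothetical, and the code assumes the column labels are numeric ages sorted in ascending order and that the counts are non-negative integers. Rows with no observations come back as NaN.

```python
import numpy as np
import pandas as pd

def freq_median(freq):
    """Row-wise median of a frequency table (hypothetical helper).

    Assumes column labels are numeric values in ascending order
    and cells are non-negative integer counts.
    """
    values = freq.columns.to_numpy(dtype=float)   # the ages
    cum = freq.to_numpy().cumsum(axis=1)          # running count per row
    total = cum[:, -1]                            # observations per row

    # 0-based positions of the two middle observations
    # (they coincide when the total is odd)
    lo_pos = (total - 1) // 2
    hi_pos = total // 2

    # index of the first column whose cumulative count exceeds the position
    lo_idx = (cum > lo_pos[:, None]).argmax(axis=1)
    hi_idx = (cum > hi_pos[:, None]).argmax(axis=1)

    med = (values[lo_idx] + values[hi_idx]) / 2
    med = np.where(total > 0, med, np.nan)        # empty rows have no median
    return pd.Series(med, index=freq.index, name='Median')

df = pd.DataFrame(
    [
        ['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23],
        ['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68]
    ],
    columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8]
).set_index('City')

medians_fast = freq_median(df)
print(medians_fast)  # Alabama 6.0, Georgia 5.0 for the sample data
```

Averaging the two middle observations mirrors what `np.median` does for an even number of values, so the result should agree with the loop-based version above.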