Search code examples
pythonpandasfunctionmedian

Create a function to calculate median cost across different years


I have a sample dataset which contains id and costs in diff years as the one below:

Id 2015-04 2015-05 2015-06 2015-07 2016-04 2016-05 2016-06 2016-07 2017-04 2017-05 2017-06 2017-07 2018-04 2018-05 2018-06 2018-07
10 58500 58500 58300 57800 57500 57700 57800 57800 57800 57900 58400 59000 59500 59500 59000 58500
11 104600 104600 105700 106100 106300 107300 108000 107600 107800 108300 109200 109600 109300 108700 109000 110700
12 104900 106700 107900 107500 106100 105200 105700 106400 106700 107100 107200 107100 107500 108300 109200 110500
13 50500 49600 48900 48400 48100 48000 47700 47500 47400 47600 47800 47800 47600 47600 48100 48400
14 49800 49900 50300 50800 51100 51200 51200 51400 51600 51900 52400 52600 52300 51800 51100 50900

How can I create a function in Python to find the median cost of each year belonging to their respective id? I want the function to be dynamic in terms of the start and end year so that if new data comes for different years, the code will calculate the changes accordingly. For example, if new data comes for 2019, the end date would automatically be considered as 2019 instead of 2018 and calculate its median respectively.

With the current data sample given above, the result should look something like one below:

Id 2015 2016 2017 2018
10 58400 57750 58150 59250
11 105150 107450 108750 109150
12 107100 105900 107100 108750
13 49250 47850 47700 47850
14 50100 51200 52150 51450

Solution

  • First we split the column names on - and get only the year. Then we groupby over axis=1 based on these years and take the median:

    df = df.set_index("Id")
    df = df.groupby(df.columns.str.split("-").str[0], axis=1).median().reset_index()
    # or get first 4 characters
    # df = df.groupby(df.columns.str[:4], axis=1).median().reset_index()
    
       Id    2015    2016    2017    2018
    0  10   58400   57750   58150   59250
    1  11  105150  107450  108750  109150
    2  12  107100  105900  107100  108750
    3  13   49250   47850   47700   47850
    4  14   50100   51200   52150   51450