Search code examples
python-3.xpandasnumpyforecast

Creating multiple cohort from the pivot table


I have a requirement like below.

The initial information is a list of gross adds.

201910 201911 201912 202001 202002
20000 30000 32000 40000 36000

I have a pivot table as below.

201910 201911 201912 202001 202002
1000 2000 2400 3200 1800
500 400 300 200 nan
200 150 100 nan nan
200 100 nan nan nan
160 nan nan nan nan

Need to generate the report like below.

Cohort01:

5%
3%
3%
1%
1%
1%

From Cohort02 onwards it will take the average of last value of cohort01. Similarly for Cohort03 for both nan values it will take the average of corresponding values of cohort01 and cohort2. Again while calculating for cohort04 it will take the average of previous two cohorts(cohort02 and cohort03 values) to add all three nan value.

Is there anyone who can provide me a solution on this in Python.

The report should be generated as below.

All cohorts should be created separately.

enter image description here


Solution

  • You could try it like this:

    res = df.apply(lambda x: round(100/(df_gross.iloc[0]/x),1),axis=1)
    print(res)
    
       201910  201911  201912  202001  202002
    0     5.0     6.7     7.5     8.0     5.0
    1     2.5     1.3     0.9     0.5     NaN
    2     1.0     0.5     0.3     NaN     NaN
    3     1.0     0.3     NaN     NaN     NaN
    4     0.8     NaN     NaN     NaN     NaN
    
    for idx,col in enumerate(res.columns[1:],1):
        res[col] = res[col].fillna((res.iloc[:,max(idx-2,0)]+res.iloc[:,idx-1])/2)
    print(res)
    
       201910  201911  201912  202001  202002
    0     5.0     6.7    7.50   8.000  5.0000
    1     2.5     1.3    0.90   0.500  0.7000
    2     1.0     0.5    0.30   0.400  0.3500
    3     1.0     0.3    0.65   0.475  0.5625
    4     0.8     0.8    0.80   0.800  0.8000