python numpy dataframe replace calculated-columns

How to replace NaN values in columns with calculated CAGR values


I have a DataFrame with NaN values. I want to replace those NaN values with CAGR-based values.

   val1  val2  val3  val4  val5
0   100   100   100   100   100
1    90   110    80   110    50
2    70   150    70   NaN   NaN
3   NaN   NaN   NaN   NaN   NaN

CAGR (compound annual growth rate) = (end value / first value) ** (1 / number of years) - 1

For example, val1's CAGR is about -16% ((70 / 100) ** (1 / 2) - 1), so the last value of val1 will be about 58.6.

Column val4's CAGR is 10%, so the NaN in row 2 becomes 121 and the NaN in row 3 becomes 133.1.
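
The val4 arithmetic can be checked with a few lines of plain Python. This is only a sketch: it assumes one row per year, and the names `first`, `last`, `periods` and `growth_factor` are illustrative, not from the original post.

```python
# Worked check for column val4: observed values are 100 (row 0) and 110 (row 1).
first, last = 100.0, 110.0
periods = 1  # assumption: one yearly step between row 0 and row 1

growth_factor = (last / first) ** (1 / periods)  # yearly multiplier
cagr = growth_factor - 1                         # growth rate as a fraction

# Extend the series by compounding from the last observed value.
row2 = last * growth_factor
row3 = row2 * growth_factor

print(round(cagr, 4), round(row2, 2), round(row3, 2))
```

Each missing value is the previous value multiplied by the same yearly factor, which is why row 3 builds on the filled row 2 rather than on the last observed value.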

How can I replace the NaN values automatically?

My questions are:

1) How can I calculate the CAGR for each column?

I used isnull() to find which rows are empty, but I don't know how to exclude those rows when calculating the CAGR.

2) How can I replace the NaN values with the calculated values?

Thank you.


Solution

  • import numpy as np

    # whitespace-delimited data, one row per year
    a = '''100  100 100 100 100
    90  110 80  110 50
    70  150 70  NaN NaN
    NaN NaN NaN NaN NaN
    '''

    # parse into a float array; float('NaN') yields nan directly
    data = np.array([[float(x) for x in line.split()] for line in a.strip().splitlines()])

    for col in range(data.shape[1]):
        nan_rows = np.flatnonzero(np.isnan(data[:, col]))  # row indices to fill
        if nan_rows.size == 0 or nan_rows[0] < 2:
            continue  # nothing to fill, or fewer than two observed values
        nyears = nan_rows[0] - 1  # row index of the last non-NaN value
        endvalue = data[nyears, col]
        # yearly growth factor; subtract 1 to express the CAGR as a rate
        growth = (endvalue / data[0, col]) ** (1 / nyears)
        print(nyears, endvalue, growth)

        # fill top to bottom so each value builds on the one above it
        for year in nan_rows:
            data[year, col] = data[year - 1, col] * growth

    print(data)
    

    I get:

    [[ 100.          100.          100.          100.          100.        ]
     [  90.          110.           80.          110.           50.        ]
     [  70.          150.           70.          121.           25.        ]
     [  58.56620186  183.71173071   58.56620186  133.1          12.5       ]]
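
Since the question mentions a DataFrame, the same idea can also be sketched with pandas. This is a minimal sketch, not a definitive implementation: it assumes one row per year and that the NaN values to fill come after the last observed value in each column. The helper name `fill_with_cagr` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val1": [100, 90, 70, np.nan],
    "val2": [100, 110, 150, np.nan],
    "val3": [100, 80, 70, np.nan],
    "val4": [100, 110, np.nan, np.nan],
    "val5": [100, 50, np.nan, np.nan],
})

def fill_with_cagr(col):
    """Hypothetical helper: extend a column past its last observed value
    by compounding its yearly growth factor."""
    observed = np.flatnonzero(col.notna().to_numpy())  # positions of real values
    if observed.size < 2:
        return col  # not enough data to estimate growth
    first_pos, last_pos = observed[0], observed[-1]
    out = col.to_numpy(dtype=float, copy=True)
    periods = last_pos - first_pos  # assumption: one row per year
    factor = (out[last_pos] / out[first_pos]) ** (1 / periods)
    # Number of years after the last observed value, for each trailing row.
    steps = np.arange(1, len(out) - last_pos)
    out[last_pos + 1:] = out[last_pos] * factor ** steps
    return pd.Series(out, index=col.index)

filled = df.apply(fill_with_cagr)
print(filled)
```

Here `dropna`-style masking via `notna()` answers the first question (which rows to exclude when computing the CAGR), and `DataFrame.apply` runs the fill once per column, which answers the second.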