Search code examples
pythonpandasiterationmultiplication

Combine row iteration with pairwise multiplication


I have the following dataframe:

                in_scenario_USA  USA index_in
month year                               
4     1960              NaN           0
5     1960              NaN           0
6     1960         0.000000           1.0
7     1960         0.000000           1.0
8     1960         0.000000           1.0
9     1960         0.000000           1.0
10    1960         0.000000           1.0
11    1960         0.000000           1.0
12    1960         0.000000           1.0
1     1961         0.000000           1.0
2     1961         0.025536           1.0
3     1961         0.003843           1.0
4     1961         0.019139           1.0
5     1961         0.000000           1.0

The first column is the percentage return of one month. Based on my scenarios, it is either 0 or a number. I would like my second column to have its first non null value to remain 1 and then each subsequent value can be explained with the following formula:

 USA index_in[i] = USA index_in[i-1] * (1 + in_scenario_USA)[i]

so in the end it would look like this:

                in_scenario_USA  USA index_in
month year                               
4     1960              NaN           0
5     1960              NaN           0
6     1960         0.000000           1.0
7     1960         0.000000           1.0
8     1960         0.000000           1.0
9     1960         0.500000           1.5
10    1960         0.500000           2.25
11    1960         0.000000           2.25
12    1960         0.000000           2.25
1     1961         0.000000           2.25
2     1961         -0.200000          1.8
3     1961         0.100000           1.98
4     1961         0.100000           2.178
5     1961         0.000000           2.178

I've tried many loops but the one I feel the closest to be accurate is this one but I only get NaN values in the end.

for i in range(0, len(df_merged[col + ' index_in'])):
    if df_merged[col + ' index_in'].iloc[i] == 1 and (df_merged[col + ' index_in'].iloc[-i] == 0):
        continue
    else:
        df_merged[col + ' index_in'].iloc[i] = np.multiply(df_merged[col + ' index_in'].iloc[i-1], df_merged['in_scenario_' + col].iloc[i])

Thanks for your help.


Solution

  • I think your problem is the NaN values in your first couple rows, that get multiplied through and result in adding NaN values. Try this:

    newcol = []
    firstnonnan = True
    for index, row in df.iterrows():
        if row['in_scenario_USA'].isnull():
            newcol.append(row['USA_index_in'])
        elif firstnonnan = True:
            newcol.append(1)
            firstnonnan = False
        else:
            newcol.append(newcol[-1]*(1+row['in_scenario_USA']))
    df['USA index_in'] = newcol
    

    Basically, you're going to iterate through the dataframe and build a list to overwrite your column with. If you have a NaN value, you're going to take the same value that's already in that column. The first time you see a non-NaN, you'll add a 1 to your list. After that, you'll use the last item in the list as your previous value.