python dataframe calculated-columns

Calculate and add columns to a data frame using multiple columns for sorting


I have a pretty simple data frame with columns A, B and C, and I would like to add several more. Specifically, I want to create two cumulative-sum columns and store them in that same data frame. Currently I'm doing it by creating two separate data frames that are ordered differently and then plotting the results on the same graph, but I'm guessing there is a more efficient approach. The columns I'm trying to create are:

(1) Column D = the cumulative sum of column C, ordered by increasing values in column A
(2) Column E = the cumulative sum of column C, ordered by decreasing values in column B
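Both definitions follow one pattern: sort by some column, take a running total of C, and map each total back to the row it belongs to. Below is a minimal sketch of that pattern as a reusable function. `cumsum_in_order` is a hypothetical name, and the sample values are the A, B and C columns that appear in the printed output of the answer. It relies on the fact that `cumsum()` on a sorted frame keeps each row's original index label, so assigning the result back realigns it by index.

```python
import pandas as pd


def cumsum_in_order(df, value_col, order_col, ascending=True):
    """Hypothetical helper: running total of `value_col`, accumulated in the
    order given by sorting on `order_col`, returned with df's index labels."""
    # A stable sort keeps tied rows in their original relative order.
    ordered = df.sort_values(order_col, ascending=ascending, kind="mergesort")
    # cumsum() keeps each row's index label, so the Series can be assigned
    # straight back to df and pandas lines the values up by index.
    return ordered[value_col].cumsum()


# Sample values taken from the answer's printed output.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [0.1, 0.3, 0.6, 0.7, 0.3],
    "C": [1, 3, 1, 2, 2],
})

df["D"] = cumsum_in_order(df, "C", "A", ascending=True)   # increasing A
df["E"] = cumsum_in_order(df, "C", "B", ascending=False)  # decreasing B
print(df)
```

Because the frame is never reordered in place, both columns live in the same frame, and each can still be plotted against its own sort column.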

Here is an example of what my data looks like, along with the two columns I'm trying to calculate:


Solution

  • This should work:

    import pandas as pd

    df = pd.read_csv('Sample.csv')  # columns A, B, C

    # cumsum() on a sorted frame keeps each row's original index label,
    # so assigning the result back lines every total up with its own row.
    df['D'] = df.sort_values(by='A')['C'].cumsum()
    # Stable sort: rows with equal B keep their original relative order.
    df['E'] = df.sort_values(by='B', ascending=False, kind='mergesort')['C'].cumsum()
    
    print(df)
    
       A    B  C  D  E
    0  1  0.1  1  1  9
    1  2  0.3  3  4  6
    2  3  0.6  1  5  3
    3  4  0.7  2  7  2
    4  5  0.3  2  9  8
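The question title mentions sorting by multiple columns, and that matters here: rows 1 and 4 both have B = 0.3, so "decreasing B" alone leaves their relative order (and therefore their E values) up to the sort. `sort_values` accepts a list of keys with a matching list of directions. A minimal sketch, assuming ascending A is an acceptable tie-breaker (the question does not say how ties should be handled):

```python
import pandas as pd

# Same sample values as in the answer's output table.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [0.1, 0.3, 0.6, 0.7, 0.3],
    "C": [1, 3, 1, 2, 2],
})

# Primary key: B descending. Secondary key (assumed tie-breaker): A ascending.
order = df.sort_values(by=["B", "A"], ascending=[False, True])

# The running total keeps the original index labels, so it aligns back to df.
df["E"] = order["C"].cumsum()
print(df)
```

Choosing a different tie-breaker (for example `ascending=[False, False]`) would change the E values of the two tied rows, so it is worth deciding on one explicitly.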