Search code examples
pythonpandasdataframesumrows

How to sum previous rows in a dataframe?


I'm not sure how to correctly word what I want to do, or if this is even possible, so sorry if this sounds confusing and doesn't make sense. What I want to do basically is some way to create a separate column in a dataframe that sums consecutive values. For instance, I have this column, shown below, and I would like a separate column, which contains the sum of 0.00871 and 0.001, and then 0.000603 and 0.0123, and 0.000871 and 4.37E-05, and etc. Does this make sense? If so, is there a way I can code for this, instead of doing it manually? All help is appreciated! I would prefer to do this on Python, but any language would suffice.

Not allowed to embed the image, so hopefully this works


Solution

    • given your data is presented as an image, I have simulated it
    • what you state
      1. for every second row d.index % 2
      2. sum current row and previous row d["sourceValue"] + d["sourceValue"].shift()
    import numpy as np
    import pandas as pd
    
    pd.DataFrame({"sourceValue": np.random.uniform(0, 1, 30)}).assign(
        required=lambda d: np.where(
            d.index % 2, d["sourceValue"] + d["sourceValue"].shift(), np.nan
        )
    )
    
    sourceValue required
    0 0.804529 nan
    1 0.24858 1.05311
    2 0.710556 nan
    3 0.415231 1.12579
    4 0.879645 nan
    5 0.599331 1.47898
    6 0.635365 nan
    7 0.587647 1.22301
    8 0.902475 nan
    9 0.0398323 0.942307
    10 0.357436 nan
    11 0.469155 0.826591
    12 0.732759 nan
    13 0.721996 1.45476
    14 0.030494 nan
    15 0.0047623 0.0352563
    16 0.01081 nan
    17 0.130903 0.141713
    18 0.643588 nan
    19 0.837585 1.48117
    20 0.63328 nan
    21 0.506582 1.13986
    22 0.855085 nan
    23 0.100174 0.95526
    24 0.146992 nan
    25 0.64605 0.793041
    26 0.460859 nan
    27 0.818967 1.27983
    28 0.809294 nan
    29 0.664239 1.47353

    supplementary - multiple columns

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({f"sourceValue{i+1}": np.random.uniform(0, 1, 30) for i in range(7)})
    
    # static version
    df["targetValue1"] = np.where(df.index%2, df["sourceValue1"]+df["sourceValue1"].shift(), np.nan)
    df["targetValue2"] = np.where(df.index%2, df["sourceValue2"]+df["sourceValue2"].shift(), np.nan)
    df["targetValue3"] = np.where(df.index%2, df["sourceValue3"]+df["sourceValue3"].shift(), np.nan)
    df["targetValue4"] = np.where(df.index%2, df["sourceValue4"]+df["sourceValue4"].shift(), np.nan)
    df["targetValue5"] = np.where(df.index%2, df["sourceValue5"]+df["sourceValue5"].shift(), np.nan)
    df["targetValue6"] = np.where(df.index%2, df["sourceValue6"]+df["sourceValue6"].shift(), np.nan)
    df["targetValue7"] = np.where(df.index%2, df["sourceValue7"]+df["sourceValue7"].shift(), np.nan)
    
    # dynamic version
    df = df.assign(**{c.replace("source", "target"): 
                      np.where(df.index % 2, df[c] + df[c].shift(), np.nan)
                      for c in df.columns if "sourceValue" in c})
    
    df