Search code examples
pythonpandasdataframeiterationsubtraction

How subtract a Dataframe with totals another Dataframe based on condition and until 0


I'm new with python and pandas so here's my question:

I have two dataframes, df1 has two columns one for labels and one for integers which correspond to the Toal of each label while df2 contains the quantity used by day. I would like to subtract each row of df2 until df1 is equal or closer to 0 and add a column to df1 the date of the last row subtracted (it could be in a new dataframe df3). The subtraction needs to have two conditions where:

  1. labels need to be equal
  2. subtraction can not be minor to 0

df1 = DF with Totals

df2 = DF with quantities

I hope someone can help me.


Solution

  • I believe a .cumsum() and .idxmin() will help with this.

    1. Join your dataframes on label
    2. Create a new "Running Quantity" column that is a .cumsum() on the "Quantity" column (pandas documenation on .cumsum(); blog post on .cumsum())
    3. Create a new "Running Total" column that is "Total" - "Running Quantity"
    4. Filter to only the positive values in "Running Total" (StackOverflow answer about filtering out negative values)
    5. Filter to the minimum value of "Running Total" per label using .idxmin() (pandas documentation on .idxmin(); StackOverflow answer about .idxmin())

    This should give you a three-column data frame with one row per label, the date when the running total was closest to but not lower than 0, and the amount (Total - Running Quantity at that date).