Search code examples
pythonpandasdataframeminimizeminimization

Minimize differences between dataframes/arrays with condition


I have a dataframe containing the quarterly data, that looks like this:

Country Q1 Q2 Q1_ratio Q2_ratio Total
USA 70 30 0.7 0.3 100
Canada 60 40 0.6 0.4 100
Japan 40 60 0.4 0.6 100
TOTAL 170 130

My issue is this: When I change the value in Total column, I need to recalculate the values for Q1 and Q2, so that that the sum of Q1 and Q2 equals the row total and also the column total of Q1 and Q2 can't change. Also, the Q1 and Q2 ratios should stay as close to the original values as possible to keep the seasonality.

So, in other words, I need to minimize the difference between quarterly ratios before and after the total value update.

Example: Let's say I change the total value for USA to 80, Canada to 90 and Japan to 130. If I keep the quarterly ratios exactly as they were before, I get this:

Country Q1 Q2 Q1_ratio Q2_ratio Total
USA 56 24 0.7 0.3 80
Canada 54 36 0.6 0.4 90
Japan 52 78 0.4 0.6 130
TOTAL 162 138

But this result doesn't comply with the second constraint (the quarterly column totals can't change).

If I keep the original column totals and do some trial and error to minimize the quarterly ratio differences, I get something like this:

Country Q1 Q2 Q1_ratio Q2_ratio Total
USA 58 22 0.725 0.275 80
Canada 56.5 33.5 0.628 0.372 90
Japan 55.5 74.5 0.427 0.573 130
TOTAL 170 130

I feel like this shouldn't be too hard to figure out but the solution eludes me. Any idea how to calculate this/ implement this in Python?


Solution

  • You haven't shared code, and I don't know exactly what your data looks like, so I'll answer accordingly in pseudocode. In fact this isn't really a Pandas question, besides for the simple extraction/replacement of data.

    • The first step of the solution, which you show, is to calculate the difference in totals (new Total - old Total), and for each row, for each quarter, multiply the change by that quarter's ratio. This way your ratio is exactly preserved (the middle table in your example).
    • Now for your second constraint, we calculate the difference between the new column totals and the old ones (ie 170 - 162 and 130 - 138). We see that Q1 needs +8 and Q2 needs -8, so we will distribute this according to the following ratio
    • Divide each ratio column by the sum of the column. So 0.7 = 0.41, 0.6 = 0.35, etc.
    • Then for each country multiply the column diff (ie +8 and -8) by this ratio.

    This yields:

        Country Q1      Q2   Q1_ratio Q2_ratio Total
    0   USA     59.29   22.15   0.74    0.28    80
    1   Canada  56.82   33.54   0.63    0.37    90
    2   Japan   53.88   74.31   0.41    0.57    130
    

    Which is about the same as you got by hand.

    Note that this solution does not using analytical minimisation techniques, but just applies row and column ratios in order.