Search code examples
pythonpandasdataframemultiplication

Pandas multiplication of two dataframes with uneven date rows and label columns


I have the following sample of historical Bitcoin rates in US dollars:

BTC_USD_rates = {
    "Open": {"01/01/2022": 46217.5, "02/01/2022": 47738.7, "03/01/2022": 47293.9, "04/01/2022": 46435.7, "05/01/2022": 45833.1, "06/01/2022": 43431.6, "07/01/2022": 43097.9, "08/01/2022": 41551.3},
    "Low": {"01/01/2022": 46217.5, "02/01/2022": 46718.2, "03/01/2022": 45704.0, "04/01/2022": 45602.1, "05/01/2022": 42535.1, "06/01/2022": 42481.1, "07/01/2022": 40810.0, "08/01/2022": 40574.3},
    "High": {"01/01/2022": 47917.6, "02/01/2022": 47944.9, "03/01/2022": 47556.0, "04/01/2022": 47505.4, "05/01/2022": 47019.4, "06/01/2022": 43772.3, "07/01/2022": 43127.7, "08/01/2022": 42304.4},
    "Close": {"01/01/2022": 47738.0, "02/01/2022": 47311.8, "03/01/2022": 46430.2, "04/01/2022": 45837.3, "05/01/2022": 43425.9, "06/01/2022": 43097.5, "07/01/2022": 41546.7, "08/01/2022": 41672.0},
    "Volume": {"01/01/2022": 31239, "02/01/2022": 27020, "03/01/2022": 41062, "04/01/2022": 55589, "05/01/2022": 83744, "06/01/2022": 63076, "07/01/2022": 88358, "08/01/2022": 52544},
}
df1 = pd.DataFrame.from_dict(BTC_USD_rates)
df1

            Open    Low     High    Close   Volume
01/01/2022  46217.5 46217.5 47917.6 47738.0 31239
02/01/2022  47738.7 46718.2 47944.9 47311.8 27020
03/01/2022  47293.9 45704.0 47556.0 46430.2 41062
04/01/2022  46435.7 45602.1 47505.4 45837.3 55589
05/01/2022  45833.1 42535.1 47019.4 43425.9 83744
06/01/2022  43431.6 42481.1 43772.3 43097.5 63076
07/01/2022  43097.9 40810.0 43127.7 41546.7 88358
08/01/2022  41551.3 40574.3 42304.4 41672.0 52544

And then for the same period I have the following historical New Zealand Dollars to $1 US Dollar rates:

USD_NZD_rates = {
    "Open": {"03/01/2022": 1.465, "04/01/2022": 1.4719, "06/01/2022": 1.4717, "07/01/2022": 1.4819},
    "Low": {"03/01/2022": 1.4583, "04/01/2022": 1.4651, "06/01/2022": 1.4708, "07/01/2022": 1.4733},
    "High": {"03/01/2022": 1.4763, "04/01/2022": 1.4784, "06/01/2022": 1.4854, "07/01/2022": 1.4849},
    "Close": {"03/01/2022": 1.4732, "04/01/2022": 1.4669, "06/01/2022": 1.4817, "07/01/2022": 1.4741},
}
df2 = pd.DataFrame.from_dict(USD_NZD_rates)
df2

            Open    Low     High    Close
03/01/2022  1.4650  1.4583  1.4763  1.4732
04/01/2022  1.4719  1.4651  1.4784  1.4669
06/01/2022  1.4717  1.4708  1.4854  1.4817
07/01/2022  1.4819  1.4733  1.4849  1.4741

What I need to accomplish is convert each date's Open, Low, High and Close BTC rates to NZD's using the USD_NZD Close rate for each respective date.

There are two caveats however, and those are preventing me to get there by just going plain vanilla like df1.multiply(df2["Close"], axis="index"):

  1. Ignore the Volume column in df1.
  2. df2 doesn't bring the USD_NZD Close rates I need for some dates (01/01/2022, 02/01/2022, 05/01/2022 and 08/01/2022), so for such cases I need the method to make sure each of the two situations are dealt with accordingly:
    • For those initial missing dates (01/01/2022 and 02/01/2022) the FIRST available date must be used (03/01/2022) as the USD_NZD Close rate that will be used to convert all 4 BTC_USD rates.
    • In case of any missing date down the dataframe (05/01/2022 and 08/01/2022), the PREVIOUS available date must be used (04/01/2022 and 07/01/2022 respectively) as the USD_NZD Close rate that will be used to convert all 4 BTC_USD rates.

How can I get there considering all those exceptions?!


Solution

  • Based on @Quang Hoang first stab at it I managed to get to the answer I needed:

    First, following his proposed method I expanded the df2 dataframe by adding those missing dates that show up under df1:

    df2 = df2.reindex(df1.index.union(df2.index))
    
                Open    Low     High    Close
    01/01/2022  NaN     NaN     NaN     NaN
    02/01/2022  NaN     NaN     NaN     NaN
    03/01/2022  1.4650  1.4583  1.4763  1.4732
    04/01/2022  1.4719  1.4651  1.4784  1.4669
    05/01/2022  NaN     NaN     NaN     NaN
    06/01/2022  1.4717  1.4708  1.4854  1.4817
    07/01/2022  1.4819  1.4733  1.4849  1.4741
    08/01/2022  NaN     NaN     NaN     NaN
    

    Then, also piggybacking on his method, but with some necessary tweaks, I managed to backward fill the initial missing dates with the rates from the first available date, and then forward fill the rest of the missing dates with rates from their respective prior available dates:

    df2 = df2.bfill()
    df2 = df2.ffill()
    
                Open    Low     High    Close
    01/01/2022  1.4650  1.4583  1.4763  1.4732
    02/01/2022  1.4650  1.4583  1.4763  1.4732
    03/01/2022  1.4650  1.4583  1.4763  1.4732
    04/01/2022  1.4719  1.4651  1.4784  1.4669
    05/01/2022  1.4719  1.4651  1.4784  1.4669
    06/01/2022  1.4717  1.4708  1.4854  1.4817
    07/01/2022  1.4819  1.4733  1.4849  1.4741
    08/01/2022  1.4819  1.4733  1.4849  1.4741
    

    Since I need to convert all df1 rates (Open, Low, High and Close) by the df2 Close rates only, I ran these additional lines to apply the Close rates column to the other columns, and also created a Volume column with all 1 values since this column should not be converted - I'm pretty sure there might be a way to do it all in one line of code, but that's the only way I know how to do it:

    df2['Open'] = df2['Close']
    df2['Low'] = df2['Close']
    df2['High'] = df2['Close']
    df2['Volume'] = 1
    
                Open    Low     High    Close   Volume
    01/01/2022  1.4732  1.4732  1.4732  1.4732  1
    02/01/2022  1.4732  1.4732  1.4732  1.4732  1
    03/01/2022  1.4732  1.4732  1.4732  1.4732  1
    04/01/2022  1.4669  1.4669  1.4669  1.4669  1
    05/01/2022  1.4669  1.4669  1.4669  1.4669  1
    06/01/2022  1.4817  1.4817  1.4817  1.4817  1
    07/01/2022  1.4741  1.4741  1.4741  1.4741  1
    08/01/2022  1.4741  1.4741  1.4741  1.4741  1
    

    And finally I run the multiplication of df1 to df2 to get to the sought for result:

    df1 = df1.mul(df2, axis = 0)
    
                Open        Low         High        Close       Volume
    01/01/2022  68087.62100 68087.62100 70592.20832 70327.62160 31239
    02/01/2022  70328.65284 68825.25224 70632.42668 69699.74376 27020
    03/01/2022  69673.37348 67331.13280 70059.49920 68400.97064 41062
    04/01/2022  68116.52833 66893.72049 69685.67126 67238.73537 55589
    05/01/2022  67232.57439 62394.73819 68972.75786 63701.45271 83744
    06/01/2022  64352.60172 62944.24587 64857.41691 63857.56575 63076
    07/01/2022  63530.61439 60158.02100 63574.54257 61243.99047 88358
    08/01/2022  61250.77133 59810.57563 62360.91604 61428.69520 52544