Search code examples
pythonpandasnumpybloomberg

Pandas efficiently interpolate sections of a larger dataframe


I have data coming in via the bloomberg API, the returned dataframe is a large singular dataframe (66 rows, 5 cols) and then sent out in one block via websocket as a singular JSON str.

I need to linearly/ simple interpolate this 66 row dataframe, however this interpolation must be performed separately for each currency (e.g. KWN = Korean Won, the price is around 1190, whereas for Chinese yuan is only around 6, so we can't interpolated between currencies).

I am currently very inefficiently filtering my dataframe on the index.str so that the first 3 chars match the iterated chosen currency.

I will be very very grateful if anyone has any ideas to help speed this all up/ tips. thanks very much :)

        self.ccy_prefix = ['KWN', 'IRN', 'NTN', 'IHN', 'PPN', 'CCN']
        for ccy in self.ccy_prefix:
            #interp for each section
            df[df.index.str[:3]==ccy] = df[df.index.str[:3]==ccy].interpolate(method='linear', limit_direction='forward', axis=0)
      

Filtered for Korean Wont, note the NAN at the +4Y tenor

This does successfully work, however it is very slow and inefficient, is there a way to use a map or some other clever Pandas method, I'd attempted to find an alternative but couldn't find anything thus far.

Filtered interpolated DF: enter image description here

Large singular DataFrame: enter image description here

Raw DataFrame:

{'BID': {'KWN+1W BGN Curncy': 1192.83, 'KWN+1M BGN Curncy': 1193.46, 'KWN+2M BGN Curncy': 1194.2, 'KWN+3M BGN Curncy': 1194.68, 'KWN+6M BGN Curncy': 1195.74, 'KWN+9M BGN Curncy': 1196.15, 'KWN+12M BGN Curncy': 1195.99, 'KWN+2Y BGN Curncy': 1195.57, 'KWN+3Y BGN Curncy': 1194.0, 'KWN+4Y BGN Curncy': nan, 'KWN+5Y BGN Curncy': 1188.95, 'IRN+1W BGN Curncy': 74.61, 'IRN+1M BGN Curncy': 74.83, 'IRN+2M BGN Curncy': 75.07, 'IRN+3M BGN Curncy': 75.51, 'IRN+6M BGN Curncy': 76.37, 'IRN+9M BGN Curncy': 77.22, 'IRN+12M BGN Curncy': 78.07, 'IRN+2Y BGN Curncy': 81.63, 'IRN+3Y BGN Curncy': nan, 'IRN+4Y BGN Curncy': 87.98, 'IRN+5Y BGN Curncy': 91.65, 'NTN+1W BGN Curncy': 27.576, 'NTN+1M BGN Curncy': 27.517, 'NTN+2M BGN Curncy': 27.442, 'NTN+3M BGN Curncy': 27.372, 'NTN+6M BGN Curncy': 27.174, 'NTN+9M BGN Curncy': 26.98, 'NTN+12M BGN Curncy': 26.784, 'NTN+2Y BGN Curncy': nan, 'NTN+3Y BGN Curncy': nan, 'NTN+4Y BGN Curncy': nan, 'NTN+5Y BGN Curncy': nan, 'IHN+1W BGN Curncy': 14337.8, 'IHN+1M BGN Curncy': 14369.2, 'IHN+2M BGN Curncy': 14417.0, 'IHN+3M BGN Curncy': 14448.8, 'IHN+6M BGN Curncy': 14595.9, 'IHN+9M BGN Curncy': 14703.8, 'IHN+12M BGN Curncy': 14896.0, 'IHN+2Y BGN Curncy': 15504.8, 'IHN+3Y BGN Curncy': nan, 'IHN+4Y BGN Curncy': nan, 'IHN+5Y BGN Curncy': nan, 'PPN+1W BGN Curncy': 51.58, 'PPN+1M BGN Curncy': 51.81, 'PPN+2M BGN Curncy': 52.01, 'PPN+3M BGN Curncy': 52.15, 'PPN+6M BGN Curncy': 52.56, 'PPN+9M BGN Curncy': 52.89, 'PPN+12M BGN Curncy': 53.17, 'PPN+2Y BGN Curncy': 54.32, 'PPN+3Y BGN Curncy': 55.68, 'PPN+4Y BGN Curncy': 56.46, 'PPN+5Y BGN Curncy': 57.72, 'CCN+1W BGN Curncy': 6.361, 'CCN+1M BGN Curncy': 6.373, 'CCN+2M BGN Curncy': 6.3853, 'CCN+3M BGN Curncy': 6.3976, 'CCN+6M BGN Curncy': 6.428, 'CCN+9M BGN Curncy': 6.4541, 'CCN+12M BGN Curncy': 6.4776, 'CCN+2Y BGN Curncy': 6.5653, 'CCN+3Y BGN Curncy': 6.6229, 'CCN+4Y BGN Curncy': 6.7332, 'CCN+5Y BGN Curncy': 6.8305}, 'ASK': {'KWN+1W BGN Curncy': 1193.65, 'KWN+1M BGN Curncy': 1194.46, 'KWN+2M BGN Curncy': 1195.2, 'KWN+3M BGN Curncy': 1195.72, 'KWN+6M BGN Curncy': 1197.06, 'KWN+9M BGN Curncy': 1197.48, 'KWN+12M BGN Curncy': 1197.81, 'KWN+2Y BGN Curncy': 1197.28, 'KWN+3Y BGN Curncy': 1195.0, 'KWN+4Y BGN Curncy': nan, 'KWN+5Y BGN Curncy': 1189.95, 'IRN+1W BGN Curncy': 74.65, 'IRN+1M BGN Curncy': 74.88, 'IRN+2M BGN Curncy': 75.12, 'IRN+3M BGN Curncy': 75.56, 'IRN+6M BGN Curncy': 76.42, 'IRN+9M BGN Curncy': 77.28, 'IRN+12M BGN Curncy': 78.14, 'IRN+2Y BGN Curncy': 81.68, 'IRN+3Y BGN Curncy': nan, 'IRN+4Y BGN Curncy': 89.98, 'IRN+5Y BGN Curncy': 91.99, 'NTN+1W BGN Curncy': 27.606, 'NTN+1M BGN Curncy': 27.533, 'NTN+2M BGN Curncy': 27.472, 'NTN+3M BGN Curncy': 27.402, 'NTN+6M BGN Curncy': 27.204, 'NTN+9M BGN Curncy': 27.014, 'NTN+12M BGN Curncy': 26.829, 'NTN+2Y BGN Curncy': nan, 'NTN+3Y BGN Curncy': nan, 'NTN+4Y BGN Curncy': nan, 'NTN+5Y BGN Curncy': nan, 'IHN+1W BGN Curncy': 14378.0, 'IHN+1M BGN Curncy': 14401.0, 'IHN+2M BGN Curncy': 14439.0, 'IHN+3M BGN Curncy': 14499.7, 'IHN+6M BGN Curncy': 14652.1, 'IHN+9M BGN Curncy': 14803.2, 'IHN+12M BGN Curncy': 14965.0, 'IHN+2Y BGN Curncy': 15545.2, 'IHN+3Y BGN Curncy': nan, 'IHN+4Y BGN Curncy': nan, 'IHN+5Y BGN Curncy': nan, 'PPN+1W BGN Curncy': 51.63, 'PPN+1M BGN Curncy': 51.86, 'PPN+2M BGN Curncy': 52.07, 'PPN+3M BGN Curncy': 52.22, 'PPN+6M BGN Curncy': 52.6, 'PPN+9M BGN Curncy': 52.99, 'PPN+12M BGN Curncy': 53.29, 'PPN+2Y BGN Curncy': 54.4, 'PPN+3Y BGN Curncy': 55.8, 'PPN+4Y BGN Curncy': 57.06, 'PPN+5Y BGN Curncy': 58.1, 'CCN+1W BGN Curncy': 6.366, 'CCN+1M BGN Curncy': 6.3781, 'CCN+2M BGN Curncy': 6.3911, 'CCN+3M BGN Curncy': 6.4026, 'CCN+6M BGN Curncy': 6.433, 'CCN+9M BGN Curncy': 6.4591, 'CCN+12M BGN Curncy': 6.4846, 'CCN+2Y BGN Curncy': 6.5753, 'CCN+3Y BGN Curncy': 6.6441, 'CCN+4Y BGN Curncy': 6.7483, 'CCN+5Y BGN Curncy': 6.8405}, 'MID': {'KWN+1W BGN Curncy': 1193.24, 'KWN+1M BGN Curncy': 1193.96, 'KWN+2M BGN Curncy': 1194.7, 'KWN+3M BGN Curncy': 1195.2, 'KWN+6M BGN Curncy': 1196.4, 'KWN+9M BGN Curncy': 1196.82, 'KWN+12M BGN Curncy': 1196.9, 'KWN+2Y BGN Curncy': 1196.42, 'KWN+3Y BGN Curncy': 1194.5, 'KWN+4Y BGN Curncy': nan, 'KWN+5Y BGN Curncy': 1189.45, 'IRN+1W BGN Curncy': 74.63, 'IRN+1M BGN Curncy': 74.85, 'IRN+2M BGN Curncy': 75.09, 'IRN+3M BGN Curncy': 75.53, 'IRN+6M BGN Curncy': 76.4, 'IRN+9M BGN Curncy': 77.25, 'IRN+12M BGN Curncy': 78.1, 'IRN+2Y BGN Curncy': 81.65, 'IRN+3Y BGN Curncy': nan, 'IRN+4Y BGN Curncy': 88.98, 'IRN+5Y BGN Curncy': 91.82, 'NTN+1W BGN Curncy': 27.591, 'NTN+1M BGN Curncy': 27.525, 'NTN+2M BGN Curncy': 27.457, 'NTN+3M BGN Curncy': 27.387, 'NTN+6M BGN Curncy': 27.189, 'NTN+9M BGN Curncy': 26.997, 'NTN+12M BGN Curncy': 26.806, 'NTN+2Y BGN Curncy': nan, 'NTN+3Y BGN Curncy': nan, 'NTN+4Y BGN Curncy': nan, 'NTN+5Y BGN Curncy': nan, 'IHN+1W BGN Curncy': 14357.9, 'IHN+1M BGN Curncy': 14385.1, 'IHN+2M BGN Curncy': 14428.0, 'IHN+3M BGN Curncy': 14474.2, 'IHN+6M BGN Curncy': 14624.0, 'IHN+9M BGN Curncy': 14753.5, 'IHN+12M BGN Curncy': 14930.5, 'IHN+2Y BGN Curncy': 15525.0, 'IHN+3Y BGN Curncy': nan, 'IHN+4Y BGN Curncy': nan, 'IHN+5Y BGN Curncy': nan, 'PPN+1W BGN Curncy': 51.6, 'PPN+1M BGN Curncy': 51.83, 'PPN+2M BGN Curncy': 52.04, 'PPN+3M BGN Curncy': 52.18, 'PPN+6M BGN Curncy': 52.58, 'PPN+9M BGN Curncy': 52.94, 'PPN+12M BGN Curncy': 53.23, 'PPN+2Y BGN Curncy': 54.36, 'PPN+3Y BGN Curncy': 55.74, 'PPN+4Y BGN Curncy': 56.76, 'PPN+5Y BGN Curncy': 57.91, 'CCN+1W BGN Curncy': 6.3635, 'CCN+1M BGN Curncy': 6.3755, 'CCN+2M BGN Curncy': 6.3882, 'CCN+3M BGN Curncy': 6.4001, 'CCN+6M BGN Curncy': 6.4305, 'CCN+9M BGN Curncy': 6.4566, 'CCN+12M BGN Curncy': 6.4811, 'CCN+2Y BGN Curncy': 6.5703, 'CCN+3Y BGN Curncy': 6.6335, 'CCN+4Y BGN Curncy': 6.7408, 'CCN+5Y BGN Curncy': 6.8355}, 'LAST_BID_TIME_TODAY_REALTIME': {'KWN+1W BGN Curncy': datetime.time(20, 34, 41), 'KWN+1M BGN Curncy': datetime.time(20, 34, 35), 'KWN+2M BGN Curncy': datetime.time(20, 34, 56), 'KWN+3M BGN Curncy': datetime.time(20, 34, 56), 'KWN+6M BGN Curncy': datetime.time(20, 34, 56), 'KWN+9M BGN Curncy': datetime.time(20, 34, 56), 'KWN+12M BGN Curncy': datetime.time(20, 34, 56), 'KWN+2Y BGN Curncy': datetime.time(20, 34, 56), 'KWN+3Y BGN Curncy': datetime.time(20, 34, 34), 'KWN+4Y BGN Curncy': nan, 'KWN+5Y BGN Curncy': datetime.time(20, 31, 31), 'IRN+1W BGN Curncy': datetime.time(19, 50, 20), 'IRN+1M BGN Curncy': datetime.time(20, 34, 49), 'IRN+2M BGN Curncy': datetime.time(20, 34, 48), 'IRN+3M BGN Curncy': datetime.time(20, 34, 48), 'IRN+6M BGN Curncy': datetime.time(20, 34, 48), 'IRN+9M BGN Curncy': datetime.time(20, 34, 43), 'IRN+12M BGN Curncy': datetime.time(20, 34, 48), 'IRN+2Y BGN Curncy': datetime.time(20, 32, 12), 'IRN+3Y BGN Curncy': nan, 'IRN+4Y BGN Curncy': datetime.time(8, 45, 3), 'IRN+5Y BGN Curncy': datetime.time(20, 31, 35), 'NTN+1W BGN Curncy': datetime.time(20, 31, 17), 'NTN+1M BGN Curncy': datetime.time(20, 34, 35), 'NTN+2M BGN Curncy': datetime.time(20, 31, 30), 'NTN+3M BGN Curncy': datetime.time(20, 31, 30), 'NTN+6M BGN Curncy': datetime.time(20, 31, 30), 'NTN+9M BGN Curncy': datetime.time(18, 0, 42), 'NTN+12M BGN Curncy': datetime.time(18, 0, 42), 'NTN+2Y BGN Curncy': nan, 'NTN+3Y BGN Curncy': NaT, 'NTN+4Y BGN Curncy': NaT, 'NTN+5Y BGN Curncy': NaT, 'IHN+1W BGN Curncy': datetime.time(20, 34, 26), 'IHN+1M BGN Curncy': datetime.time(20, 33, 51), 'IHN+2M BGN Curncy': datetime.time(20, 34, 11), 'IHN+3M BGN Curncy': datetime.time(20, 33, 51), 'IHN+6M BGN Curncy': datetime.time(20, 33, 51), 'IHN+9M BGN Curncy': datetime.time(20, 0, 21), 'IHN+12M BGN Curncy': datetime.time(20, 33, 51), 'IHN+2Y BGN Curncy': datetime.time(19, 27, 44), 'IHN+3Y BGN Curncy': nan, 'IHN+4Y BGN Curncy': NaT, 'IHN+5Y BGN Curncy': NaT, 'PPN+1W BGN Curncy': datetime.time(20, 34, 11), 'PPN+1M BGN Curncy': datetime.time(20, 33, 54), 'PPN+2M BGN Curncy': datetime.time(20, 33, 54), 'PPN+3M BGN Curncy': datetime.time(20, 33, 54), 'PPN+6M BGN Curncy': datetime.time(20, 33, 54), 'PPN+9M BGN Curncy': datetime.time(19, 46, 19), 'PPN+12M BGN Curncy': datetime.time(20, 33, 54), 'PPN+2Y BGN Curncy': datetime.time(16, 5, 40), 'PPN+3Y BGN Curncy': datetime.time(20, 34, 56), 'PPN+4Y BGN Curncy': datetime.time(20, 34, 56), 'PPN+5Y BGN Curncy': datetime.time(20, 34, 56), 'CCN+1W BGN Curncy': datetime.time(20, 34, 28), 'CCN+1M BGN Curncy': datetime.time(20, 34, 28), 'CCN+2M BGN Curncy': datetime.time(20, 34, 28), 'CCN+3M BGN Curncy': datetime.time(20, 34, 28), 'CCN+6M BGN Curncy': datetime.time(20, 34, 28), 'CCN+9M BGN Curncy': datetime.time(20, 34, 28), 'CCN+12M BGN Curncy': datetime.time(20, 34, 28), 'CCN+2Y BGN Curncy': datetime.time(20, 32, 13), 'CCN+3Y BGN Curncy': datetime.time(20, 32, 40), 'CCN+4Y BGN Curncy': datetime.time(20, 32, 13), 'CCN+5Y BGN Curncy': datetime.time(20, 23, 29)}, 'SETTLEMENT_DATE_RT': {'KWN+1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'KWN+1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'KWN+2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'KWN+3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'KWN+6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'KWN+9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'KWN+12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'KWN+2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'KWN+3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'KWN+4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'KWN+5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'IRN+1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'IRN+1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'IRN+2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'IRN+3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'IRN+6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'IRN+9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'IRN+12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'IRN+2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'IRN+3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'IRN+4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'IRN+5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'NTN+1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'NTN+1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'NTN+2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'NTN+3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'NTN+6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'NTN+9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'NTN+12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'NTN+2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'NTN+3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'NTN+4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'NTN+5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'IHN+1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'IHN+1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'IHN+2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'IHN+3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'IHN+6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'IHN+9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'IHN+12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'IHN+2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'IHN+3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'IHN+4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'IHN+5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'PPN+1W BGN Curncy': datetime.datetime(2022, 1, 26, 0, 0), 'PPN+1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'PPN+2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'PPN+3M BGN Curncy': datetime.datetime(2022, 4, 19, 0, 0), 'PPN+6M BGN Curncy': datetime.datetime(2022, 7, 19, 0, 0), 'PPN+9M BGN Curncy': datetime.datetime(2022, 10, 19, 0, 0), 'PPN+12M BGN Curncy': datetime.datetime(2023, 1, 19, 0, 0), 'PPN+2Y BGN Curncy': datetime.datetime(2024, 1, 19, 0, 0), 'PPN+3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'PPN+4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'PPN+5Y BGN Curncy': datetime.datetime(2027, 1, 19, 0, 0), 'CCN+1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'CCN+1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'CCN+2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'CCN+3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'CCN+6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'CCN+9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'CCN+12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'CCN+2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'CCN+3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'CCN+4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'CCN+5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0)}}

Solution

  • It seems like it should be quicker to groupby and then interpolate. Unfortunately, when I run your code I don't actually get the "filtered interpolated DF" that you list (perhaps you've left out some part of the interpolate where you specify that it should be 15 minute intervals?). You get a slight speedup if you use str.startswith instead of str[:3]:

    %%timeit
    for ccy in ccy_prefix:
      df[df.index.str[:3]==ccy] = df[df.index.str[:3]==ccy].interpolate(limit_direction='forward')
    % 10 loops, best of 5: 25.9 ms per loop
    

    As opposed to:

    %%timeit
    for ccy in ccy_prefix:
      df[df.index.str.startswith(ccy)] = df[df.index.str.startswith(ccy)].interpolate(limit_direction='forward')
    % 10 loops, best of 5: 24.1 ms per loop
    

    Perhaps a better solution is to create a new column with the currency prefixes and then groupby and interpolate, going from a comment provided here.

    df['ccy_prefix'] = df.index.str[:3]
    
    def interpolator(df):
      return(df.interpolate(limit_direction='forward'))
    

    Then this should be quickest of them all:

    df = df.groupby('ccy_prefix').apply(interpolator)