Tags: python, pandas, missing-data, levels

Python - Pandas - Fill missing data based on existing Levels


  • I use pandas and get my data from a SQL database
  • I have two tickers. One is a U.S. stock, the other a European stock. The dates are not necessarily the same for both stocks (holidays, etc.).
  • All my data is stored in a multi-index DataFrame.
  • I want to fill missing values based on the index levels.

Running the code below:

import datetime

import pandas as pd

today = datetime.date.today()
# (ticker, date) pairs; the EU ticker has no row for two days ago
ticker_date = [('US', today - datetime.timedelta(3)),
               ('US', today - datetime.timedelta(2)),
               ('US', today - datetime.timedelta(1)),
               ('EU', today - datetime.timedelta(3)),
               ('EU', today - datetime.timedelta(1))]
index_df = pd.MultiIndex.from_tuples(ticker_date)
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1, 15], index_df, ['value'])

Output:

[Image: output from the code above, a DataFrame with three US rows and two EU rows; the EU ticker has no row for Dec 11th]
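To make the gap explicit, the missing (ticker, date) pairs can be listed by comparing the existing index against the full product of both levels. This is a minimal sketch, not part of the original question; the fixed 2017 dates and the names `full_idx` and `missing` are assumptions chosen to match the example.

```python
import datetime

import pandas as pd

# Fixed dates (an assumption, matching the dates discussed in the question)
# so the result does not depend on today's date.
d10, d11, d12 = (datetime.date(2017, 12, day) for day in (10, 11, 12))

index_df = pd.MultiIndex.from_tuples(
    [("US", d10), ("US", d11), ("US", d12), ("EU", d10), ("EU", d12)]
)
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1, 15], index_df, ["value"])

# Every (ticker, date) combination that could exist.
full_idx = pd.MultiIndex.from_product(
    [
        example.index.get_level_values(0).unique(),
        example.index.get_level_values(1).unique(),
    ]
)

# Combinations that are absent from the data.
missing = full_idx.difference(example.index)
print(list(missing))
```

Here the only missing pair is the EU ticker on Dec 11th, which is the row to be added and filled.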

I am looking for a method to reshape my output, filling the missing data with the previous value:

Objective: add a Dec 11th row for the EU ticker and fill it with the previous value



Solution

  • I'd do it this way:

    In [24]: idx = pd.MultiIndex.from_product((
                       example.index.get_level_values(0).unique(), 
                       example.index.get_level_values(1).unique()))
    
    In [25]: example = example.reindex(idx).ffill()
    
    In [26]: example
    Out[26]:
                   value
    US 2017-12-10   12.2
       2017-12-11   12.5
       2017-12-12   12.6
    EU 2017-12-10   15.1
       2017-12-11   15.1
       2017-12-12   15.0
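One caveat with a plain `ffill()` after `reindex`: it fills down the whole column, so if a ticker is missing its first date, the last value of the previous ticker carries over into it. The variation below is a sketch, not the answerer's method. It forward-fills within each ticker using `groupby(level=0)` and sorts the dates first, since `unique()` keeps order of appearance. The sample data is hypothetical and differs from the question so that the effect is visible.

```python
import datetime

import pandas as pd

d10, d11, d12 = (datetime.date(2017, 12, day) for day in (10, 11, 12))

# Hypothetical data: the EU ticker only has a value on Dec 11th,
# so it is missing both its first and its last date.
index_df = pd.MultiIndex.from_tuples(
    [("US", d10), ("US", d11), ("US", d12), ("EU", d11)]
)
example = pd.DataFrame([12.2, 12.5, 12.6, 15.1], index_df, ["value"])

tickers = example.index.get_level_values(0).unique()
# Sort the dates so each ticker's rows end up in chronological order.
dates = example.index.get_level_values(1).unique().sort_values()
full_idx = pd.MultiIndex.from_product([tickers, dates])

reindexed = example.reindex(full_idx)

# Plain forward fill: the last US value leaks into the first EU row.
plain = reindexed.ffill()

# Forward fill within each ticker: the leading EU gap stays NaN.
grouped = reindexed.groupby(level=0).ffill()

print(plain)
print(grouped)
```

With the per-ticker fill, a leading gap remains `NaN` and can be handled explicitly, for example with a backward fill within each ticker, if that suits the data.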