Search code examples
python-3.xpandasdataframetime-seriesmissing-data

how to add missing rows of time series data to panda dataframes in python


I have a time series dataset of product given below:

date    product price   amount
11/17/2019  A   10  20
11/19/2019  A   15  20
11/24/2019  A   20  30
12/01/2019  C   40  50
12/05/2019  C   45  35

This data has a missing days ("MM/dd/YYYY") between the start and end date of data for each product. I am trying to fill missing date with zero rows and convert to previous table into a table given below:

date    product price   amount
11/17/2019  A   10  20
11/18/2019  A   0   0
11/19/2019  A   15  20
11/20/2019  A   0   0
11/21/2019  A   0   0
11/22/2019  A   0   0
11/23/2019  A   0   0
11/24/2019  A   20  30
12/01/2019  C   40  50
12/02/2019  C   0   0
12/03/2019  C   0   0
12/04/2019  C   0   0
12/05/2019  C   45  35

To get this conversion, I used the code:

import pandas as pd
import numpy as np
data=pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
data=data.set_index(["date", "product"])
start=data.first_valid_index()[0]
end=data.last_valid_index()[0]
df=data.set_index("date").reindex(pd.date_range(start,end, freq="1D"), fill_values=0)

However the code gives an error. Is there any way to get this conversion efficiently?


Solution

  • If need add 0 for missing Datetimes for each product separately use custom function in GroupBy.apply with DataFrame.reindex by minimal and maximal datetime:

    df = pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
    
    f = lambda x: x.reindex(pd.date_range(x.index.min(), 
                                          x.index.max(), name='date'), fill_value=0)
    df = (df.set_index('date')
            .groupby('product')
            .apply(f)
            .drop('product', axis=1)
            .reset_index())
    print (df)
       product       date  price  amount
    0        A 2019-11-17     10      20
    1        A 2019-11-18      0       0
    2        A 2019-11-19     15      20
    3        A 2019-11-20      0       0
    4        A 2019-11-21      0       0
    5        A 2019-11-22      0       0
    6        A 2019-11-23      0       0
    7        A 2019-11-24     20      30
    8        C 2019-12-01     40      50
    9        C 2019-12-02      0       0
    10       C 2019-12-03      0       0
    11       C 2019-12-04      0       0
    12       C 2019-12-05     45      35