I have a time series dataset of product given below:
date product price amount
11/17/2019 A 10 20
11/19/2019 A 15 20
11/24/2019 A 20 30
12/01/2019 C 40 50
12/05/2019 C 45 35
This data has a missing days ("MM/dd/YYYY") between the start and end date of data for each product. I am trying to fill missing date with zero rows and convert to previous table into a table given below:
date product price amount
11/17/2019 A 10 20
11/18/2019 A 0 0
11/19/2019 A 15 20
11/20/2019 A 0 0
11/21/2019 A 0 0
11/22/2019 A 0 0
11/23/2019 A 0 0
11/24/2019 A 20 30
12/01/2019 C 40 50
12/02/2019 C 0 0
12/03/2019 C 0 0
12/04/2019 C 0 0
12/05/2019 C 45 35
To get this conversion, I used the code:
import pandas as pd
import numpy as np
data=pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
data=data.set_index(["date", "product"])
start=data.first_valid_index()[0]
end=data.last_valid_index()[0]
df=data.set_index("date").reindex(pd.date_range(start,end, freq="1D"), fill_values=0)
However the code gives an error. Is there any way to get this conversion efficiently?
If need add 0
for missing Datetimes for each product
separately use custom function in GroupBy.apply
with DataFrame.reindex
by minimal and maximal datetime:
df = pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
f = lambda x: x.reindex(pd.date_range(x.index.min(),
x.index.max(), name='date'), fill_value=0)
df = (df.set_index('date')
.groupby('product')
.apply(f)
.drop('product', axis=1)
.reset_index())
print (df)
product date price amount
0 A 2019-11-17 10 20
1 A 2019-11-18 0 0
2 A 2019-11-19 15 20
3 A 2019-11-20 0 0
4 A 2019-11-21 0 0
5 A 2019-11-22 0 0
6 A 2019-11-23 0 0
7 A 2019-11-24 20 30
8 C 2019-12-01 40 50
9 C 2019-12-02 0 0
10 C 2019-12-03 0 0
11 C 2019-12-04 0 0
12 C 2019-12-05 45 35