I want to create a multi-index based on Dates and Times from my DataFrame - df.
This actually works and indexed
returns the correct Multiindex (continuous dates from 02/10/2017 until 31/03/2018 with continuous times from 08:00:00 - 21:55:00 in 5 Min Intervals).
The problem is that my dataset doesn't have complete observations. Means, for a random day there could be missing observations for 08:05:00 until 08:40:00. My Multiindex has a length of 20496, while my DataFrame has a length of 20486.
When I want to merge the index with my DataFrame
all observations are NAN
. Does someone now how to merge the multi index with my initial DataFrame - df
?
df = pd.read_csv(file_name, parse_dates=[0],
index_col=0, sep=',')
df['Date'] = df.index.date
df['Time'] = df.index.time
df['Time'] = df['Time'].astype(str)
df = df[df['Time'] != '22:00:00']
list_date = set(df['Date'])
list_time = set(df['Time'])
list_date = sorted(list_date)
list_time = sorted(list_time)
iterables = [list_date, list_time]
indexed = pd.MultiIndex.from_product(iterables,
names=['date', 'time'])
df_new = pd.DataFrame(df, index=indexed)
df
output:
r2 var_v2x Date Time
TIME1
2017-10-02 08:00:00 NaN 0.00008784 2017-10-02 08:00:00
2017-10-02 08:05:00 0.00000000 0.00008784 2017-10-02 08:05:00
2017-10-02 08:10:00 0.00000008 0.00008784 2017-10-02 08:10:00
Found a solution (probably not an elegant one).
df = pd.read_csv(file_name, parse_dates=[0], index_col=0, sep=',')
df['Date'] = df.index.date
df['Time'] = df.index.time
df['Time'] = df['Time'].astype(str)
df = df[df['Time'] != '22:00:00']
list_date = set(df['Date'])
list_time = set(df['Time'])
list_date = sorted(list_date)
list_time = sorted(list_time)
iterables = [list_date, list_time]
indexed = pd.MultiIndex.from_product(iterables, names=['date', 'time'])
df_index_date = indexed.get_level_values(0)
df_index_time = indexed.get_level_values(1)
df_joined = pd.DataFrame(df_index_date.astype(str) + ' ' + df_index_time.astype(str))
df_joined = df_joined.reset_index()
df_joined = df_joined.set_index(df_joined[0])
del df_joined['index']
del df_joined[0]
df_final = df_joined.join(df)
df_final = df_final.reset_index(drop=True)
df_final = df_final.set_index(indexed)