I'm new with Dataframe. I would like to kwon how (if possible) can I merge 2 Dataframes with multiple match For example
[df1]
date ZipCode Weather
2022-11-25 00:00:00 123456 34
2022-11-25 00:00:15 123456 35
2022-11-25 00:00:30 123456 36
[df2]
date ZipCode host
2022-11-25 00:00:00 123456 host1
2022-11-25 00:00:00 123456 host2
2022-11-25 00:00:00 123456 host3
2022-11-25 00:00:15 123456 host1
2022-11-25 00:00:30 123456 host2
2022-11-25 00:00:30 123456 host3
Expected results:
date ZipCode host Weather
2022-11-25 00:00:00 123456 host1 34
2022-11-25 00:00:00 123456 host2 34
2022-11-25 00:00:00 123456 host3 34
2022-11-25 00:00:15 123456 host1 35
2022-11-25 00:00:30 123456 host2 36
2022-11-25 00:00:30 123456 host3 36
My objetive is assign weather measures to each host. I have weather measurements every 15 minutes for one ZipCode (One line) By the other hand, I have several host KPIs for one time and one ZipCode (multiples lines)
Can I perfomr this activity with Dataframes?
Thanks in advance!
You could use the join function in pandas which joins one dataframe's index to the index of the other. Try something like
import pandas as pd
data1 = \
[['2022-11-25 00:00:00', 123456, 34],
['2022-11-25 00:00:15', 123456, 35],
['2022-11-25 00:00:30', 123456, 36]]
columns1 =['date', 'ZipCode', 'Weather']
data2 = \
[['2022-11-25 00:00:00', 123456, 'host1'],
['2022-11-25 00:00:00', 123456, 'host2'],
['2022-11-25 00:00:00', 123456, 'host3'],
['2022-11-25 00:00:15', 123456, 'host1'],
['2022-11-25 00:00:30', 123456, 'host2'],
['2022-11-25 00:00:30', 123456, 'host3']]
columns2 =['date', 'ZipCode', 'host']
df1 = pd.DataFrame(data=data1, columns=columns1)
df1.date = pd.to_datetime(df1.date)
df1.set_index('date', inplace=True)
df2 = pd.DataFrame(data=data2, columns=columns2)
df2.date = pd.to_datetime(df2.date)
df2.set_index('date', inplace=True)
df3 = df1.join(df2['host'], on='date')
df3