Search code examples
pythondataframemergemultiple-columns

Python Dataframes merge multi match


I'm new with Dataframe. I would like to kwon how (if possible) can I merge 2 Dataframes with multiple match For example

[df1]

date                  ZipCode Weather
2022-11-25 00:00:00   123456   34
2022-11-25 00:00:15   123456   35
2022-11-25 00:00:30   123456   36
[df2]
date                  ZipCode   host
2022-11-25 00:00:00   123456   host1
2022-11-25 00:00:00   123456   host2
2022-11-25 00:00:00   123456   host3
2022-11-25 00:00:15   123456   host1
2022-11-25 00:00:30   123456   host2
2022-11-25 00:00:30   123456   host3

Expected results:

date                  ZipCode   host   Weather
2022-11-25 00:00:00   123456   host1   34
2022-11-25 00:00:00   123456   host2   34
2022-11-25 00:00:00   123456   host3   34
2022-11-25 00:00:15   123456   host1   35
2022-11-25 00:00:30   123456   host2   36
2022-11-25 00:00:30   123456   host3   36

My objetive is assign weather measures to each host. I have weather measurements every 15 minutes for one ZipCode (One line) By the other hand, I have several host KPIs for one time and one ZipCode (multiples lines)

Can I perfomr this activity with Dataframes?

Thanks in advance!


Solution

  • You could use the join function in pandas which joins one dataframe's index to the index of the other. Try something like

    import pandas as pd
    
    data1 = \
    [['2022-11-25 00:00:00',  123456, 34],
    ['2022-11-25 00:00:15',  123456, 35],
    ['2022-11-25 00:00:30',  123456, 36]]
    
    columns1 =['date', 'ZipCode', 'Weather']
    
    data2 = \
    [['2022-11-25 00:00:00',  123456, 'host1'],
    ['2022-11-25 00:00:00',  123456, 'host2'],
    ['2022-11-25 00:00:00',  123456, 'host3'],
    ['2022-11-25 00:00:15',  123456, 'host1'],
    ['2022-11-25 00:00:30',  123456, 'host2'],
    ['2022-11-25 00:00:30',  123456, 'host3']]
    
    columns2 =['date', 'ZipCode', 'host']
    
    df1 = pd.DataFrame(data=data1, columns=columns1)
    df1.date = pd.to_datetime(df1.date)
    df1.set_index('date', inplace=True)
    df2 = pd.DataFrame(data=data2, columns=columns2)
    df2.date = pd.to_datetime(df2.date)
    df2.set_index('date', inplace=True)
    df3 = df1.join(df2['host'], on='date')
    df3