Search code examples
pythonpandasdataframegroup-by

Collect common groups on non-index column across two dataframes


Here are two dataframes grouped how I want them:

last5s = pd.Timestamp.now().replace(microsecond=0) - pd.Timedelta('5s')
dates = pd.date_range(last5s, periods = 5, freq='s')

N=10
data1 = np.random.randint(0,10,N)
data2 = np.random.randint(0,10,N)

df1 = pd.DataFrame({'timestamp': np.random.choice(dates, size=N), 'A': data1})
df2 = pd.DataFrame({'timestamp': np.random.choice(dates, size=N), 'B': data2})

print(df1)
print(df2)
print()

g1 = df1.groupby(pd.Grouper(key='timestamp', freq='1s'))
print("g1:")
for time, group in g1:
    print('time:', time)
    print(group)
    print()
    
print()
g2 = df2.groupby(pd.Grouper(key='timestamp', freq='1s'))
print('g2:')
for time, group in g2:
    print('time:', time)
    print(group)
    print()

Output (e.g.):

            timestamp  A
0 2024-03-01 10:05:26  7
1 2024-03-01 10:05:25  8
2 2024-03-01 10:05:28  1
3 2024-03-01 10:05:24  2
4 2024-03-01 10:05:28  5
5 2024-03-01 10:05:27  4
6 2024-03-01 10:05:24  6
7 2024-03-01 10:05:26  3
8 2024-03-01 10:05:26  8
9 2024-03-01 10:05:28  8
            timestamp  B
0 2024-03-01 10:05:25  1
1 2024-03-01 10:05:26  6
2 2024-03-01 10:05:25  5
3 2024-03-01 10:05:28  7
4 2024-03-01 10:05:27  7
5 2024-03-01 10:05:28  1
6 2024-03-01 10:05:28  4
7 2024-03-01 10:05:25  0
8 2024-03-01 10:05:24  6
9 2024-03-01 10:05:24  5

g1:
time: 2024-03-01 10:05:24
            timestamp  A
3 2024-03-01 10:05:24  2
6 2024-03-01 10:05:24  6

time: 2024-03-01 10:05:25
            timestamp  A
1 2024-03-01 10:05:25  8

time: 2024-03-01 10:05:26
            timestamp  A
0 2024-03-01 10:05:26  7
7 2024-03-01 10:05:26  3
8 2024-03-01 10:05:26  8

time: 2024-03-01 10:05:27
            timestamp  A
5 2024-03-01 10:05:27  4

time: 2024-03-01 10:05:28
            timestamp  A
2 2024-03-01 10:05:28  1
4 2024-03-01 10:05:28  5
9 2024-03-01 10:05:28  8


g2:
time: 2024-03-01 10:05:24
            timestamp  B
8 2024-03-01 10:05:24  6
9 2024-03-01 10:05:24  5

time: 2024-03-01 10:05:25
            timestamp  B
0 2024-03-01 10:05:25  1
2 2024-03-01 10:05:25  5
7 2024-03-01 10:05:25  0

time: 2024-03-01 10:05:26
            timestamp  B
1 2024-03-01 10:05:26  6

time: 2024-03-01 10:05:27
            timestamp  B
4 2024-03-01 10:05:27  7

time: 2024-03-01 10:05:28
            timestamp  B
3 2024-03-01 10:05:28  7
5 2024-03-01 10:05:28  1
6 2024-03-01 10:05:28  4

How do I "join" the groups together such that I can iterate over them together? E.g. I want to be able to do:

for time, group1, group2 in somehow_joined(g1,g2):
    <do stuff with group1 and group2 in this common time group>

Solution

  • You can just do:

    for t, d1 in g1:
        d2 = g2.get_group(t)
        if d2 is None:
            print("I don't want this")
            continue
    
        print(d1)
        print('-'*10)
        print(d2)
        print('='*30)
    

    Output:

                timestamp  A
    0 2024-02-29 19:10:14  0
    3 2024-02-29 19:10:14  7
    7 2024-02-29 19:10:14  1
    ----------
                timestamp  B
    1 2024-02-29 19:10:14  0
    3 2024-02-29 19:10:14  6
    5 2024-02-29 19:10:14  9
    6 2024-02-29 19:10:14  4
    ==============================
                timestamp  A
    2 2024-02-29 19:10:15  2
    5 2024-02-29 19:10:15  8
    6 2024-02-29 19:10:15  2
    9 2024-02-29 19:10:15  6
    ----------
                timestamp  B
    8 2024-02-29 19:10:15  9
    ==============================
                timestamp  A
    1 2024-02-29 19:10:16  3
    4 2024-02-29 19:10:16  9
    8 2024-02-29 19:10:16  6
    ----------
                timestamp  B
    2 2024-02-29 19:10:16  6
    4 2024-02-29 19:10:16  6
    ==============================