
How to use IF NOT IN in pandas groupby object?


I have such a dataframe:

import pandas as pd
import numpy as np
# create a sample DataFrame
data = {'ID': [1, 1, 1, 2, 2, 2],
        'timestamp': ['2022-01-01 12:00:00', '2022-01-01 13:00:00', '2022-01-01 18:00:00',
                      '2022-01-01 12:02:00', '2022-01-01 13:02:00', '2022-01-01 18:02:00'],
        'value1': [10, 20, 30, 40, 50, 60],
        'gender': ['M', 'M', 'F', 'F', 'F', 'M'],
        'age': [20, 25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# extract the date from the timestamp column
df['date'] = pd.to_datetime(df['timestamp']).dt.date

For this dataframe, I would like to enumerate the timestamp values, then take each single timestamp and check whether it exists in the groupby object. If it does not exist, I want to append it. Here is my approach:

for indx, single_date in enumerate(df.timestamp):
    if df.timestamp[indx] not in df.groupby(['ID'], as_index=False):
        df2 = pd.DataFrame([[df.ID[indx], df.timestamp[indx], np.nan, df.gender[indx], df.age[indx]]],
                           columns=['ID', 'timestamp', 'value1', 'gender', 'age'])
        df2['timestamp'] = pd.to_datetime(df2['timestamp'])
        new_ckd = df.groupby(['ID']).apply(lambda y: pd.concat([y, df2]))

new_ckd['timestamp'] = pd.to_datetime(new_ckd['timestamp'])
new_ckd = new_ckd.sort_values(by=['timestamp'], ascending=True).reset_index(drop=True)

print(df.groupby(['ID'], as_index=False).timestamp.apply(print))
for indx, single_date in enumerate(df.timestamp):
    if df.timestamp[indx] in df.groupby(['ID'], as_index=False).timestamp:
        print('a')

I realized that an `if ... not in` condition on a groupby object does not work the way I expected. How can I make it work?
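For context on why the check misbehaves: iterating a pandas GroupBy yields `(group_key, sub-DataFrame)` pairs, so an `in` test compares a timestamp against those pairs (or against the group keys), never against the timestamp values inside the groups. A small sketch, including one way that does test what was intended (via `get_group`, shown here only as an illustration):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2],
                   'timestamp': ['2022-01-01 12:00:00',
                                 '2022-01-01 13:00:00',
                                 '2022-01-01 12:02:00']})

# Iterating a GroupBy yields (group_key, sub-DataFrame) pairs,
# so membership tests never see individual timestamp values.
keys = [k for k, _ in df.groupby('ID')]
print(keys)  # group keys are the IDs: [1, 2]

# To test whether a timestamp exists within one ID's group,
# check that group's own column instead:
grp = df.groupby('ID').get_group(1)
print('2022-01-01 12:00:00' in grp['timestamp'].values)  # True
print('2022-01-01 12:02:00' in grp['timestamp'].values)  # False: belongs to ID 2
```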

What I have:

ID  value1  timestamp            gender  age
1   50      2022-01-01 12:00:00  m       7
1   80      2022-01-01 12:30:00  m       7
1   65      2022-01-01 13:00:00  m       7
2   65      2022-01-01 12:02:00  f       8
2   83      2022-01-01 12:22:00  f       8
2   63      2022-01-01 12:42:00  f       8

What I expect:

ID  value1  timestamp            gender  age
1   50      2022-01-01 12:00:00  m       7
1   NaN     2022-01-01 12:02:00  m       7
1   NaN     2022-01-01 12:22:00  m       7
1   80      2022-01-01 12:30:00  m       7
1   NaN     2022-01-01 12:42:00  m       7
1   65      2022-01-01 13:00:00  m       7
2   NaN     2022-01-01 12:00:00  f       8
2   65      2022-01-01 12:02:00  f       8
2   83      2022-01-01 12:22:00  f       8
2   NaN     2022-01-01 12:30:00  f       8
2   63      2022-01-01 12:42:00  f       8
2   NaN     2022-01-01 13:00:00  f       8

Solution

  • You can reframe your task as: add the missing timestamps to every unique ID, based on all timestamps present in the dataframe, and fill the NaNs in the result.

    This can be achieved, for example, by reindexing on a MultiIndex built from the cross product of IDs and timestamps, and then filling the resulting NaNs:

    import numpy as np
    import pandas as pd

    data = {'ID': [1, 1, 1, 2, 2, 2],
            'timestamp': ['2022-01-01 12:00:00', '2022-01-01 13:00:00', '2022-01-01 18:00:00',
                          '2022-01-01 12:02:00', '2022-01-01 13:02:00', '2022-01-01 18:02:00'],
            'value1': [10, 20, 30, 40, 50, 60],
            'gender': ['M', 'M', 'F', 'F', 'F', 'M'],
            'age': [20, 25, 30, 35, 40, 45]}
    df = pd.DataFrame(data)
    
    # cross apply to build index 
    cross = df[['ID']].drop_duplicates().merge(df[['timestamp']].drop_duplicates(), how = 'cross')
    multiIdx = pd.MultiIndex.from_frame(cross)
    
    # "add" missing rows
    df = df.set_index(['ID', 'timestamp']) \
            .reindex(multiIdx, fill_value=np.nan) \
            .reset_index() \
            .sort_values(by=['ID', 'timestamp'], ignore_index=True)
    
    # fill NaNs
    df[['gender', 'age']] = df.groupby('ID')[['gender', 'age']].ffill().bfill()
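    As a quick sanity check on the sample data, each ID gets expanded to all six distinct timestamps, so the result has 12 rows with 6 NaN fillers in `value1` (the pipeline is repeated below so the check runs standalone):

```python
import numpy as np
import pandas as pd

data = {'ID': [1, 1, 1, 2, 2, 2],
        'timestamp': ['2022-01-01 12:00:00', '2022-01-01 13:00:00', '2022-01-01 18:00:00',
                      '2022-01-01 12:02:00', '2022-01-01 13:02:00', '2022-01-01 18:02:00'],
        'value1': [10, 20, 30, 40, 50, 60],
        'gender': ['M', 'M', 'F', 'F', 'F', 'M'],
        'age': [20, 25, 30, 35, 40, 45]}
df = pd.DataFrame(data)

# cross product of unique IDs and unique timestamps -> target index
cross = df[['ID']].drop_duplicates().merge(df[['timestamp']].drop_duplicates(), how='cross')
multiIdx = pd.MultiIndex.from_frame(cross)

# reindex adds the missing (ID, timestamp) rows as NaN
df = df.set_index(['ID', 'timestamp']) \
       .reindex(multiIdx, fill_value=np.nan) \
       .reset_index() \
       .sort_values(by=['ID', 'timestamp'], ignore_index=True)

# fill per-ID constants forward, then backward for leading gaps
df[['gender', 'age']] = df.groupby('ID')[['gender', 'age']].ffill().bfill()

# 2 IDs x 6 distinct timestamps = 12 rows, 6 of them newly added
print(len(df), df['value1'].isna().sum())  # 12 6
```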
    

    UPD

    If you have non-unique entries (based on the ID + timestamp pair), you can use a left merge instead:

    cross = ...
    df = cross.merge(df, on=['ID', 'timestamp'], how='left') \
              .sort_values(by=['ID', 'timestamp'], ignore_index=True)
    df[['gender', 'age']] = df.groupby('ID')[['gender', 'age']].ffill().bfill()
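
    To see why the merge variant tolerates duplicates: reindexing a non-unique index raises a duplicate-label error in pandas, while a left merge simply keeps every matching row. A standalone sketch with one duplicated (ID, timestamp) pair (the values here are made up for illustration):

```python
import pandas as pd

# one duplicated (ID, timestamp) pair -- reindex would fail here, merge does not
data = {'ID': [1, 1, 1, 2],
        'timestamp': ['2022-01-01 12:00:00', '2022-01-01 12:00:00',
                      '2022-01-01 13:00:00', '2022-01-01 12:02:00'],
        'value1': [10, 15, 20, 40],
        'gender': ['M', 'M', 'M', 'F'],
        'age': [20, 20, 20, 35]}
df = pd.DataFrame(data)

# same cross product as before: unique IDs x unique timestamps
cross = df[['ID']].drop_duplicates().merge(df[['timestamp']].drop_duplicates(), how='cross')
out = cross.merge(df, on=['ID', 'timestamp'], how='left') \
           .sort_values(by=['ID', 'timestamp'], ignore_index=True)
out[['gender', 'age']] = out.groupby('ID')[['gender', 'age']].ffill().bfill()

# 2 IDs x 3 distinct timestamps = 6 combinations, plus 1 extra row
# because the duplicated (1, 12:00) pair matches its cross row twice
print(len(out))  # 7
```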