I am trying to merge 2 Dataframes: df and relations_df
df: (df's Index is post's ID)
Example df:
Post similar_posts
----- --------
Head-1 [123046, 1915, 111357]
Head-2 [654, 239]
Example relations_df:
id related_id keywords
--- --------- --------
17 123046 ['key1', 'key2']
17 1915 ['key1', 'key2', 'key6']
17 111357 ['key3', 'key4']
18 654 ['key4', 'key6', 'key1']
18 239 ['key5', 'key2']
Expected Output: (df's index should persist)
Title similar_posts keywords
----- -------- --------
Head-1 [123046, 1915, 111357] [['key1', 'key2'], ['key1', 'key2', 'key6'], ['key3', 'key4']]
Head-2 [654, 239] [['key4', 'key6', 'key1'], ['key5', 'key2']]
I am not very knowledgeable with pandas yet, so I tried to ask ChatGPT and failed to get an acceptable answer...
This is my code as of now:
def addkeywords(args):
df, relations_df = args
# Merge relations_df with df on 'id' and 'related_id'
merged_df = pd.merge(df, relations_df, left_on='id', right_on='related_id', suffixes=('_left', '_right'), how='left')
# Group by 'id' and aggregate 'keywords' into lists
grouped_df = merged_df.groupby('id', as_index=False)['keywords'].agg(list)
# grouped_df.rename(columns={'keywords_right': 'keywords'}, inplace=True)
# Merge back to the original DataFrame
df = pd.merge(df, grouped_df, on='id', how='left')
# Fill NaN values in 'keywords' column with empty lists
df['keywords'].fillna(value='', inplace=True)
return df
The index of 'df' Dataframe should be the same (As the Index represents the post ID in df)
The index of 'relations_df' is autogenerated, so it can change.
Can anyone help me with this please?
# Group by 'id' and aggregate 'related_id' and 'keywords'
df_grouped = relations_df.groupby('id').agg({'related_id': list, 'keywords': list}).reset_index()
df_grouped.set_index('id', inplace=True)
# Merge on 'id' of relations_df and 'index' of df
merged_df = pd.merge(df, df_grouped, left_index=True, right_index=True)
I managed to get the desired output using the aboove code.