I have a dataframe with 2 columns: [ID, ASSOCIATED_ID]. For each ID, I have a list of other associated IDs from the same dataframe. Here is a synthesized version of it:
ID ASSOCIATED_ID
1 [2,3]
2 [1,4]
3 [1]
4 [2]
5 []
I want to create clusters (groups) of IDs that are associated with each other. The association does not have to be direct; a transitive association also counts. How can I do that programmatically?
IIUC, you can use networkx and connected_components:
import networkx as nx
# One row per (ID, ASSOCIATED_ID) pair; an empty list becomes a single NaN row
df_e = df.explode('ASSOCIATED_ID')
G = nx.from_pandas_edgelist(df_e, 'ID', 'ASSOCIATED_ID')
list(nx.connected_components(G))
Output:
[{1, 2, 3, 4}, {nan, 5}]
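Note the nan in the second set: explode turns the empty list for ID 5 into a NaN row, and that NaN ends up as a node in the graph. If you would rather have 5 as a clean singleton cluster, one option is to drop the NaN rows before building the edges and add every ID as a node explicitly. A minimal sketch, assuming the sample data from the question (the names edges, clusters and CLUSTER are just illustrative):

```python
import networkx as nx
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "ASSOCIATED_ID": [[2, 3], [1, 4], [1], [2], []],
})

# Explode to one row per pair, then drop the NaN rows left by empty lists
edges = df.explode("ASSOCIATED_ID").dropna(subset=["ASSOCIATED_ID"])

G = nx.Graph()
G.add_nodes_from(df["ID"])  # keeps IDs that have no associations at all
G.add_edges_from(zip(edges["ID"], edges["ASSOCIATED_ID"]))

# Each connected component is one cluster
clusters = [sorted(c) for c in nx.connected_components(G)]

# Optional: label every row with the index of its cluster
label = {node: i for i, members in enumerate(clusters) for node in members}
df["CLUSTER"] = df["ID"].map(label)
```

With this, clusters holds one list of IDs per group, and the CLUSTER column lets you groupby the original dataframe if you need it.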