python python-3.x pandas algorithm cluster-analysis

How to create clusters/groups from known associations?

I have a dataframe with two columns, [ID, ASSOCIATED_ID]. For each ID, ASSOCIATED_ID holds a list of the other IDs in the dataframe that it is associated with. Here is a synthesized version of it:

ID            ASSOCIATED_ID
1             [2,3]
2             [1,4]
3             [1]
4             [2]
5             []
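
For anyone who wants to reproduce this, the sample could be rebuilt roughly as follows. This is only a sketch: it assumes ASSOCIATED_ID holds real Python lists (not strings that look like lists), and the variable name `df` is chosen to match the answer below.

```python
import pandas as pd

# Rebuild the sample table; each ASSOCIATED_ID cell is assumed to be a Python list.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "ASSOCIATED_ID": [[2, 3], [1, 4], [1], [2], []],
})
```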

I want to create clusters (groups) of IDs that are associated with each other. The association does not have to be direct; a transitive association is enough. How can I do that programmatically?

Solution

IIUC, you can use networkx and its connected_components function:

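import networkx as nx

# One row per (ID, ASSOCIATED_ID) pair; an empty list becomes a single NaN row
df_e = df.explode('ASSOCIATED_ID')

# Undirected graph with one edge per pair
G = nx.from_pandas_edgelist(df_e, 'ID', 'ASSOCIATED_ID')

# Each connected component is one cluster
list(nx.connected_components(G))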

Output:

[{1, 2, 3, 4}, {nan, 5}]
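
The `nan` in the second set appears because exploding the empty list for ID 5 yields a NaN row, which then becomes an edge between 5 and NaN. If that placeholder is unwanted, one possible variation is to drop the NaN rows and register every ID as a node up front so isolated IDs still get their own cluster. This is a sketch under the assumption that ASSOCIATED_ID holds Python lists; the names `edges`, `clusters`, `label` and the `CLUSTER` column are illustrative and not part of the original answer.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "ASSOCIATED_ID": [[2, 3], [1, 4], [1], [2], []],
})

# One row per (ID, ASSOCIATED_ID) pair, with the NaN rows from empty lists removed.
edges = df.explode("ASSOCIATED_ID").dropna(subset=["ASSOCIATED_ID"])

G = nx.Graph()
G.add_nodes_from(df["ID"])  # add every ID first so isolated ones are kept
G.add_edges_from(zip(edges["ID"], edges["ASSOCIATED_ID"]))

# Each connected component is one cluster of directly or transitively linked IDs.
clusters = [sorted(c) for c in nx.connected_components(G)]

# Hypothetical follow-up: write a cluster number back onto each row.
label = {node: i for i, comp in enumerate(clusters) for node in comp}
df["CLUSTER"] = df["ID"].map(label)
```

With the sample data, `clusters` should come out as `[[1, 2, 3, 4], [5]]`, and the `CLUSTER` column gives each row the index of its group.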
