Group dataframe rows with common values

I have the following dataset:

    0   1   2   3
0   a   ❤   💛  👍
1   b   ❤   👍  🙏
2   c   😉  🙏  👍
3   d   😉  ✨   💪
4   e   ❤   😉  🙏

I would like to perform clustering to group the ROWS which have something in common.

Using networkx with the following code, I get this result:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.from_pandas_edgelist(df, 0, 1)
nx.draw(G, with_labels=True)
plt.show()

[Figure: groups obtained with networkx]

How can I also take columns 2 and 3 into account? Can I do it without giving priority to any particular column (for example, I want column 2 to be as important as column 1)?

Solution

Similarly to this answer, you could treat each dataframe row as a path and look for the connected components. I've added a row that shares no values with any other row to better illustrate how this works:

print(df)
   0  1   2    3
0  a  ❤  💛  👍
1  b  ❤  👍  🙏
2  c  😉  🙏  👍
3  d  😉  ✨  💪
4  e  ❤  😉  🙏
5  f  👅  😱  🤑

So iterate over the dataframe rows, and add them as paths with nx.add_path:

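# Each row becomes a path: consecutive values in the row are linked by an edge
my_list = df.values.tolist()
G = nx.Graph()
for path in my_list:
    nx.add_path(G, path)
# Rows that share any value end up in the same connected component
components = list(nx.connected_components(G))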

print(components)
[{'a', 'b', 'c', 'd', 'e', '✨', '❤', '👍', '💛', '💪', '😉', '🙏'},
 {'f', '👅', '😱', '🤑'}]
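
To see why one path per row is enough, here is a minimal sketch using only the first two rows of the example. nx.add_path links consecutive values, so two rows that share any value (here ❤ and 👍) become reachable from each other regardless of which column the value sits in. Note that this assumes a value means the same thing in every column, since identical values are merged into a single node.

```python
import networkx as nx

G = nx.Graph()
# Row 0 contributes the edges a-❤, ❤-💛, 💛-👍
nx.add_path(G, ["a", "❤", "💛", "👍"])
# Row 1 contributes the edges b-❤, ❤-👍, 👍-🙏
nx.add_path(G, ["b", "❤", "👍", "🙏"])

# The shared node ❤ connects both rows
print(sorted(G.edges()))
print(nx.has_path(G, "a", "b"))  # True
```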

Now you can iterate over the components and, for each one, collect the rows whose values are all contained in it, building one sublist per component:

groups = []
for component in components:
    group = []
    for path in my_list:
        if component.issuperset(path):
            group.append(path)
    groups.append(group)

In this case you'd have all rows except the last grouped together, and the last one in a group of its own.

print(groups)

[[['a', '❤', '💛', '👍'],
  ['b', '❤', '👍', '🙏'],
  ['c', '😉', '🙏', '👍'],
  ['d', '😉', '✨', '💪'],
  ['e', '❤', '😉', '🙏']],
 [['f', '👅', '😱', '🤑']]]
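
If you would rather keep the result in the dataframe than in a nested list, one possible variation is to store a group label per row. The sketch below is self-contained and rebuilds the example data; the names node_to_group and the "group" column are illustrative choices, not part of the original answer. It relies on the fact that all values of a row lie in the same component, so looking up the value in column 0 is sufficient.

```python
import networkx as nx
import pandas as pd

# Example data from above, including the extra row "f"
df = pd.DataFrame([
    ["a", "❤", "💛", "👍"],
    ["b", "❤", "👍", "🙏"],
    ["c", "😉", "🙏", "👍"],
    ["d", "😉", "✨", "💪"],
    ["e", "❤", "😉", "🙏"],
    ["f", "👅", "😱", "🤑"],
])

# One path per row, as in the answer
G = nx.Graph()
for row in df.values.tolist():
    nx.add_path(G, row)

# Map every node (row label or cell value) to the index of its component
node_to_group = {}
for group_id, component in enumerate(nx.connected_components(G)):
    for node in component:
        node_to_group[node] = group_id

# Hypothetical "group" column: rows with the same label belong together
df["group"] = df[0].map(node_to_group)
print(df)
```

From here, df.groupby("group") gives you each cluster of rows as its own sub-dataframe.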