Search code examples
pythonpandasgroup-by

count unique combination in one column using Groupby


i have a dataframe which i'm trying to create new columns showing the occurrence of different combinations within different groups. Solutions I've found are all combinations of values across 2 or more columns instead of one. Therefore, is hoping somebody can help.

sample df:

╔════╦═════╗
║ id ║ tag ║
╠════╬═════╣
║ a  ║   1 ║
║ a  ║   1 ║
║ a  ║   2 ║
║ a  ║   2 ║
║ a  ║   3 ║
║ a  ║   3 ║
║ b  ║   2 ║
║ b  ║   2 ║
║ b  ║   2 ║
║ b  ║   3 ║
║ b  ║   3 ║
║ b  ║   3 ║
╚════╩═════╝

output hope to get:

╔════╦═════╦═════╦═════╦═════╦═════╦═════╦═════╗
║ id ║ tag ║ 1,1 ║ 1,2 ║ 1,3 ║ 2,2 ║ 2,3 ║ 3,3 ║
╠════╬═════╬═════╬═════╬═════╬═════╬═════╬═════╣
║ a  ║   1 ║   1 ║   4 ║   4 ║   1 ║   4 ║   1 ║
║ a  ║   1 ║   1 ║   4 ║   4 ║   1 ║   4 ║   1 ║
║ a  ║   2 ║   1 ║   4 ║   4 ║   1 ║   4 ║   1 ║
║ a  ║   2 ║   1 ║   4 ║   4 ║   1 ║   4 ║   1 ║
║ a  ║   3 ║   1 ║   4 ║   4 ║   1 ║   4 ║   1 ║
║ a  ║   3 ║   1 ║   4 ║   4 ║   1 ║   4 ║   1 ║
║ b  ║   2 ║   0 ║   0 ║   0 ║   3 ║   9 ║   3 ║
║ b  ║   2 ║   0 ║   0 ║   0 ║   3 ║   9 ║   3 ║
║ b  ║   2 ║   0 ║   0 ║   0 ║   3 ║   9 ║   3 ║
║ b  ║   3 ║   0 ║   0 ║   0 ║   3 ║   9 ║   3 ║
║ b  ║   3 ║   0 ║   0 ║   0 ║   3 ║   9 ║   3 ║
║ b  ║   3 ║   0 ║   0 ║   0 ║   3 ║   9 ║   3 ║
╚════╩═════╩═════╩═════╩═════╩═════╩═════╩═════╝

sample df code:

data = {
    "id": ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
    "tag": [1, 1, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3]}

df = pd.DataFrame(data)

for clarification: "col "x,y" is the combinations of the tag values grouped by the id" as mentioned by @Chrysophylaxs (thanks).

kindly advise


Solution

  • I got the answer here:

    data = {
        "id": ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
        "tag": [1, 1, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3]}
    
    df = pd.DataFrame(data)
    
    from itertools import combinations
    
    df['combinations'] = df.groupby(['id']).transform(lambda x: str(list(combinations(x.to_list(), 2))))
    df['combinations'] = df['combinations'].apply(lambda x: x.replace('[', '').replace(']', '').replace('),', '*').replace(', ','-').replace('*',',').replace('(','').replace(')','').replace(' ','')).str.split(',')
    df2 = df.drop_duplicates(['combinations'])
    
    x = df2.explode('combinations')
    x = x.drop('tag', axis=1).groupby(['id', 'combinations']).value_counts().unstack().reset_index().fillna(0)
    df.merge(x, on='id', how='left').drop('combinations', axis=1)
    

    Output:

        id  tag 1-1 1-2 1-3 2-2 2-3 3-3
    0   a   1   1.0 4.0 4.0 1.0 4.0 1.0
    1   a   1   1.0 4.0 4.0 1.0 4.0 1.0
    2   a   2   1.0 4.0 4.0 1.0 4.0 1.0
    3   a   2   1.0 4.0 4.0 1.0 4.0 1.0
    4   a   3   1.0 4.0 4.0 1.0 4.0 1.0
    5   a   3   1.0 4.0 4.0 1.0 4.0 1.0
    6   b   2   0.0 0.0 0.0 3.0 9.0 3.0
    7   b   2   0.0 0.0 0.0 3.0 9.0 3.0
    8   b   2   0.0 0.0 0.0 3.0 9.0 3.0
    9   b   3   0.0 0.0 0.0 3.0 9.0 3.0
    10  b   3   0.0 0.0 0.0 3.0 9.0 3.0
    11  b   3   0.0 0.0 0.0 3.0 9.0 3.0