I have a dataframe in like below
id | group | log |
---|---|---|
10 | UU1Q | 23 |
10 | UU1Q | 12 |
10 | UU2Q | 15 |
11 | UU2Q | 17 |
11 | UU3Q | 35.6 |
11 | UU1Q | 29.8 |
11 | UU1Q | 33 |
11 | UU1Q | 44 |
13 | UU2Q | 17.77 |
13 | UU2Q | 19.90 |
13 | UU2Q | 55 |
14 | UU3Q | 33 |
15 | UU3Q | 22 |
For Each ID and group I want to create all possible combination of log values present in the dataframe in a new column. Desired Output
id | group | log | new_col |
---|---|---|---|
10 | UU1Q | 23 | (23,23) |
10 | UU1Q | 12 | (23,12) |
10 | UU2Q | 15 | (15,15) |
11 | UU2Q | 17 | (17,17) |
11 | UU3Q | 35.6 | (35.6,35.6) |
11 | UU1Q | 29.8 | (29.8, 29.8) |
11 | UU1Q | 33 | (29.8,33) |
11 | UU1Q | 44 | (29.8,44) |
11 | UU1Q | (33,44) | |
13 | UU2Q | 17.77 | (17.77,17.77) |
13 | UU2Q | 19.90 | (17.77,19.90) |
13 | UU2Q | 55 | (17.77,55) |
13 | UU2Q | (19.90,55) | |
14 | UU3Q | 33 | (33,33) |
15 | UU3Q | 22 | (22,22) |
I used shift function but it is only generating combinations with next matching cell. I want to get all the possible combination in each group.
dummy['new'] = dummy.groupby(['ID', 'group'])['log'].shift()
This is close what need - added all combinations and if one element per group is created tuple with same values:
from itertools import combinations
df = (df.groupby(['id','group'])['log']
.apply(lambda x: list(combinations(x, 2)) if len(x) > 1 else [(*x, *x)])
.explode()
.reset_index(name='comb'))
print (df)
id group comb
0 10 UU1Q (23.0, 12.0)
1 10 UU2Q (15.0, 15.0)
2 11 UU1Q (29.8, 33.0)
3 11 UU1Q (29.8, 44.0)
4 11 UU1Q (33.0, 44.0)
5 11 UU2Q (17.0, 17.0)
6 11 UU3Q (35.6, 35.6)
7 13 UU2Q (17.77, 19.9)
8 13 UU2Q (17.77, 55.0)
9 13 UU2Q (19.9, 55.0)
10 14 UU3Q (33.0, 33.0)
11 15 UU3Q (22.0, 22.0)
Or is possible create same values tuples of first rows per ['id','group'] and join to DataFrame df1
filled by combinations:
from itertools import combinations
df1 = (df.groupby(['id','group'])['log']
.apply(lambda x: list(combinations(x, 2)))
.explode()
.dropna()
.reset_index(name='comb'))
df2 = df.groupby(['id','group']).head(1).copy()
df2['comb'] = df2.pop('log').map(lambda x: (x,x))
df = pd.concat([df2, df1]).sort_values(['id','group'], ignore_index=True)
print (df)
id group comb
0 10 UU1Q (23.0, 23.0)
1 10 UU1Q (23.0, 12.0)
2 10 UU2Q (15.0, 15.0)
3 11 UU1Q (29.8, 29.8)
4 11 UU1Q (29.8, 33.0)
5 11 UU1Q (29.8, 44.0)
6 11 UU1Q (33.0, 44.0)
7 11 UU2Q (17.0, 17.0)
8 11 UU3Q (35.6, 35.6)
9 13 UU2Q (17.77, 17.77)
10 13 UU2Q (17.77, 19.9)
11 13 UU2Q (17.77, 55.0)
12 13 UU2Q (19.9, 55.0)
13 14 UU3Q (33.0, 33.0)
14 15 UU3Q (22.0, 22.0)