I am having issues in mapping a groupby output back to the original dataframe. To visualize the problem i created a test_df which is assembled similar to my real dataframe:
test_df = pd.DataFrame({"a": [1,1,2,2,3,3,3], "b": ["a", "a", "b", "c", "b", "b", "b"], "c": ["L", "O", "O", "K", "N", "O", "W"]})
My goal is to get all unique values of column "c" for each group (whereas a group depends on values in column a and b). I am trying to achieve this by executing:
test_df.groupby(["a", "b"])["c"].unique()
Resulting in this output. I can't manage to map this back to my original "test_df". I also tried to use:
test_df.groupby(["a", "b"])["c"].transform(lambda x: np.unique(x))
Which resulted in this error: "ValueError: Length of passed values is 1, index implies 8" on my real dataframe. If I add len() around np.unique(x) it works, but it obviously doesn't return what i need. However, my desired DataFrame looks like this:
desired_df = pd.DataFrame({"a": [1,1,2,2,3,3,3], "b": ["a", "a", "b", "c", "b", "b", "b"], "c": ["L", "O", "O", "K", "N", "O", "W"], "d": ["L, O", "L, O", "O", "K", "N, O, W", "N, O, W", "N, O, W"]})
Use transform
is correct, but you need to wrap the output of np.unique
as an array with same length with x
:
test_df['unique_val'] = test_df.groupby(["a", "b"])["c"].transform(lambda x: [x.unique()]*len(x))
Output:
a b c unique_val
0 1 a L [L, O]
1 1 a O [L, O]
2 2 b O [O]
3 2 c K [K]
4 3 b N [N, O, W]
5 3 b O [N, O, W]
6 3 b W [N, O, W]
However, you seem to expect to concatenate the unique values as a single string, in which case:
test_df['unique_val'] = test_df.groupby(["a", "b"])["c"].transform(lambda x: ', '.join(x.unique()) )
would work. Output:
a b c unique_val
0 1 a L L, O
1 1 a O L, O
2 2 b O O
3 2 c K K
4 3 b N N, O, W
5 3 b O N, O, W
6 3 b W N, O, W