Search code examples
pythonpandaspandas-groupbyunique

How to map a groupby output (list of unique values) back to the original dataframe?


I am having issues in mapping a groupby output back to the original dataframe. To visualize the problem i created a test_df which is assembled similar to my real dataframe:

test_df = pd.DataFrame({"a": [1,1,2,2,3,3,3], "b": ["a", "a", "b", "c", "b", "b", "b"], "c": ["L", "O", "O", "K", "N", "O", "W"]})

My goal is to get all unique values of column "c" for each group (whereas a group depends on values in column a and b). I am trying to achieve this by executing:

test_df.groupby(["a", "b"])["c"].unique()

Resulting in this output. I can't manage to map this back to my original "test_df". I also tried to use:

test_df.groupby(["a", "b"])["c"].transform(lambda x: np.unique(x))

Which resulted in this error: "ValueError: Length of passed values is 1, index implies 8" on my real dataframe. If I add len() around np.unique(x) it works, but it obviously doesn't return what i need. However, my desired DataFrame looks like this:

desired_df = pd.DataFrame({"a": [1,1,2,2,3,3,3], "b": ["a", "a", "b", "c", "b", "b", "b"], "c": ["L", "O", "O", "K", "N", "O", "W"], "d": ["L, O", "L, O", "O", "K", "N, O, W", "N, O, W", "N, O, W"]})

Solution

  • Use transform is correct, but you need to wrap the output of np.unique as an array with same length with x:

    test_df['unique_val'] = test_df.groupby(["a", "b"])["c"].transform(lambda x: [x.unique()]*len(x))
    

    Output:

       a  b  c unique_val
    0  1  a  L     [L, O]
    1  1  a  O     [L, O]
    2  2  b  O        [O]
    3  2  c  K        [K]
    4  3  b  N  [N, O, W]
    5  3  b  O  [N, O, W]
    6  3  b  W  [N, O, W]
    

    However, you seem to expect to concatenate the unique values as a single string, in which case:

    test_df['unique_val'] = test_df.groupby(["a", "b"])["c"].transform(lambda x: ', '.join(x.unique()) )
    

    would work. Output:

       a  b  c unique_val
    0  1  a  L       L, O
    1  1  a  O       L, O
    2  2  b  O          O
    3  2  c  K          K
    4  3  b  N    N, O, W
    5  3  b  O    N, O, W
    6  3  b  W    N, O, W