python-3.x, group-by, series

How to group pandas series by values and return dict of list of indices for those values, without explicitly transforming the series first?


I have a pandas series that looks like this:

import numpy as np
import string
import pandas as pd

np.random.seed(0)
data = np.random.randint(1,6,10)
index = list(string.ascii_lowercase)[:10]
a = pd.Series(data=data,index=index,name='apple')

a
>>>

a    5
b    1
c    4
d    4
e    4
f    2
g    4
h    3
i    5
j    1
Name: apple, dtype: int32

I want to group the series by its values and return a dict mapping each value to the list of index labels where it occurs, i.e. this result:

{1: ['b', 'j'], 2: ['f'], 3: ['h'], 4: ['c', 'd', 'e', 'g'], 5: ['a', 'i']}

Here is how I achieve that at the moment:

b = a.reset_index().set_index('apple').squeeze()
grouped = b.groupby(level=0).apply(list).to_dict()

grouped
>>>

{1: ['b', 'j'], 2: ['f'], 3: ['h'], 4: ['c', 'd', 'e', 'g'], 5: ['a', 'i']}

However, it does not feel particularly pythonic to explicitly transform the series first so that I can get to the result. Is there a way to do this directly by applying a single function (ideally) or combination of functions in one line to achieve the same result?

Thanks!


Solution

  • You can group the series by its own values with groupby and apply a lambda that collects each group's index labels into a list. This gives the desired result in one line, with no need to reshape the series first:

    grouped = a.groupby(a.values).apply(lambda x: list(x.index)).to_dict()
    

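    Alternatively, you could wrap the result in dict() instead of calling to_dict(). With the first variant each value should stay a pandas Index rather than a list; the second converts each group's index with tolist():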

    grouped = dict(a.groupby(a.values).apply(lambda x: x.index.get_level_values(0)))
    grouped = dict(a.groupby(a.values).apply(lambda x: x.index.tolist()))
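
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It writes the series values out explicitly (taken from the output shown in the question) so it does not depend on the random seed. The names via_apply and via_groups are just illustrative, and the second approach, based on the GroupBy .groups attribute, is an assumed alternative that is not part of the original answer:

```python
import pandas as pd

# Same values as the series in the question, written out explicitly
# so the example does not depend on the random seed.
a = pd.Series(
    data=[5, 1, 4, 4, 4, 2, 4, 3, 5, 1],
    index=list("abcdefghij"),
    name="apple",
)

# One-liner from the answer: group by the series' own values and
# collect each group's index labels into a list.
via_apply = a.groupby(a.values).apply(lambda x: list(x.index)).to_dict()

# Assumed alternative (not in the original answer): .groups maps each
# group key to an Index of labels, so only a list conversion is needed.
via_groups = {int(k): list(v) for k, v in a.groupby(a.values).groups.items()}

print(via_apply)
print(via_groups)
```

Both dictionaries should match the result shown in the question. The .groups variant avoids calling a Python lambda per group, which may matter on larger series, though that has not been measured here.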