Tags: python-3.x, pandas, pandas-groupby, chaining

How to reference index attributes in groupby chain statement without the DataFrame name?


Is there a way to group by an index attribute (such as index.time) inside a method chain, right after the DataFrame is created, like this?

pd.read_excel('some_excel.xlsx').groupby(index.time)['some_var'].sum()

I can do it in two lines by referencing the DataFrame by name:

a = pd.read_excel('some_excel.xlsx')
b = a.groupby(a.index.time)['some_var'].sum()

or in a single chain by creating a dummy column:

pd\
 .read_excel('some_excel.xlsx')\
 .assign(time = lambda x: x.index.time)\
 .groupby('time')\
 ['some_var'].sum()

But I wonder whether there is a one-line way without any additional assignments.

Thank you for an answer or a link to one.

P.S. The index is a datetime index (e.g. '2018-05-01 13:15:00'), and there is no column named 'time'.


Solution

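  • If the index is a DatetimeIndex, you can pass a callable to groupby. The callable is applied to the index rather than to the columns, so the DataFrame name is not needed:

    pd.read_excel('some_excel.xlsx').groupby(lambda x: x.time)['some_var'].sum()

    This should work when pandas evaluates the callable against the whole index, where time is an attribute. If your pandas version calls it once per label instead, each x is a Timestamp and time is a method, so write lambda x: x.time() in that case.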
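
To make the callable approach easy to try without an Excel file, here is a minimal sketch. The small DataFrame is made-up sample data standing in for pd.read_excel('some_excel.xlsx'), and it assumes the index is a DatetimeIndex as described in the question. The lambda calls time() explicitly on each label. The second variant uses DataFrame.pipe, which is not part of the original answer but is another way to reach index.time inside a chain.

```python
import datetime as dt

import pandas as pd

# Made-up stand-in for pd.read_excel('some_excel.xlsx'):
# two days of data sharing the same clock times.
idx = pd.to_datetime([
    "2018-05-01 13:15:00",
    "2018-05-01 13:30:00",
    "2018-05-02 13:15:00",
    "2018-05-02 13:30:00",
])
df = pd.DataFrame({"some_var": [1, 2, 10, 20]}, index=idx)

# Variant 1: groupby with a callable. Each index label is a Timestamp,
# and time() returns its datetime.time part.
by_label = df.groupby(lambda ts: ts.time())["some_var"].sum()

# Variant 2 (alternative, not from the original answer): pipe passes the
# whole DataFrame to the lambda, so d.index.time is available in the chain.
by_pipe = df.pipe(lambda d: d.groupby(d.index.time)["some_var"].sum())

print(by_label)
print(by_pipe)
```

Both variants avoid naming the DataFrame and avoid a helper column. The pipe form may be preferable when the grouping key needs something only the full index provides.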