Search code examples
pythonpandasdataframegroup-bymulti-index

Converting a Pandas GroupBy multiindex output from Series back to DataFrame


I have a dataframe:

   City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

I perform the following grouping:

g1 = df1.groupby(["Name", "City"]).count()

which when printed looks like:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

But what I want eventually is another DataFrame object that contains all the rows in the GroupBy object. In other words I want to get the following result:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

How do I do it?


Solution

  • g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame
    
    In [20]: g1.index
    Out[20]: 
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)
    

    Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]: 
          Name      City  City_Count  Name_Count
    0    Alice   Seattle           1           1
    1      Bob   Seattle           2           2
    2  Mallory  Portland           2           2
    3  Mallory   Seattle           1           1
    

    Or something like:

    In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
    Out[36]: 
          Name      City  count
    0    Alice   Seattle      1
    1      Bob   Seattle      2
    2  Mallory  Portland      2
    3  Mallory   Seattle      1