Search code examples
pythonpandaspivotgroup

How to group with multiple index in Python?


Problem

I want to pivot and group values. Despite looking through the guidance here, I cannot seem to find what I am looking for.

Reprex

What I have.

import pandas as pd

data_input = {'Area':['North', 'South', 'West', 'East','North', 'South', 'West', 'East'], 
              "Job1":["T", "F", "T", "X","T", "F", "T", "X"],
              "Job2":["F", "X", "T", "X","T", "F", "T", "X"],
              "Job3":["T", "F", "T", "X","X", "X", "F", "T"]}
 
# Create DataFrame
df1 = pd.DataFrame(data_input)
 
# Print the output.
print(df1)

enter image description here

What I want

# multi-level columns
items = pd.MultiIndex.from_tuples([('Job1', 'T'),('Job1', 'F'), ('Job1', 'X'),
                                 ('Job2', 'T'),('Job2', 'F'), ('Job2', 'X')])

# creating a DataFrame
dataFrame = pd.DataFrame([[2, 0, 0, 1, 1, 0], 
                          [0, 2, 0,0, 1, 1], 
                          [2, 0, 0,2, 0, 0], 
                          [0, 0, 2,0, 0, 2]],
                         index=['North', 'South', 'East', "West"],
                         columns=items)

# DataFrame
dataFrame

enter image description here


Solution

  • One option is to pivot each Job column and concat the result

    cols = ['Job1', 'Job2', 'Job3']
    dfs = []
    
    for col in cols:
        df = pd.crosstab(df1['Area'], df1[col])
        df.columns = pd.MultiIndex.from_product([[col], df.columns.tolist()])
        dfs.append(df)
    
    out = pd.concat(dfs, axis=1)
    
    print(out)
    
          Job1       Job2       Job3
             F  T  X    F  T  X    F  T  X
    Area
    East     0  0  2    0  0  2    0  1  1
    North    0  2  0    1  1  0    0  1  1
    South    2  0  0    1  0  1    1  0  1
    West     0  2  0    0  2  0    1  1  0