Search code examples
pythonpandaspivot-tablecrosstab

Create separate crosstab in pandas DataFrame for each category in a column


I have a pandas DataFrame df which looks like this (Sample):

df = pd.DataFrame({'Time' : ['Low','High','Medium','Low','Low','High','Low','High','Medium'],
                   'Sentiment' : ['Pos','Neg','Neg','Pos','Pos','Neg','Neg','Pos','Neg'],
                   'Product' : ['P1','P1','P1','P2','P1','P2','P2','P2','P1']})

My question is simple (for which I searched before posting), I need to create a crosstab which can be created easily with this command:

pd.crosstab(df['Time'],df['Sentiment'])

And gives the desired output:

Sentiment  Neg  Pos
Time               
High         2    1
Low          1    3
Medium       2    0

Now there is one more column in the data called Product which has around 50 unique categories (for sample I have included 2) and I need to create similar crosstab objects for all the categories, how can I do that?

My expected output is shown below:

Crosstab for Product1 (P1):

P1

Sentiment  Neg  Pos
Time
High         1    0
Low          0    2
Medium       2    0

Crosstab for Product1 (P2):

P2

Sentiment  Neg  Pos
Time
High         1    1
Low          1    1
Medium       0    0

How can I get 50 crosstab in one go?


Solution

  • Use crosstab with list of both columns - so get MultiIndex:

    df1 = pd.crosstab([df['Product'], df['Time']],df['Sentiment'])
    print (df1)
    Sentiment       Neg  Pos
    Product Time            
    P1      High      1    0
            Low       0    2
            Medium    2    0
    P2      High      1    1
            Low       1    1
    

    For add all possible values in MultiIndex add DataFrame.unstack and DataFrame.stack:

    df1 = pd.crosstab([df['Product'], df['Time']],df['Sentiment']).unstack(fill_value=0).stack()
    print (df1)
    Sentiment       Neg  Pos
    Product Time            
    P1      High      1    0
            Low       0    2
            Medium    2    0
    P2      High      1    1
            Low       1    1
            Medium    0    0
    

    And then select by first level by DataFrame.loc:

    print (df1.loc['P1'])
    Sentiment  Neg  Pos
    Time               
    High         1    0
    Low          0    2
    Medium       2    0 
    
    
    print (df1.loc['P2'])
    Sentiment  Neg  Pos
    Time               
    High         1    1
    Low          1    1