Search code examples
pythonpandasdataframepivot-table

Grouping indexes in a Pandas crosstable


I have a dataframe in Pandas that looks like this:

df = pandas.DataFrame({
    'Age': [21,22,21,23,23,21,21],
    'Region': ['North America', 'Europe East', 'Europe West', 'South America',
               'North America', 'North America', 'Europe West'],
    'Answer': ['yes','yes','no','yes','no','no','yes']})

   Age         Region Answer
0   21  North America    yes
1   22    Europe East    yes
2   21    Europe West     no
3   23  South America    yes
4   23  North America     no
5   21  North America     no
6   21    Europe West    yes

And I need a way to produce a cross or pivot table like this:

Answer         no  yes
Continent                
Europe          1    2    
America         2    2

Using the Pandas crosstab function I managed to produce this table:

ct = pandas.crosstab(index=df['Region'], columns=df['Answer'])

Answer         no  yes
Region                
Europe East     0    1
Europe West     1    1
North America   2    1
South America   0    1

But then I don't know how to group the indexes that have some part of the string in common.

Is there anyway to do it?


Solution

  • You can use a regex to extract the continent name from the region.

    ct.groupby(
        ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent'),
        sort=False,
        ).sum()
    
    Answer     no  yes
    Continent         
    Europe      1    2
    America     2    2