I have a dataframe in Pandas that looks like this:
df = pandas.DataFrame({
'Age': [21,22,21,23,23,21,21],
'Region': ['North America', 'Europe East', 'Europe West', 'South America',
'North America', 'North America', 'Europe West'],
'Answer': ['yes','yes','no','yes','no','no','yes']})
Age Region Answer
0 21 North America yes
1 22 Europe East yes
2 21 Europe West no
3 23 South America yes
4 23 North America no
5 21 North America no
6 21 Europe West yes
And I need a way to produce a cross or pivot table like this:
Answer no yes
Continent
Europe 1 2
America 2 2
Using the Pandas crosstab function I managed to produce this table:
ct = pandas.crosstab(index=df['Region'], columns=df['Answer'])
Answer no yes
Region
Europe East 0 1
Europe West 1 1
North America 2 1
South America 0 1
But then I don't know how to group the indexes that have some part of the string in common.
Is there anyway to do it?
You can use a regex to extract the continent name from the region.
ct.groupby(
ct.index.str.extract(r'(Europe|America)', expand=False).rename('Continent'),
sort=False,
).sum()
Answer no yes
Continent
Europe 1 2
America 2 2