python pandas group-by data-science jupyter

How to count text event type and transform it into country-year data using pandas?

I am trying to convert a dataframe where each row is a specific event, and each column has information about the event. I want to turn this into data in which each row is a country and year with information about the number and characteristics about the events in the given year.In this data set, each event is an occurrence of terrorism, and I want to count the number of events where the "target" is a government building. One of the columns is called "targettype" or "targettype_txt" and there are 5 different entries in this column I want to count (government building, military, police, diplomatic building etc). The targettype is also coded as a number if that is easier (i.e. there is another column where gov't building is 2, military installation is 4 etc..)

FYI This data set has 16 countries in West Africa and is looking at years 2000-2020 with a total of roughly 8000 events recorded. The data comes from the Global Terrorism Database, and this is for a thesis/independent research project (i.e. not a graded class assignment).

Right now my data looks like this (there are a ton of other columns but they aren't important for this):

eventID	iyear	country_txt	nkill	nwounded	nhostages	targettype_txt
10000102	2000	Nigeria	3	10	0	government building
10000103	2000	Mali	1	3	15	military installation
10000103	2000	Nigeria	15	0	0	government building
10000103	2001	Benin	1	0	0	police
10000103	2001	Nigeria	1	3	15	private business

. . .

And I would like it to look like this:

country_txt	iyear	total_nkill	total_nwounded	total_nhostages	total public_target
Nigeria	2000	200	300	300	15
Nigeria	2001	250	450	15	17

I was able to get the total number for nkill,nwounded, and nhostages using this super simple line:

df2 = cdf.groupby(['country','country_txt', 'iyear'])['nkill', 'nwound','nhostkid'].sum()

But this is a little different because I want to only count certain entries and sum up the total number of times they occur. Any thoughts or suggestions are really appreciated!

Solution

Try:

cdf['CountCondition'] = (cdf['targettype_txt']=='government building') | 
    (cdf['targettype_txt']=='military installation') | 
    (cdf['targettype_txt']=='police')
df2 = cdf[cdf['CountCondition']].groupby(['country','country_txt', 'iyear', 'CountCondition']).count()

You create a new column 'CountCondition' which just marks as true or false if the condition in the statement holds. Then you just count the number of times the CountCondition is True. Hope this makes sense.

It is possible to combine all this into one statement and NOT create an additional column but the statement gets quite convaluted and more difficult to understand how it works:

df2 = cdf[(cdf['targettype_txt']=='government building') | 
    (cdf['targettype_txt']=='military installation') | 
    (cdf['targettype_txt']=='police')].groupby(['country','country_txt', 'iyear']).count()