r dplyr panel-data

Attempting to create panel data from cross-sectional data


I'm attempting to transform data from the Global Terrorism Database so that the unit of observation is "Country_Year" rather than the individual terror event, with one variable holding the number of terror events in that year.

I've managed to create a dataframe with one column containing all the Country_Year combinations. I've also found that `table(GTD_94_Land$country_txt, GTD_94_Land$iyear)` shows the values I would like the new variable to have. What I can't figure out is how to store these counts as a variable.

So my data look like this:

        eventid iyear crit1 crit2 crit3 country country_txt
          <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>
 1 199401010008  1994     1     1     1     182 Somalia    
 2 199401010012  1994     1     1     1     209 Turkey     
 3 199401010013  1994     1     1     1     209 Turkey     
 4 199401020003  1994     1     1     1     209 Turkey     
 5 199401020007  1994     1     1     0     106 Kuwait     
 6 199401030002  1994     1     1     1     209 Turkey     
 7 199401030003  1994     1     1     1     228 Yemen      
 8 199401030006  1994     1     1     0      53 Cyprus     
 9 199401040005  1994     1     1     0     209 Turkey     
10 199401040006  1994     1     1     0     209 Turkey     
11 199401040007  1994     1     1     1     209 Turkey     
12 199401040008  1994     1     1     1     209 Turkey 

and I would like to transform it so that I have:

   Terror attacks iyear crit1 crit2 crit3 country country_txt
            <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>
 1              1  1994     1     1     1     182 Somalia
 2              8  1994     1     1     1     209 Turkey
 5              1  1994     1     1     0     106 Kuwait
 7              1  1994     1     1     1     228 Yemen
 8              1  1994     1     1     0      53 Cyprus

I've looked at some solutions, but most of them seem to assume that the value the new variable should take is already in the data.

All help is appreciated!


Solution

  • Assuming df is the original dataframe:

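    library(dplyr)  # provides %>% and the verbs used below

    df_out = df %>% 
      dplyr::select(-eventid) %>%                # drop the event-level ID
      dplyr::group_by(country_txt, iyear) %>%    # one group per country-year
      dplyr::mutate(Terrorattacks = n()) %>%     # number of events in each group
      dplyr::slice(1L) %>%                       # keep only the first row per group
      dplyr::ungroup()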
    

    Ideally, I would use summarise, but since I don't know the summarising criteria for the other columns, I have simply used mutate and slice.

    Note: The values in the 'crit' columns are taken from the first row of each 'country_txt' and 'iyear' group.
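
If the other columns are not needed, the summarise route mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `df` here is a small toy data frame built from the first six rows shown in the question, the column name `terror_attacks` is made up for the example, and the `.groups` argument assumes dplyr 1.0.0 or later. The second half shows a base R equivalent built on the `table()` call from the question.

```r
library(dplyr)

# Toy stand-in for the GTD extract (first six rows from the question).
df <- data.frame(
  eventid     = c(199401010008, 199401010012, 199401010013,
                  199401020003, 199401020007, 199401030002),
  iyear       = 1994,
  country     = c(182, 209, 209, 209, 106, 209),
  country_txt = c("Somalia", "Turkey", "Turkey", "Turkey", "Kuwait", "Turkey"),
  stringsAsFactors = FALSE
)

# One row per country-year; n() counts the events in each group.
panel <- df %>%
  group_by(country, country_txt, iyear) %>%
  summarise(terror_attacks = n(), .groups = "drop")

# Base R equivalent: turn the contingency table into a long data frame.
# Note that iyear comes back as character here, not numeric.
counts <- as.data.frame(table(df$country_txt, df$iyear),
                        stringsAsFactors = FALSE)
names(counts) <- c("country_txt", "iyear", "terror_attacks")
```

One difference worth keeping in mind: the `table()` version returns every country and year combination, including those with zero events, whereas the grouped summary only returns combinations that actually occur in the data.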