How to find frequencies of multiple ID's from one column by year and plot?

I have a df that looks like

ID	Year
Nation, Nation - NA, Economy, Economy - Asia	2008
Economy, Economy - EU, State, Nation	2009

I would like to extract the frequencies of the ID's so that it looks like

Nation	Economy	State	Year
2	2	0	2008
1	2	1	2009

For ID's that have hyphens like "Economy - EU", I am only interested in counting this as a frequency of "Economy"

My end goal is to plot this df by year with the frequency counts of different ID's in the same plot. So for example, "State" would be a green dot in 2008, "Nation" would be a red dot in 2008, and "Economy" would be a blue dot in 2008.

If the second df is not a good way to do this, I am also open to suggestions! That was just my first thought on how to start this.

I will this post as a separate question if this is not appropriate, but my next question is how to plot the frequencies of the second df by year, as mentioned above?

Thank you!

Solution

You can split the data into different rows using separate_rows splitting on a comma (,). Separate the value after - in a different column and calculate occurrence of ID value in each Year and get the data in wide format.

library(dplyr)
library(tidyr)

df %>%
  separate_rows(ID, sep = ',\\s*') %>%
  separate(ID, c('ID', 'Value'), sep = '\\s*-\\s*',fill = 'right') %>%
  count(Year, ID) %>%
  pivot_wider(names_from = ID, values_from = n, values_fill = 0)

#   Year Economy Nation State
#  <int>   <int>  <int> <int>
#1  2008       2      2     0
#2  2009       2      1     1

You can also reduce the code by using janitor::tabyl.

df %>%
  separate_rows(ID, sep = ',\\s*') %>%
  separate(ID, c('ID', 'Value'), sep = '\\s*-\\s*',fill = 'right') %>%
  janitor::tabyl(Year, ID)

data

df <- structure(list(ID = c("Nation, Nation - NA, Economy, Economy - Asia", 
"Economy, Economy - EU, State, Nation"), Year = 2008:2009), 
class = "data.frame", row.names = c(NA, -2L))