I have a df that looks like
ID | Year |
---|---|
Nation, Nation - NA, Economy, Economy - Asia | 2008 |
Economy, Economy - EU, State, Nation | 2009 |
I would like to extract the frequencies of the ID's so that it looks like
Nation | Economy | State | Year |
---|---|---|---|
2 | 2 | 0 | 2008 |
1 | 2 | 1 | 2009 |
For ID's that have hyphens like "Economy - EU", I am only interested in counting this as a frequency of "Economy"
My end goal is to plot this df by year with the frequency counts of different ID's in the same plot. So for example, "State" would be a green dot in 2008, "Nation" would be a red dot in 2008, and "Economy" would be a blue dot in 2008.
If the second df is not a good way to do this, I am also open to suggestions! That was just my first thought on how to start this.
I will this post as a separate question if this is not appropriate, but my next question is how to plot the frequencies of the second df by year, as mentioned above?
Thank you!
You can split the data into different rows using separate_rows
splitting on a comma (,
). Separate the value after -
in a different column and calculate occurrence of ID
value in each Year
and get the data in wide format.
library(dplyr)
library(tidyr)
df %>%
separate_rows(ID, sep = ',\\s*') %>%
separate(ID, c('ID', 'Value'), sep = '\\s*-\\s*',fill = 'right') %>%
count(Year, ID) %>%
pivot_wider(names_from = ID, values_from = n, values_fill = 0)
# Year Economy Nation State
# <int> <int> <int> <int>
#1 2008 2 2 0
#2 2009 2 1 1
You can also reduce the code by using janitor::tabyl
.
df %>%
separate_rows(ID, sep = ',\\s*') %>%
separate(ID, c('ID', 'Value'), sep = '\\s*-\\s*',fill = 'right') %>%
janitor::tabyl(Year, ID)
data
df <- structure(list(ID = c("Nation, Nation - NA, Economy, Economy - Asia",
"Economy, Economy - EU, State, Nation"), Year = 2008:2009),
class = "data.frame", row.names = c(NA, -2L))