I have two dataframes that look like the following
library(dplyr)
State = c('AK','AL','AR','AZ')
Perc = c(0.0023, 0.0034, 0.0043, 0.065)
df1 <- data.frame(State, Perc)
ID = c('A','B','C','D','E')
States = c('AK','AK; AL', 'AK; AL; AR; AZ', 'AR; AZ', 'AL')
df2 <- data.frame(ID, States)
I want to create a new column in df2 that locates the value of Perc in df1 for a particular state and sums up the values for each ID. Ideally, I would like to have a new df2 that looks like this:
ID States SumPerc
A AK 0.0023
B AK; AL 0.0057
C AK; AL; AR; AZ 0.0750
D AR; AZ 0.0693
E AL 0.0034
How should I go about this process?
First we split df2
into a row for each state, then we join in the Perc
data. Then we just need to aggregate back up based on the ID
group, and we re-concatenate States
and sum up Perc
library(tidyverse)
df2 %>% separate_rows(States, sep = "; ") %>%
left_join(., df1, by = c("States" = "State")) %>%
group_by(ID) %>%
summarize(
Perc = sum(as.numeric(Perc), na.rm = TRUE),
States = paste(States, collapse= "; ")
)
ID Perc States
<chr> <dbl> <chr>
1 A 0.0023 AK
2 B 0.0057 AK; AL
3 C 0.075 AK; AL; AR; AZ
4 D 0.0693 AR; AZ
5 E 0.0034 AL