I have a vector of strings, each containing an alphanumeric code whose digits take values 1-3 (e.g. "1RV2GA"). I want to extract the numbers and get their sum. So for "1RV2GA", it should extract 1 and 2 and add them to get 3.
I have figured out how to do this on a single string:
str_extract_all(
"1RV2GA", "\\(?[0-3,.]+\\)?", simplify = T) %>%
as.numeric() %>% sum()
[1] 3
My problem is that I can't figure out how to get this to work across a whole vector. str_extract_all() returns a list, so that would obviously cause issues within mutate, but I just need a sum for each row.
Here is some sample data:
test<-data.frame(ID=c("2VG1AR", "1OR2AG", "1GV1OA"),
value = c(4,8,2))
> test
      ID value
1 2VG1AR     4
2 1OR2AG     8
3 1GV1OA     2
Normally str_extract_all() would handle a vector like this, returning a list of character vectors:
> str_extract_all(test$ID, "\\(?[0-3,.]+\\)?")
[[1]]
[1] "2" "1"
[[2]]
[1] "1" "2"
[[3]]
[1] "1" "1"
But to get the sum of each output vector for each input value, I need the values to be numeric, and the sum has to be applied to each list element separately rather than to one atomic vector. If I try a mutate command with simplify=T, the sum of all the values in the ID vector is returned:
test %>% mutate(ID.numsum =
str_extract_all(ID, "\\(?[0-3,.]+\\)?", simplify = T) %>%
as.numeric() %>% sum())
      ID value ID.numsum
1 2VG1AR     4         8
2 1OR2AG     8         8
3 1GV1OA     2         8
If I just try to take the first element of the str_extract_all() list output, it returns the correct value for "2VG1AR" down the entire new column:
test%>%mutate(ID.numsum = str_extract_all(ID, "\\(?[0-3,.]+\\)?")[[1]] %>%
as.numeric() %>% sum())
# A tibble: 3 × 3
  ID     value ID.numsum
  <chr>  <dbl>     <dbl>
1 2VG1AR     4         3
2 1OR2AG     8         3
3 1GV1OA     2         3
str_extract() also doesn't work because it only extracts the first numeral in each string, so if I try it on "2VG1AR" it returns 2, where I need a vector including 2 and 1 so I can sum them to three.
Does anyone have a solution here?
sum() is a collapsing function: it reduces everything it is given to a single value. You have to be careful when using such functions in a row-wise manner. You can explicitly map() over the list elements. For example:
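# map_dbl() rather than map_int(): sum(as.numeric(.)) returns a double,
# which map_int() rejects in older purrr versions
test %>% mutate(ID.numsum =
  purrr::map_dbl(stringr::str_extract_all(ID, "\\(?[0-3,.]+\\)?"),
    ~sum(as.numeric(.))))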
Or you could use rowwise():
test %>%
rowwise() %>%
mutate(ID.numsum =
stringr::str_extract_all(ID, "\\(?[0-3,.]+\\)?") |> unlist() |> as.numeric() |> sum())
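If you would rather avoid the extra packages, the same per-element idea can be sketched in base R. This is only an illustration under an assumption that is not in the question's code: it uses the simpler pattern "[0-9]", which treats every digit as its own number (fine for codes like "2VG1AR", where digits are separated by letters). The sample data is the one from the question.

```r
# Sample data from the question
test <- data.frame(ID = c("2VG1AR", "1OR2AG", "1GV1OA"),
                   value = c(4, 8, 2))

# Assumption: each number in ID is a single digit, so "[0-9]" matches
# every digit separately. regmatches() + gregexpr() return a list with
# one character vector of matches per input string.
digits <- regmatches(test$ID, gregexpr("[0-9]", test$ID))

# vapply() applies sum() to each list element on its own and guarantees
# exactly one numeric value per row, so nothing is collapsed across rows.
test$ID.numsum <- vapply(digits, function(x) sum(as.numeric(x)), numeric(1))

test
```

A string with no digits yields an empty match vector, and sum() of an empty numeric vector is 0, so such rows get 0 rather than an error.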