I'm trying to do some complex calculations and part of the code requires that I parse a comma separated entry and count the number of values that are more than 0.
Example input data:
a <- c(0,0,3,0)
b <- c(4,4,0,1)
c <- c("3,4,3", "2,1", 0, "5,8")
x <- data.frame(a, b, c)
x
a b c
1 0 4 3,4,3
2 0 4 2,1
3 3 0 0
4 0 1 5,8
The column that I need to parse, c
is factors and all other columns are numeric. The number of values comma separated will vary, in this example it varies from 0 to 3.
The desired output would look like this:
x$c_occur <- c(3, 2, 0, 2)
x
a b c c_occur
1 0 4 3,4,3 3
2 0 4 2,1 2
3 3 0 0 0
4 0 1 5,8 2
Where c_occur
lists the number of occurrences > 0 in the c column.
I was thinking something like this would work... but I can't figure it out.
library(dplyr
x_desired <- x %>%
mutate(c_occur = count(strsplit(c, ","), > 0))
We can make use of str_count
library(stringr)
library(dplyr)
x %>%
mutate(c_occur = str_count(c, '[1-9]\\d*'))
# a b c c_occur
#1 0 4 3,4,3 3
#2 0 4 2,1 2
#3 3 0 0 0
#4 0 1 5,8 2
After splitting the 'c', we can get the count by sum
ming the logical vector after looping over the list
output from strsplit
library(purrr)
x %>%
mutate(c_occur = map_int(strsplit(as.character(c), ","),
~ sum(as.integer(.x) > 0)))
# a b c c_occur
#1 0 4 3,4,3 3
#2 0 4 2,1 2
#3 3 0 0 0
#4 0 1 5,8 2
Or we can separate the rows with separate_rows
and do a group_by
summarise
library(tidyr)
x %>%
mutate(rn = row_number()) %>%
separate_rows(c, convert = TRUE) %>%
group_by(rn) %>%
summarise(c_occur = sum(c >0)) %>%
select(-rn) %>%
bind_cols(x, .)
# A tibble: 4 x 4
# a b c c_occur
# <dbl> <dbl> <fct> <int>
#1 0 4 3,4,3 3
#2 0 4 2,1 2
#3 3 0 0 0
#4 0 1 5,8 2