From the following data grouped by id
and visit
, which include a mix of numeric and text/character values
, how to create these 3 new columns:
count_wotxt
: count by id
and visit
but without considering text/character valuesdiff_value_first
: calculate the difference between each numeric value versus the first visit
for each id
, ignoring text/character values
diff_value_previous
: calculate the difference between each numeric value versus the previous visit
for each id
, ignoring text/character values
Data:
dat <-
structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
visit = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
value = c("5", "7", "10", "20", "15", "text0", "25", "text1",
"100", "text2", "text3", "120", "text4", "50", "45"), count = c(1L,
2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-15L))
Desired output:
> dat2
id visit value count count_wotxt diff_value_first diff_value_previous
1 1 1 5 1 1 0 0
2 1 1 7 2 2 2 2
3 1 1 10 3 3 5 3
4 1 2 20 1 1 0 0
5 1 2 15 2 2 -5 -5
6 1 2 text0 3 NA NA NA
7 1 2 25 4 3 5 10
8 1 2 text1 5 NA NA NA
9 2 1 100 1 1 0 0
10 2 1 text2 2 NA NA NA
11 2 1 text3 3 NA NA NA
12 2 1 120 4 2 20 20
13 2 2 text4 1 NA NA NA
14 2 2 50 2 1 NA 0
15 2 2 45 3 2 NA -5
Thanks for help
Here is one logic we could apply to get the desired output:
I think the most challenging part is the difference of the previous row in the presence of NAs. I have solved this by using fill
:
library(dplyr)
library(tidyr)
dat %>%
group_by(id, visit) %>%
mutate(count_wotxt = ifelse(grepl("^[0-9]+$", value), cumsum(grepl("^[0-9]+$", value)), NA),
value_numeric = as.numeric(as.character(value)),
diff_value_first = value_numeric - value_numeric[1]) %>%
fill(value_numeric, .direction = "down") %>%
mutate(diff_value_previous = value_numeric - lag(value_numeric, default = first(value_numeric)), .keep="unused") %>%
ungroup()
# A tibble: 15 × 7
id visit value count count_wotxt diff_value_first diff_value_previous
<fct> <fct> <chr> <int> <int> <dbl> <dbl>
1 1 1 5 1 1 0 0
2 1 1 7 2 2 2 2
3 1 1 10 3 3 5 3
4 1 2 20 1 1 0 0
5 1 2 15 2 2 -5 -5
6 1 2 text0 3 NA NA 0
7 1 2 25 4 3 5 10
8 1 2 text1 5 NA NA 0
9 2 1 100 1 1 0 0
10 2 1 text2 2 NA NA 0
11 2 1 text3 3 NA NA 0
12 2 1 120 4 2 20 20
13 2 2 text4 1 NA NA NA
14 2 2 50 2 1 NA NA
15 2 2 45 3 2 NA -5
Warning message:
There were 3 warnings in `mutate()`.
The first warning was:
ℹ In argument: `value_numeric = as.numeric(as.character(value))`.
ℹ In group 2: `id = 1`, `visit = 2`.
Caused by warning:
! NAs introduced by coercion
ℹ Run dplyr::last_dplyr_warnings() to see the 2 remaining warnings.