I have a dataframe as follows:
library(dplyr)
df <- data.frame(A=1:20,
B=c(2,1.8,1.6,1.8,4,6,8,10,12,10,8,6,13,14,15,16,16.5,15,14,13))
mutate(df, C = B - lag(B))
A B C
1 2.0 NA
2 1.8 -0.2
3 1.6 -0.2
4 1.8 0.2
5 4.0 2.2
6 6.0 2.0
7 8.0 2.0
8 10.0 2.0
9 12.0 2.0
10 10.0 -2.0
11 8.0 -2.0
12 6.0 -2.0
13 13.0 7.0
14 14.0 1.0
15 15.0 1.0
16 16.0 1.0
17 16.5 -0.5
18 15.0 -1.0
19 14.0 -1.0
20 13.0 -1.0
And I want to extract out the sequences of minus values where there are 3 or more consecutively together and put in a separate column. So for example put the values from (col C)row 10, 11, 12 in a new column and row 17, 18,19,20 in another new column. This dataframe is huge so I dont know how many new columns I will have. Any help would be appreciated. Thanks
Here is an option with rleid
to create a run-length-id grouping based on the sign
of the column 'C' i.e. those adjacent elements with same sign
will have same grouping 'id' and it gets incremented when there is a difference in sign
. Then, we create the columns based on the count (n()
) value to be particular number i.e. 3 or 4
library(dplyr)
library(data.table)
df %>%
mutate(C = B - lag(B)) %>%
group_by(grp = rleid(sign(C))) %>%
mutate(newC3 = if(n() ==3 && all(C < 0)) C else NA,
newC4 = if(n() == 4 && all(C < 0) C else NA)
To make this automated, an option is pivot_wider
to reshape from 'long' to 'wide' format after creating the grouping id with rleid
and replaceing the values that are not negative to NA
. In this way, we get only the blocks of negative values to be in a separate column
library(tidyr)
library(stringr)
df %>%
mutate(C = B - lag(B)) %>%
mutate(grp = str_c('C', rleid(sign(C))),
C1 = case_when(C >=0 ~ NA_real_, TRUE ~ C)) %>%
pivot_wider(names_from = grp, values_from = C1)%>%
select(where(~ sum(!is.na(.)) > 0))
-output
# A tibble: 20 x 6
# A B C C2 C4 C7
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2 NA NA NA NA
# 2 2 1.8 -0.200 -0.200 NA NA
# 3 3 1.6 -0.200 -0.200 NA NA
# 4 4 1.8 0.200 NA NA NA
# 5 5 4 2.2 NA NA NA
# 6 6 6 2 NA NA NA
# 7 7 8 2 NA NA NA
# 8 8 10 2 NA NA NA
# 9 9 12 2 NA NA NA
#10 10 10 -2 NA -2 NA
#11 11 8 -2 NA -2 NA
#12 12 6 -2 NA -2 NA
#13 13 13 7 NA NA NA
#14 14 14 1 NA NA NA
#15 15 15 1 NA NA NA
#16 16 16 1 NA NA NA
#17 17 16 0 NA NA NA
#18 18 15 -1 NA NA -1
#19 19 14 -1 NA NA -1
#20 20 13 -1 NA NA -1
NOTE: The column names 'C2', 'C4', 'C7' are based on the ids created with rleid
. If we wanted to rename, then it can be done with rename_with
or rename_at
...
%>%
rename_at(vars(matches('^C\\d+')), ~ str_c('C', seq_along(.)))