Assume my research concerns an observational longitudinal cohort study.
Let γ_comp be the composite outcome of interest and γ1....γ4 at time t1 and t2 denote components of γ_comp. In addition, the dataset has three other variables (χ1, χ2, and χ3) which will be used in future analysis but are not necessary to code γ_comp. Here is an excerpt of the data.frame
df <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Y1_t1 = c(5, 6, 10, 7, 5, 7, 5, 4, 7, 4),
Y2_t1 = c(6, 4, 8, 8, 7, 10, 7, 6, 5, 7),
Y3_t1 = c(5, 6, 10, 4, 8, 5, 10, 5, 4, 6),
Y4_t1 = c(4.5, 8.5, 9.5, 4.5, 5, 8, 4.5, 8.5, 4, 6),
Y1_t2 = c(6, 4, 5, 5, 3, 4, 8, 4, 3, 2),
Y2_t2 = c(5, 4, 3, 6, 5, 5, 5, 2, 2, 8),
Y3_t2 = c(2, 2, 4, 5, 4, 9, 5, 3, 2, 4),
Y4_t2 = c(3.5, 6, 5, 5, 4.5, 4, 2.5, 7, 4.5, 4),
X1 = c(40, 45, 52, 44, 42, 65, 55, 61, 52, 49),
X2 = c("NL", "UK", "NL", "US", "UK", "US", "NL", "NL", "UK", "UK"),
X3 = c(2000, 2005, 2003, 2000, 2001, 2002, 2003, 2004, 2001, 2000)),
class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
Structure
spec_tbl_df [10 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ID : num [1:10] 1 2 3 4 5 6 7 8 9 10
$ Y1_t1: num [1:10] 5 6 10 7 5 7 5 4 7 4
$ Y2_t1: num [1:10] 6 4 8 8 7 10 7 6 5 7
$ Y3_t1: num [1:10] 5 6 10 4 8 5 10 5 4 6
$ Y4_t1: num [1:10] 4.5 8.5 9.5 4.5 5 8 4.5 8.5 4 6
$ Y1_t2: num [1:10] 6 4 5 5 3 4 8 4 3 2
$ Y2_t2: num [1:10] 5 4 3 6 5 5 5 2 2 8
$ Y3_t2: num [1:10] 2 2 4 5 4 9 5 3 2 4
$ Y4_t2: num [1:10] 3.5 6 5 5 4.5 4 2.5 7 4.5 4
$ X1 : num [1:10] 40 45 52 44 42 65 55 61 52 49
$ X2 : chr [1:10] "NL" "UK" "NL" "US" ...
$ X3 : num [1:10] 2000 2005 2003 2000 2001 ...
As mentioned earlier, I am interested in calculating γ_comp. The rules for recording are as follows:
I believe the following steps have to be taken to achieve this aim. First, Y1_diff = Y1_t2/Y1_t1
must be calculated for every component. This is the proportion between two time points and should be <0.80. Next, an if_else condition
has to be applied, which reinforces these rules and returns 1
if the rules are met and 0
if not (i.e., "responded" to treatment or not).
For example, this could be a desired output:
ID Ycomp Y1_t1 Y2_t1 Y3_t1 Y4_t1 Y1_t2 Y2_t2 Y3_t2 Y4_t2 Y1_diff Y2_diff Y3_diff Y4_diff X1 X2 X3
1 1 0 5 6 5 4.5 6 5 2 3.5 1.2 0.83 0.4 0.78 40 NL 2000
2 2 1 6 4 6 8.5 4 4 2 6 0.67 1 0.33 0.71 45 UK 2005
3 3 1 10 8 10 9.5 5 3 4 5 0.5 0.38 0.4 0.53 52 NL 2003
4 4 0 7 8 4 4.5 5 6 5 5 0.71 0.75 1.25 1.11 44 US 2000
5 5 1 5 7 8 5 3 5 4 4.5 0.6 0.71 0.5 0.9 42 UK 2001
6 6 0 7 10 5 8 4 5 9 4 0.57 0.5 1.8 0.5 65 US 2002
7 7 0 5 7 10 4.5 8 5 5 2.5 1.6 0.71 0.5 0.56 55 NL 2003
8 8 0 4 6 5 8.5 4 2 3 7 1 0.33 0.6 0.82 61 NL 2004
9 9 1 7 5 4 4 3 2 2 4.5 0.43 0.4 0.5 1.13 52 UK 2001
10 10 1 4 7 6 6 2 8 4 4 0.5 1.14 0.67 0.67 49 UK 2000
I would appreciate any advice on recoding the composite score γ_comp. Alternative methods are also welcome. The idea is to use γ_comp in logistic regression in future analysis.
This should do it for you, assuming my understanding is correct:
inner_join(
df,
df %>%
select(ID,starts_with("Y")) %>%
pivot_longer(!ID,names_to = c("Y","t"), names_sep="_") %>%
pivot_wider(id_cols = ID:Y, names_from=t, values_from = value) %>%
mutate(change=1-t2/t1) %>%
group_by(ID) %>%
mutate(impct = sum(change>0.2)) %>%
summarize(Y_comp=1*all(impct==4 | (impct==3 & min(change)>=-0.2)))
) %>% relocate(Y_comp,.after = ID)
Output:
ID Y_comp Y1_t1 Y2_t1 Y3_t1 Y4_t1 Y1_t2 Y2_t2 Y3_t2 Y4_t2 X1 X2 X3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 0 5 6 5 4.5 6 5 2 3.5 40 NL 2000
2 2 1 6 4 6 8.5 4 4 2 6 45 UK 2005
3 3 1 10 8 10 9.5 5 3 4 5 52 NL 2003
4 4 0 7 8 4 4.5 5 6 5 5 44 US 2000
5 5 1 5 7 8 5 3 5 4 4.5 42 UK 2001
6 6 0 7 10 5 8 4 5 9 4 65 US 2002
7 7 0 5 7 10 4.5 8 5 5 2.5 55 NL 2003
8 8 0 4 6 5 8.5 4 2 3 7 61 NL 2004
9 9 1 7 5 4 4 3 2 2 4.5 52 UK 2001
10 10 1 4 7 6 6 2 8 4 4 49 UK 2000
Explanation:
This is an inner join between df
, and a new dataframe that contains two columns ID
and Y_comp
. How is this second frame created?
ID
and those starting with "Y"group_by(ID)
), I generate a column impt
as the number of times change
exceeds 0.2. This is constant over ID
Y_comp
as TRUE if all of the rows have impct==4
(i.e. all are improvements) OR, if three are improvements and the minimum in the set is not less than negative 0.2).relocate()
The OP is having an error, likely caused by namespace collision; one solution is to be specific about the packages being used:
library(magrittr)
dplyr::inner_join(
df,
df %>%
dplyr::select(ID,starts_with("Y")) %>%
tidyr::pivot_longer(!ID,names_to = c("Y","t"), names_sep="_") %>%
tidyr::pivot_wider(id_cols = ID:Y, names_from=t, values_from = value) %>%
dplyr::mutate(change=1-t2/t1) %>%
dplyr::group_by(ID) %>%
dplyr::mutate(impct = sum(change>0.2)) %>%
dplyr::summarize(Y_comp=1*all(impct==4 | (impct==3 & min(change)>=-0.2)))
) %>% dplyr::relocate(Y_comp,.after = ID)