I am hoping for help from the swarm intelligence! While preparing my data frame of laboratory values across different time points, I ran into the following issue. My data frame is in long format and consists of six variables: the subject ID (sid), the timepoint (zeitpunkt), the intervention group of the clinical trial (gruppe), the laboratory parameter (parameter), the measured value of the parameter (messwert) and the unit of the value (einheit).
# A tibble: 6 x 6
    sid zeitpunkt gruppe parameter                      messwert einheit
  <dbl> <chr>     <chr>  <chr>                             <dbl> <chr>
1     1 t0        DGE    ACPA(citrull. Prot.-Ak) EIA/Se  1000    U/ml
2     1 t3        DGE    ACPA(citrull. Prot.-Ak) EIA/Se  1000    U/ml
3     1 t2        DGE    Apolipoprot. A1 HP                 1.31 g/l
4     1 t2        DGE    Apolipoprotein B                   0.61 g/l
5     1 t0        DGE    aPTT Pathromtin SL                34.3  sek.
6     1 t1        DGE    aPTT Pathromtin SL                37.3  sek.
>
I would now like to compute the percentage change of each parameter from baseline (t0) to the last timepoint (t3) for each individual, so that I can later compare the means of the two interventions. The problem: some values are missing, and some exist twice with slightly different values in cases where the laboratory measured twice or samples were missing. Also, some parameters were only assessed at baseline.
I tried to calculate the change in percent by summarizing this new variable into a new dataframe:
labor_change <-
  labor_long %>%
  group_by(sid, gruppe, parameter, zeitpunkt) %>%
  arrange(sid, parameter, zeitpunkt) %>%
  summarize(messwert = mean(messwert)) %>%
  ungroup() %>%
  group_by(sid, gruppe, parameter) %>%
  summarize(changet3t0 =
    (messwert[zeitpunkt == "t3"] - messwert[zeitpunkt = "t0"] / messwert[zeitpunkt == "t0"]) * 100)
My problem now: as soon as I use values from two different timepoints in the calculation (that is, two different rows that differ in a second variable, zeitpunkt), R returns the desired dataframe, but filled with NA only:
# Groups: sid, gruppe, parameter [6]
    sid gruppe parameter             changet3t0
  <dbl> <chr>  <chr>                      <dbl>
1     1 DGE    aPTT Pathromtin SL            NA
2     1 DGE    Basophile %                   NA
3     1 DGE    Basophile absolut             NA
4     1 DGE    Calcium                       NA
5     1 DGE    Creatinkinase (CK) HP         NA
6     1 DGE    CRP HP                        NA
>
As soon as I remove one timepoint from the calculation, R gives me the desired calculated value. Any idea how I can fix this?
As mentioned by @Martin Gal, in one place you are using `=` instead of `==`. Also, you are currently computing `(t3 - t0/t0)`, but what you want is `(t3 - t0)/t0`, which is different.
I would also suggest using `match` instead of `==` for the comparison, since `match` returns an `NA` if there is no "t3" or "t0" in the data.
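To make the three points above concrete, here is a small base R sketch on made-up values (the numbers and the names `by_name`, `by_equal`, `by_match`, `wrong` and `right` are only for illustration and are not from the original data). It assumes a single subject and parameter with no "t3" measurement.

```r
# Toy values for one subject and one parameter (hypothetical numbers).
zeitpunkt <- c("t0", "t1")   # note: there is no "t3" measurement
messwert  <- c(34.3, 37.3)

# `[` ignores argument names, so `zeitpunkt = "t0"` is the same as
# messwert["t0"]: a lookup by name on an unnamed vector, which gives NA.
by_name <- messwert[zeitpunkt = "t0"]

# `==` builds a logical index; without any "t3" the result has length zero.
by_equal <- messwert[zeitpunkt == "t3"]

# match() returns the first matching position or NA,
# so indexing always yields exactly one value (here NA).
by_match <- messwert[match("t3", zeitpunkt)]

# Operator precedence: division binds tighter than subtraction.
t3 <- 110
t0 <- 100
wrong <- (t3 - t0 / t0) * 100   # (110 - 1) * 100
right <- (t3 - t0) / t0 * 100   # 10 / 100 * 100

print(by_name)
print(length(by_equal))
print(by_match)
print(c(wrong = wrong, right = right))
```

Because `match` always gives one position (or `NA`) per group, each group contributes exactly one row to the summary, which is usually what you want when some parameters were only measured at baseline.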
library(dplyr)

labor_change <- labor_long %>%
  # average duplicate measurements per subject, parameter and timepoint
  group_by(sid, gruppe, parameter, zeitpunkt) %>%
  arrange(sid, parameter, zeitpunkt) %>%
  summarize(messwert = mean(messwert)) %>%
  # percent change from t0 to t3; NA if either timepoint is missing
  group_by(sid, gruppe, parameter) %>%
  summarize(
    changet3t0 = (messwert[match("t3", zeitpunkt)] - messwert[match("t0", zeitpunkt)]) /
      messwert[match("t0", zeitpunkt)] * 100,
    .groups = "drop"
  )
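As a quick sanity check, the same pipeline can be run on a tiny made-up data frame. The values and the parameter names "A" and "B" below are hypothetical: "A" has a duplicated t0 measurement, and "B" was only measured at baseline. This is a sketch under those assumptions, not a test on the real data.

```r
library(dplyr)

# Hypothetical toy data: one subject, duplicate t0 for "A", "B" only at baseline.
labor_long <- tibble(
  sid       = c(1, 1, 1, 1),
  zeitpunkt = c("t0", "t0", "t3", "t0"),
  gruppe    = "DGE",
  parameter = c("A", "A", "A", "B"),
  messwert  = c(90, 110, 120, 5)
)

labor_change <- labor_long %>%
  # duplicates at the same timepoint are averaged (t0 of "A" becomes 100)
  group_by(sid, gruppe, parameter, zeitpunkt) %>%
  summarize(messwert = mean(messwert), .groups = "drop") %>%
  # one row per subject and parameter; NA when t3 or t0 is absent
  group_by(sid, gruppe, parameter) %>%
  summarize(
    changet3t0 = (messwert[match("t3", zeitpunkt)] - messwert[match("t0", zeitpunkt)]) /
      messwert[match("t0", zeitpunkt)] * 100,
    .groups = "drop"
  )

print(labor_change)
```

Here "A" should come out as a 20 percent increase (from a mean baseline of 100 to 120), while "B" stays in the result with `NA` rather than being silently dropped.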