I have a long dataset in which each participant should have 9 measurements, but they all have more than 9. These measurements need to be within specific brackets of the Weeks variable.
If there is more than one measurement in the bracket per participant, I would like the measurement closest to the ideal number of weeks (indicated above in parentheses) to remain, and the rest of the rows to be removed.
This an example of my data:
ID <- c(1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2)
Weeks <- c(11,16,25,28,31,34,35,36,38,40,
6,12,13,26,29,31,32,33,34,36,38,40,41)
Measurement <- c(120,160,170,200,120,40,89,70,63,44,
90,92,93,94,95,96,97,100,100, 120,
110, 100, 130)
df1 <- data.frame(ID, Weeks, Measurement)
This is what I would like it to look like, with rows 7, 12, 17, 18, and 23 removed.
Thank you for any help!
Since you have tagged this with dplyr
, here is a dplyr solution. It is based on sorting, so it will only ever retain one row per participant and week slot.
If it is possible for two measurements to happen in the same week, you may want to consider a way to resolve that situation, for example by averaging the measurements from the same week before discarding observations.
library(dplyr)
# Data
df1 <- data.frame(
ID = rep(c(1, 2), c(10L, 13L)),
Weeks = c(11, 16, 25, 28, 31, 34, 35, 36, 38, 40, 6, 12, 13, 26, 29, 31, 32, 33, 34, 36, 38, 40, 41),
Measurement = c(
120, 160, 170, 200, 120, 40, 89, 70, 63, 44, 90, 92, 93, 94, 95, 96, 97, 100,
100, 120, 110, 100, 130
)
)
# Solution
ideal = c(10, 16, 25, 28, 31, 34, 36, 38, 40)
df1 |>
mutate(slot = cut(Weeks,
breaks = c(0, 12, 25, 28, 31, 34, 36, 38, 40, Inf),
right = F
),
diff = abs(Weeks - sapply(as.integer(slot), \(s) ideal[s]))
) |>
group_by(ID, slot) |>
arrange(ID, slot, diff) |>
slice(1) |>
ungroup()
#> # A tibble: 18 × 5
#> ID Weeks Measurement slot diff
#> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 1 11 120 [0,12) 1
#> 2 1 16 160 [12,25) 0
#> 3 1 25 170 [25,28) 0
#> 4 1 28 200 [28,31) 0
#> 5 1 31 120 [31,34) 0
#> 6 1 34 40 [34,36) 0
#> 7 1 36 70 [36,38) 0
#> 8 1 38 63 [38,40) 0
#> 9 1 40 44 [40,Inf) 0
#> 10 2 6 90 [0,12) 4
#> 11 2 13 93 [12,25) 3
#> 12 2 26 94 [25,28) 1
#> 13 2 29 95 [28,31) 1
#> 14 2 31 96 [31,34) 0
#> 15 2 34 100 [34,36) 0
#> 16 2 36 120 [36,38) 0
#> 17 2 38 110 [38,40) 0
#> 18 2 40 100 [40,Inf) 0
Created on 2024-07-14 with reprex v2.1.1