I am looking at graft patency after surgery (CABG)
In a CABG procedure, a single patient will typically get more than one graft (bypass), and we are looking at time-to-failure. This is indicated in the raw data by a variable indicating number of failed grafts, and the time at which diagnosed.
My raw data is currently one-line-per-patient and I believe I need to make it one-line-per-graft in order to continue to KM and Cox analyses. I am considering assorted if/then loops, but wonder if there is a more-efficient way to recode here.
Example data:
Patient VeinGrafts VeinsOccluded Months
1 2 0 36
2 4 1 34
3 3 2 38
4 4 0 33
In order to look at this "per vein" I need to recode such that each #VeinGraft gets its own row, and VeinsOccluded becomes 1/0
I need each row replicated (VeinGrafts) times, such that patient 2 will have 4 rows, but one of them has the VeinsOccluded indicator and the other 3 do not
This is what I would need the above data to look like for my next analytic move.
Patient VeinGrafts VeinsOccluded Months
1 2 0 36
1 2 0 36
2 4 1 34
2 4 0 34
2 4 0 34
2 4 0 34
3 3 1 38
3 3 1 38
3 3 0 38
4 4 0 33
4 4 0 33
4 4 0 33
4 4 0 33
This community has been so incredibly helpful to this point, but I have not been able to find a similar question answered - if I have overlooked I apologize, but most certainly appreciate any ideas you may have!
We can uncount
to expand the data, then grouped by 'Patient', mutate
the 'VeinsOccluded' by creating a logical expression with row_number()
on the first
value of 'VeinsOccluded', coerced to binary with +
library(dplyr)
library(tidyr)
df1 %>%
uncount(VeinGrafts, .remove = FALSE) %>%
group_by(Patient) %>%
mutate(VeinsOccluded = +(row_number() <= first(VeinsOccluded))) %>%
ungroup %>%
select(names(df1))
-output
# A tibble: 13 x 4
# Patient VeinGrafts VeinsOccluded Months
# <int> <int> <int> <int>
# 1 1 2 0 36
# 2 1 2 0 36
# 3 2 4 1 34
# 4 2 4 0 34
# 5 2 4 0 34
# 6 2 4 0 34
# 7 3 3 1 38
# 8 3 3 1 38
# 9 3 3 0 38
#10 4 4 0 33
#11 4 4 0 33
#12 4 4 0 33
#13 4 4 0 33
Or this can be done with data.table
(probably in a more efficient way)
library(data.table)
setDT(df1)[rep(seq_len(.N), VeinGrafts)][,
VeinsOccluded := +(seq_len(.N) <= first(VeinsOccluded)), Patient][]
-output
# Patient VeinGrafts VeinsOccluded Months
# 1: 1 2 0 36
# 2: 1 2 0 36
# 3: 2 4 1 34
# 4: 2 4 0 34
# 5: 2 4 0 34
# 6: 2 4 0 34
# 7: 3 3 1 38
# 8: 3 3 1 38
# 9: 3 3 0 38
#10: 4 4 0 33
#11: 4 4 0 33
#12: 4 4 0 33
#13: 4 4 0 33
df1 <- structure(list(Patient = 1:4, VeinGrafts = c(2L, 4L, 3L, 4L),
VeinsOccluded = c(0L, 1L, 2L, 0L), Months = c(36L, 34L, 38L,
33L)), class = "data.frame", row.names = c(NA, -4L))