This may be a simple question, but I don't have a knack for writing code in R at all. I have a dataset with right censoring that looks something like this:
dput(head(books)):
structure(list(id = 1:6, time = c(29, 30, 26, 30, 30, 29
), event = c(1, 0, 1, 0, 0, 1), z1 = c("early", "late",
"early", "late", "late", "early"), z2 = c(9, 6, 4, 9,
9, 5), z3 = c("0B", "1B", "0C", "0C", "0C", "0C"), burrowed = c(1,
1, 1, 0, 1, 1), time.burrowed = c(5, 2, 6, 30, 1, 8),
returned = c(1, 0, 0, 0, 1, 0), time.returned = c(20, 30, 21,
30, 28, 29)), row.names = c(NA, 6L), class = "data.frame")
and I need it to look like this:
head(books)
id start stop checkedout event z1 z2
1 0 5 0 0 early 9
1 5 20 1 0 early 9
1 20 30 0 1 early 9 etc.
2
3
4
4
Basically, I want to combine the borrowed and returned columns into a single indicator of whether the book is checked out during each interval.
Here is what I have so far:
start <- 0
stop <- numeric(length=0)
checkedout <- 0
event <- numeric(length=0)
if (book$burrowed[1]==1) {
start <- c(start, book$time.burrowed[1])
stop <- c(stop, book$time.burrowed[1])
checkedout <- c(checkedout,1)
event <- c(event, 0)
}
if (book$returned[1]==1) {
start <- c(start, book$time.returned[1])
stop <- c(stop, book$time.returned[1])
checkedout <- c(checkedout,0)
event <- c(event, 0)
}
stop <- c(stop, book$time[1])
event <- c(event, book$event[1])
temp.frame <- data.frame(id=book$id[1],start,stop,event,checkedout)
Thanks for providing the data, it is helpful. However, there is still some uncertainty about how you want to set specific columns, such as event (it appears different between the result from your code and the example above).
Here's something that might help you get started. I am guessing you'll want to add specific conditional rules to help further shape how your result looks.
First, I would create a function to handle a single row of data:
my_fun <- function(x) {
df <- as.data.frame(rbind(
c(id = x[["id"]], start = 0, stop = x[["time.burrowed"]], event = 0, checkedout = 0),
c(id = x[["id"]], start = x[["time.burrowed"]], stop = x[["time.returned"]], event = 0, checkedout = 1),
c(id = x[["id"]], start = x[["time.returned"]], stop = x[["time"]], event = x[["event"]], checkedout = 0)
))
df <- cbind(
df,
z1 = x[["z1"]],
z2 = x[["z2"]],
z3 = x[["z3"]]
)
return(df)
}
In this function, you can clearly indicate what you want when converting a single row to three rows of data. Specifically, indicate what start and stop should be based on time.burrowed (misspelling?) and time.returned. You can also hard code the first start as 0 and the final stop as time. Here, you can also indicate what you want for event and checkedout. I set event to 0 for the first two rows, based on your example code, and used the original event value for the third row; checkedout is 1 only for the middle row. In the end, the three rows are combined with rbind.
After that, other columns can be added with cbind. This includes z1, z2, etc., which appear constant across rows.
You can try out the function with a single row, such as:
R> my_fun(book[1,])
id start stop event checkedout z1 z2 z3
1 1 0 5 0 0 early 9 0B
2 1 5 20 0 1 early 9 0B
3 1 20 29 1 0 early 9 0B
Once you are satisfied with your function, you can apply the function to all rows in your data frame:
do.call(rbind, lapply(seq_len(nrow(book)), function(i) my_fun(book[i, ])))
Output
id start stop event checkedout z1 z2 z3
1 1 0 5 0 0 early 9 0B
2 1 5 20 0 1 early 9 0B
3 1 20 29 1 0 early 9 0B
4 2 0 2 0 0 late 6 1B
5 2 2 30 0 1 late 6 1B
6 2 30 30 0 0 late 6 1B
7 3 0 6 0 0 early 4 0C
8 3 6 21 0 1 early 4 0C
9 3 21 26 1 0 early 4 0C
10 4 0 30 0 0 late 9 0C
11 4 30 30 0 1 late 9 0C
12 4 30 30 0 0 late 9 0C
13 5 0 1 0 0 late 9 0C
14 5 1 28 0 1 late 9 0C
15 5 28 30 0 0 late 9 0C
16 6 0 8 0 0 early 5 0C
17 6 8 29 0 1 early 5 0C
18 6 29 29 1 0 early 5 0C
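Note that the output above contains zero-length intervals (for example rows 6, 11, 12 and 18, where start equals stop), and for id 6 the event sits on such a row. As one example of the conditional rules mentioned earlier, here is a minimal sketch of a variant that drops empty intervals and places the event on the last remaining interval. The name split_row is hypothetical, the data frame is called books as in your question, and I am assuming that time is greater than 0 and that time.burrowed <= time.returned <= time for every row, which may not match your real data.

```r
# Sample data from the question (column names kept as posted, including "burrowed")
books <- data.frame(
  id = 1:6,
  time = c(29, 30, 26, 30, 30, 29),
  event = c(1, 0, 1, 0, 0, 1),
  z1 = c("early", "late", "early", "late", "late", "early"),
  z2 = c(9, 6, 4, 9, 9, 5),
  z3 = c("0B", "1B", "0C", "0C", "0C", "0C"),
  burrowed = c(1, 1, 1, 0, 1, 1),
  time.burrowed = c(5, 2, 6, 30, 1, 8),
  returned = c(1, 0, 0, 0, 1, 0),
  time.returned = c(20, 30, 21, 30, 28, 29)
)

# Hypothetical variant of my_fun(): x is a one-row data frame
split_row <- function(x) {
  # Candidate intervals: before checkout, while checked out, after return
  start <- c(0, x[["time.burrowed"]], x[["time.returned"]])
  stop <- c(x[["time.burrowed"]], x[["time.returned"]], x[["time"]])
  checkedout <- c(0, 1, 0)

  # Keep only intervals with positive length (assumes time > 0,
  # so at least one interval always survives)
  keep <- stop > start

  out <- data.frame(
    id = x[["id"]],
    start = start[keep],
    stop = stop[keep],
    event = 0,
    checkedout = checkedout[keep],
    z1 = x[["z1"]],
    z2 = x[["z2"]],
    z3 = x[["z3"]]
  )

  # The event (or censoring) belongs to the last remaining interval
  out$event[nrow(out)] <- x[["event"]]
  out
}

result <- do.call(rbind, lapply(seq_len(nrow(books)), function(i) split_row(books[i, ])))
```

With this rule, id 4 (never checked out) collapses to a single row from 0 to 30, and the event for id 6 moves onto the 8 to 29 interval instead of sitting on a 29 to 29 row. Whether that is the right treatment depends on how you define checkedout at the event time, so adjust the rule as needed.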