I have a matrix with individuals as rows and time points as columns. The value in the matrix is the probability of an event occurring for a subject at each time point.
set.seed(123)
prob_mat <- matrix(round(runif(15), 2), 5, 3,
dimnames = list(paste0('id', 1:5), c(1.2, 2.5, 3.1)))
# 1.2 2.5 3.1
# id1 0.29 0.05 0.96
# id2 0.79 0.53 0.45
# id3 0.41 0.89 0.68
# id4 0.88 0.55 0.57
# id5 0.94 0.46 0.10
I also have a time vector named time_vec
.
time_vec <- c(1.7, 2.9, 4)
I want to estimate the probabilities for each subject at the time points recorded in time_vec
using linear interpolation. For example, the time point 1.7 is between 1.2 and 2.5, with a distance of 0.5 from 1.2 and 0.8 from 2.5, so the interpolated probabilities should be
(prob_mat[, '1.2'] * 0.8 + prob_mat[, '2.5'] * 0.5) / 1.3
# id1 id2 id3 id4 id5
# 0.1976923 0.6900000 0.5946154 0.7530769 0.7553846
Note that the time point 4
is outside the interval [1.2, 3.1]
. We use the values at the closest time, i.e. 3.1
, as its estimates. The expected output is as follows:
1.7 2.9 4
id1 0.1976923 0.6566667 0.96
id2 0.6900000 0.4766667 0.45
id3 0.5946154 0.7500000 0.68
id4 0.7530769 0.5633333 0.57
id5 0.7553846 0.2200000 0.10
I have tried apply()
with approx()
by rows, but the efficiency is low for a large matrix.
We can use approx()
to determine the positions of time_vec
within prob_mat
, and then use the modulo operator (%%
) to obtain the decimal part of the position. This value is exactly the weight required for linear interpolation.
tt <- as.numeric(colnames(prob_mat))
pos <- approx(x = tt, y = seq_along(tt), xout = time_vec, rule = 2)$y
w <- pos %% 1
t(t(prob_mat[, floor(pos)]) * (1-w) + t(prob_mat[, ceiling(pos)]) * w)
# 1.2 2.5 3.1
# id1 0.1976923 0.6566667 0.96
# id2 0.6900000 0.4766667 0.45
# id3 0.5946154 0.7500000 0.68
# id4 0.7530769 0.5633333 0.57
# id5 0.7553846 0.2200000 0.10
Note: rule = 2
is set so that when the estimated time point is outside the interval, the value at the closest data extreme is used.