I have a dataset of peoples blood results in R:
ID Result date
A1 80 01/01/2006
A1 70 01/01/2009
A1 61 01/01/2010
A1 30 01/01/2018
A1 28 01/01/2022
B2 40 01/01/2006
B2 30 01/01/2012
B2 25 01/01/2015
B2 10 01/01/2020
G3 28 01/01/2009
G3 27 01/01/2014
G3 25 01/01/2013
G3 24 01/01/2011
G3 22 01/01/2019
I would like to plot these and then do a linear regression and find the predicted date at which their blood result would hit 15 for each person. In reality there are multiple blood tests per year so I would try and get the median result per year and then plot those and get the predicted year they hit 15.
I can do the median per year and use the lm function along the lines of:
filtered$date_of_bloods <-format(filtered$date_of_bloods,format="%Y")
#split into individual ID groups
a <- with(filtered, split(filtered, list(ID)))
#aggregate median results per year
medianfunc <- function(y) {aggregate(results ~ date_of_bloods, data = y, median)}
medians <- sapply(a, medianfunc)
# do lm per ID cohort and get slope of lines
g<- as.data.frame(medians)
coefLM <- function(x) {coef(lm(date_of_bloods ~ results, data = x))}
coefs<- sapply(g, coefLM)
But I am unsure who to extract and get the prediction for when the slope of the line would hit 15.
Any help would be much appreciated
Here's a base R answer:
filtered <- read.table(text = "ID results date_of_bloods
A1 80 01/01/2006
A1 70 01/01/2009
A1 61 01/01/2010
A1 30 01/01/2018
A1 28 01/01/2022
B2 40 01/01/2006
B2 30 01/01/2012
B2 25 01/01/2015
B2 10 01/01/2020
G3 28 01/01/2009
G3 27 01/01/2014
G3 25 01/01/2013
G3 24 01/01/2011
G3 22 01/01/2019", header = TRUE)
filtered$date_of_bloods <- as.Date(filtered$date_of_bloods, '%m/%d/%Y')
filtered$date_of_bloods <- format(filtered$date_of_bloods,format="%Y") |>
as.integer()
#split into individual ID groups
a <- with(filtered, split(filtered, list(ID)))
#aggregate median results per year
medianfunc <- function(y) {aggregate(results ~ date_of_bloods, data = y, median)}
medians <- sapply(a, medianfunc)
# do lm per ID cohort and get slope of lines
g <- as.data.frame(medians)
coefLM <- function(x) {coef(lm(results ~ date_of_bloods, data = x))}
coefs<- sapply(g, coefLM)
coefs <-
rbind(coefs,
year_for_15 = (15 - coefs["(Intercept)", ]) / coefs["date_of_bloods", ])
Giving:
coefs
A1 B2 G3
(Intercept) 6998.650 4263.381995 953.8239437
date_of_bloods -3.450 -2.104623 -0.4612676
year_for_15 2024.246 2018.595376 2035.3129771
I changed your code for the year to make it an integer, that way the coefLM
function can work in the normal dependent variable ~ independent variable order.
Then it's just y = mx + b implies x0 = (15 - b) / m.