I have a dataset with observations of multiple patients and their diagnoses over time. There are 9 different dummy variables, each representing a specific diagnosis, named e.g. L40, L41, K50, M05 and so on.
Where there are missing values in the dummy variables, I want to carry forward the last non-missing value by patient, so that once a patient receives a diagnosis, it will follow through to subsequent observations.
I started with the following, using the na.locf function from the zoo package:
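# Keep the patient, ar and edatum columns plus the diagnosis columns (K*, L*, M*)
diagdata <- originaldata[, grep("^patient|^ar|^edatum|^K|^L|^M", colnames(originaldata))]
library(zoo)
library(data.table)
diagnosis <- data.table(diagdata)
# na.rm = FALSE keeps leading NAs, so the result has the same length as the group
diagnosis[, L40 := na.locf(L40, na.rm = FALSE), by = patient]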
This achieves what I am looking for, but only on the column in question (L40). Is there any way of applying the above to all the relevant diagnosis columns, i.e. columns starting with K, L and M?
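# Names of all diagnosis columns (those starting with K, L or M)
cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)
# Carry the last non-missing value forward within each patient, for every column in cols
diagnosis[, (cols) := na.locf(.SD, na.rm = FALSE), by = patient, .SDcols = cols]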
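To see what this does, here is a minimal sketch on made-up data. The toy table and its values are assumptions for illustration only, not taken from your dataset, and it uses lapply over .SD so that na.locf is applied to one column at a time, which should give the same result as the call above.

```r
library(data.table)
library(zoo)

# Hypothetical toy data: two patients, three observations each, two diagnosis dummies
diagnosis <- data.table(
  patient = c(1, 1, 1, 2, 2, 2),
  L40     = c(NA, 1, NA, NA, NA, 1),
  K50     = c(1, NA, NA, NA, 1, NA)
)

# Diagnosis columns are those starting with K, L or M
cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)

# Fill forward within each patient; na.rm = FALSE keeps leading NAs in place
diagnosis[, (cols) := lapply(.SD, na.locf, na.rm = FALSE),
          by = patient, .SDcols = cols]

print(diagnosis)
```

Note that values are never carried across patients: patient 2 keeps NA in L40 until its own first non-missing observation.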
Also take a look at the related question "efficiently locf by groups in a single R data.table".