Tags: r, data.table, locf

Last observation carried forward by group over multiple columns


I have a dataset with observations of multiple patients and their diagnoses over time. There are 9 different dummy variables, each representing a specific diagnosis, named e.g. L40, L41, K50, M05 and so on.

Where there are missing values in the dummy variables, I want to carry forward the last non-missing value by patient, so that once a patient receives a diagnosis, it will follow through to subsequent observations.

I started with this, using the na.locf function from the zoo package.

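# Keep the id/date columns plus the diagnosis dummies (names starting with K, L or M)
diagdata <- originaldata[, grep("^patient|^ar|^edatum|^K|^L|^M", colnames(originaldata))]

library(zoo)
library(data.table)

diagnosis <- data.table(diagdata)

# Carry the last non-missing L40 value forward within each patient;
# na.rm = FALSE keeps leading NAs so the result matches the group length
diagnosis[, L40 := na.locf(L40, na.rm = FALSE), by = patient]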

This achieves what I am looking for, but only on the column in question (L40). Is there any way of applying the above to all the relevant diagnosis columns, i.e. columns starting with K, L and M?
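One thing to watch when using na.locf in a grouped assignment: by default (na.rm = TRUE) it drops leading NAs, so a patient whose first observation is missing yields a result shorter than the group, whereas na.rm = FALSE keeps the length unchanged. A minimal sketch on a made-up vector (the values are illustrative and not taken from the dataset):

```r
library(zoo)

# Hypothetical dummy column for one patient, first observation missing
x <- c(NA, 1, NA, NA)

# Default na.rm = TRUE: the leading NA is dropped, so the result is shorter than x
short <- na.locf(x)

# na.rm = FALSE: the leading NA is kept, so the result has the same length as x
full <- na.locf(x, na.rm = FALSE)

length(short)
length(full)
```

The second form is the one that is safe to assign back with := inside by = patient, because every group returns exactly as many values as it has rows.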


Solution

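  • cols = grep("^K|^L|^M", names(diagnosis), value = TRUE)
    
    diagnosis[, (cols) := na.locf(.SD, na.rm = FALSE), by = patient, .SDcols = cols]
    
    Here .SDcols restricts .SD to the diagnosis columns, and na.rm = FALSE keeps any leading NAs so that each patient's result has as many rows as the group.

    Also take a look at the related question "efficiently locf by groups in a single R data.table".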
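To see the grouped fill on something small, here is a sketch on a made-up table. The toy data and its values are assumptions for illustration only. It uses lapply(.SD, ...) to apply na.locf column by column, a variant of the call above that should give the same result as passing .SD directly.

```r
library(zoo)
library(data.table)

# Hypothetical toy data: two patients, two diagnosis dummies
diagnosis <- data.table(
  patient = c(1, 1, 1, 2, 2, 2),
  L40     = c(NA, 1, NA, 0, NA, NA),
  K50     = c(0, NA, 1, NA, NA, 1)
)

# Select the diagnosis columns by name prefix
cols <- grep("^K|^L|^M", names(diagnosis), value = TRUE)

# Fill each selected column forward within each patient;
# na.rm = FALSE keeps leading NAs so lengths match the group size
diagnosis[, (cols) := lapply(.SD, na.locf, na.rm = FALSE),
          by = patient, .SDcols = cols]

print(diagnosis)
```

In this toy data, patient 1's L40 becomes NA, 1, 1 and patient 2's K50 stays NA, NA, 1: values are carried forward only within a patient, and a leading NA remains NA because there is nothing earlier to carry.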