
Imputing missing values linearly in R


I have a data frame with missing values:

X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62

I want to impute the NA values by linear interpolation between the known values, so that the data frame looks like this:

X   Y    Z
54  57  57
100 58  58
90  59  57.5
80  60  57
70  61  56.5
60  62  56
63  62  58
66  62  60
69  62  62
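Concretely, "linearly" here means that each missing entry lies on the straight line between the nearest known values above and below it in the same column. Writing a and b for the row numbers of those two known values (notation introduced only for this illustration), a missing row i with a < i < b gets:

$$
v_i = v_a + \frac{i - a}{b - a}\,\left(v_b - v_a\right), \qquad a < i < b
$$

For example, X in row 3 has a = 2 (value 100) and b = 6 (value 60), giving 100 + (1/4)(60 - 100) = 90; Z in row 7 has a = 6 (value 56) and b = 9 (value 62), giving 56 + (1/3)(62 - 56) = 58.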

thanks


Solution

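  • Base R's approxfun() takes x and y vectors and returns a function that linearly interpolates between the (x, y) points it is given. Pairs containing NA are dropped, so evaluating that function at every row index fills each gap with values on the line between its neighboring known values.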

    ## Make easily reproducible data
    df <- read.table(text="X   Y   Z
    54  57  57
    100 58  58
    NA  NA  NA
    NA  NA  NA
    NA  NA  NA
    60  62  56
    NA  NA  NA
    NA  NA  NA
    69  62  62", header = TRUE)
    
    ## See how this works on a single vector
    idx <- seq_along(df$X)          # row positions 1..9, not hard-coded
    approxfun(idx, df$X)(idx)
    # [1]  54 100  90  80  70  60  63  66  69
    
    ## Apply interpolation to each of the data.frame's columns
    data.frame(lapply(df, function(v) approxfun(seq_along(v), v)(seq_along(v))))
    #     X  Y    Z
    # 1  54 57 57.0
    # 2 100 58 58.0
    # 3  90 59 57.5
    # 4  80 60 57.0
    # 5  70 61 56.5
    # 6  60 62 56.0
    # 7  63 62 58.0
    # 8  66 62 60.0
    # 9  69 62 62.0
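One edge case the answer above does not hit: approxfun() uses rule = 1 by default, so any rows before the first known value or after the last one would stay NA. Below is a minimal sketch of a helper, fill_linear() (a hypothetical name, not a base R function), that calls approx() directly, replaces only the NA entries, and exposes rule; with rule = 2 the nearest known value is copied into the ends.

```r
## fill_linear() is a hypothetical helper, not a base R function.
## It replaces only the NA entries of a numeric vector by linear
## interpolation over the element positions; known values are untouched.
fill_linear <- function(v, rule = 2) {
  idx <- seq_along(v)
  known <- !is.na(v)
  # Nothing to fill, or too few known points for approx() to interpolate
  if (all(known) || sum(known) < 2) return(v)
  # rule = 1: NA outside the known range; rule = 2: nearest known value
  v[!known] <- approx(idx[known], v[known], xout = idx[!known], rule = rule)$y
  v
}

## Edge case: NAs before the first / after the last known value
x <- c(NA, 10, NA, 30, NA)
fill_linear(x, rule = 1)
# [1] NA 10 20 30 NA
fill_linear(x, rule = 2)
# [1] 10 10 20 30 30

## Applied column-wise to the question's data; `[] <-` keeps the data.frame
df <- data.frame(X = c(54, 100, NA, NA, NA, 60, NA, NA, 69),
                 Y = c(57,  58, NA, NA, NA, 62, NA, NA, 62),
                 Z = c(57,  58, NA, NA, NA, 56, NA, NA, 62))
df_filled <- df
df_filled[] <- lapply(df, fill_linear)
df_filled
```

For the question's data there are no leading or trailing NAs, so df_filled holds the same values as the lapply()/approxfun() result above; the rule argument only matters when a column starts or ends with missing values.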