
Error in missing value imputation using imp4p package, impute.slsa function: Error in fast_apply_sd_na_rm_T(xincomplete1, 1) : Not a matrix


For context, I'd like to impute missing values in a proteomics dataset (protein level, not peptide level). I am trying to use the impute.mix function from the imp4p package, which first requires processing the data with impute.slsa.
https://cran.r-project.org/web/packages/imp4p/imp4p.pdf

Experimental design info:
- 1 biological replicate
- 4 cell types (biological samples)
- 3 technical replicates for each cell type

This gives me 12 sample columns and over 3000 rows of observations.
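
For a design like this, the conditions argument needs one label per sample column. A minimal sketch of how such a vector could be built is shown here; the cell-type labels are placeholders, and it assumes the 12 columns are ordered so that the 3 technical replicates of each cell type are adjacent.

```r
# Placeholder labels for the 4 cell types (not taken from the real dataset)
cell_types <- c("TypeA", "TypeB", "TypeC", "TypeD")

# One label per column: 3 consecutive technical replicates per cell type
conditions <- factor(rep(cell_types, each = 3), levels = cell_types)

# Sanity checks: number of labels, number of levels, columns per level
length(conditions)
nlevels(conditions)
table(conditions)
```

If the columns are interleaved instead (one replicate of each cell type in turn), the labels would have to be reordered to match the actual column order.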

Here is where I run into issues:

    library(imp4p)

    df <- data.frame(
      Cell_1 = c(NA, 8.367031, NA, 7.279088, 5.649025),
      Cell_2 = c(4.660856, 8.450544, 6.984861, NA, NA),
      Cell_3 = c(NA, 7.829102, NA, 8.434507, NA),
      Cell_4 = c(NA, 8.471086, NA, 10.028531, 9.175705),
      Cell_5 = c(5.30285, 9.60319, 8.51769, NA, NA)
    )

    data <- as.matrix(df)
    cdts <- c("MK", "MK", "Plts", "Plts", "RBC")

    Tab_imp <- impute.slsa(data, conditions=cdts, repbio=NULL, reptech=NULL, nknn=15, selec="all", weight=1, ind.comp=1, progress.bar=TRUE)

I think the imp4p package is fairly recent, so I have not been able to find reports of similar errors. Can anyone set me on the right path?

SessionInfo

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] imp4p_0.8       norm_1.0-9.5    truncnorm_1.0-8 Iso_0.0-18     

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1    yaml_2.2.0     Rcpp_1.0.3     knitr_1.26     xfun_0.11  

Solution

  • The error occurs because the second parameter of impute.slsa, conditions, is expected to be a factor, not a character vector. Converting it with factor() should make the call work:

    library(imp4p)
    
    # Create a dataframe
    df <- data.frame(
      Cell_1 = c(NA, 8.367031, NA, 7.279088, 5.649025),
      Cell_2 = c(4.660856, 8.450544, 6.984861, NA, NA),
      Cell_3 = c(NA, 7.829102, NA, 8.434507, NA),
      Cell_4 = c(NA, 8.471086, NA, 10.028531, 9.175705),
      Cell_5 = c(5.30285, 9.60319, 8.51769, NA, NA)
    )
    
    # Convert dataframe into a matrix
    data <- as.matrix(df)
    # Create a factor of biological conditions
    cdts <- factor(c("MK", "MK", "Plts", "Plts", "RBC"))
    
    # Impute missing values using an adaptation of the LSimpute algorithm.
    Tab_imp <- impute.slsa(data, conditions = cdts, repbio = NULL, reptech = NULL, 
                           nknn = 15, selec = "all", weight = 1, ind.comp = 1, progress.bar = TRUE)
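
To catch this kind of mismatch before calling impute.slsa, the inputs can be checked with base R first. The sketch below is not part of imp4p: check_imputation_inputs is a hypothetical helper, and the conditions it tests (numeric matrix, factor, one label per column) are assumptions based on the fix above rather than on the package source.

```r
# Hypothetical helper (not part of imp4p): validate inputs before imputation
check_imputation_inputs <- function(tab, conditions) {
  stopifnot(
    is.matrix(tab),                   # a data.frame must go through as.matrix() first
    is.numeric(tab),                  # intensities must be numeric
    is.factor(conditions),            # a plain character vector is rejected
    length(conditions) == ncol(tab)   # one condition label per sample column
  )
  invisible(TRUE)
}

# Small toy matrix: 2 rows, 3 sample columns
tab <- matrix(c(NA, 8.37, 4.66, 8.45, NA, 7.83), nrow = 2,
              dimnames = list(NULL, c("Cell_1", "Cell_2", "Cell_3")))

cdts_chr <- c("MK", "MK", "Plts")   # character vector, as in the question
cdts_fct <- factor(cdts_chr)        # factor, as in the fix

# levels() is NULL for a character vector but defined for a factor
levels(cdts_chr)
levels(cdts_fct)

# The factor passes the check silently
check_imputation_inputs(tab, cdts_fct)

# The character vector fails the check; the error is caught and turned into FALSE
ok_chr <- tryCatch(check_imputation_inputs(tab, cdts_chr),
                   error = function(e) FALSE)
```

Running a check like this first gives a readable failure at the call site instead of an error raised deep inside the package.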