For context, I'd like to impute missing values in a proteomic dataset (protein level, not peptide level). I am trying to use the impute.mix function from the imp4p package, which requires upstream processing with the impute.slsa function from the same package.
https://cran.r-project.org/web/packages/imp4p/imp4p.pdf
Experimental design info:
- I have 1 biological replicate
- 4 cell types (biological samples)
- For each of these cell types I have 3 technical replicates
This gives me 12 columns of samples and over 3000 rows of observations.
Here is where I run into issues:
library(imp4p)
df <- data.frame(
Cell_1 = c(NA, 8.367031, NA, 7.279088, 5.649025),
Cell_2 = c(4.660856, 8.450544, 6.984861, NA, NA),
Cell_3 = c(NA, 7.829102, NA, 8.434507, NA),
Cell_4 = c(NA, 8.471086, NA, 10.028531, 9.175705),
Cell_5 = c(5.30285, 9.60319, 8.51769, NA, NA)
)
data <- as.matrix(df)
cdts <- c("MK", "MK", "Plts", "Plts", "RBC")
Tab_imp <- impute.slsa(data, conditions=cdts, repbio=NULL, reptech=NULL, nknn=15, selec="all", weight=1, ind.comp=1, progress.bar=TRUE)
I think the imp4p package is pretty recent, so I have not been able to find reports of errors similar to the one I get. Can anyone set me on the right path?
SessionInfo
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] imp4p_0.8 norm_1.0-9.5 truncnorm_1.0-8 Iso_0.0-18
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 yaml_2.2.0 Rcpp_1.0.3 knitr_1.26 xfun_0.11
The problem is that the second argument of the function, conditions, is expected to be a factor, not a character vector. Simply converting it to a factor should make it work:
library(imp4p)
# Create a dataframe
df <- data.frame(
Cell_1 = c(NA, 8.367031, NA, 7.279088, 5.649025),
Cell_2 = c(4.660856, 8.450544, 6.984861, NA, NA),
Cell_3 = c(NA, 7.829102, NA, 8.434507, NA),
Cell_4 = c(NA, 8.471086, NA, 10.028531, 9.175705),
Cell_5 = c(5.30285, 9.60319, 8.51769, NA, NA)
)
# Convert dataframe into a matrix
data <- as.matrix(df)
# Create a factor of biological conditions (one entry per column of data)
cdts <- factor(c("MK", "MK", "Plts", "Plts", "RBC"))
# Impute missing values using an adaptation of the LSimpute algorithm
Tab_imp <- impute.slsa(data, conditions = cdts, repbio = NULL, reptech = NULL,
                        nknn = 15, selec = "all", weight = 1, ind.comp = 1, progress.bar = TRUE)
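For the full design described in the question (4 cell types with 3 technical replicates each, so 12 sample columns), the conditions factor could be built along the following lines. This is only a sketch in base R: the label "WBC" and the stand-in matrix are hypothetical placeholders, and it assumes the columns of the real matrix are grouped by cell type in the same order as the labels. It does not call imp4p; it only builds the factor and checks it against the matrix before it is passed to impute.slsa.

```r
# Cell-type labels: "MK", "Plts" and "RBC" come from the example above,
# "WBC" is a hypothetical placeholder for the fourth cell type.
cell_types <- c("MK", "Plts", "RBC", "WBC")
n_tech <- 3  # technical replicates per cell type

# Assumption: columns are ordered MK, MK, MK, Plts, Plts, Plts, ...
cdts <- factor(rep(cell_types, each = n_tech), levels = cell_types)

# Stand-in matrix with one column per sample; replace with the real data.
data <- matrix(rnorm(5 * length(cdts)), nrow = 5, ncol = length(cdts))

# Sanity checks: conditions must be a factor with one entry per column.
stopifnot(is.factor(cdts), length(cdts) == ncol(data))

# Number of samples per condition
table(cdts)
```

If the columns are not grouped by cell type, the factor should be written out in column order instead of using rep().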