The function extractFeatures from the NMF package can select features using the following method, in which only the features that fulfil both of the following criteria are retained:
score greater than \hat{\mu} + 3 \hat{\sigma}, where \hat{\mu} and \hat{\sigma} are the median and the median absolute deviation (MAD) of the scores, respectively;
the maximum contribution to a basis component is greater than the median of all contributions (i.e. of all elements of W).
How can I write a function in R that applies only the first criterion to a data matrix?
Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." Bioinformatics (Oxford, England), 23(12), pp. 1495-502. ISSN 1460-2059.
Given a vector scores, the condition for each score can be checked as follows:
scores <- rnorm(5)
scores > (median(scores) + 3 * mad(scores))
# [1] FALSE FALSE FALSE FALSE FALSE
where there is no need to look for a separate MAD function, since mad from the stats package does exactly that. If you now want to select the corresponding columns of some matrix M, you can simply write
M[, scores > (median(scores) + 3 * mad(scores))]
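As a small deterministic check of the rule above, here is a sketch with made-up scores containing one obvious outlier. One detail worth knowing: by default mad multiplies the raw median absolute deviation by 1.4826 (so that it estimates the standard deviation for normal data), and constant = 1 gives the unscaled value. Whether the scaled or unscaled version is what you want for the criterion is an assumption you should check against your use case.

```r
# Made-up scores: four ordinary values and one clear outlier
scores <- c(1, 2, 3, 4, 100)

med        <- median(scores)              # 3
mad_scaled <- mad(scores)                 # default: constant = 1.4826
mad_raw    <- mad(scores, constant = 1)   # unscaled median absolute deviation

threshold <- med + 3 * mad_scaled         # 3 + 3 * 1.4826
keep      <- scores > threshold           # one logical per score
keep
# [1] FALSE FALSE FALSE FALSE  TRUE
```

Only the last score exceeds the threshold, so only the last column of a matching matrix would be selected.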
And if you prefer a function for that, then you may use
featureCriterion <- function(M, scores) {
  # keep the columns whose score exceeds median + 3 * MAD
  M[, scores > (median(scores) + 3 * mad(scores))]
}
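One caveat with plain column indexing is that R drops the result to a vector when exactly one column is selected. A slightly more defensive variant could look like the sketch below; the name selectByScore and the argument nsd are hypothetical and chosen only for illustration, and the sketch assumes one score per column of M.

```r
# Hypothetical helper (name and arguments are illustrative, not part of NMF)
selectByScore <- function(M, scores, nsd = 3) {
  # assumes one score per column of M
  stopifnot(is.matrix(M), length(scores) == ncol(M))
  threshold <- median(scores, na.rm = TRUE) + nsd * mad(scores, na.rm = TRUE)
  keep <- !is.na(scores) & scores > threshold
  M[, keep, drop = FALSE]   # drop = FALSE keeps a matrix even for 0 or 1 columns
}

M <- matrix(1:10, nrow = 2)       # 2 x 5 matrix, one column per score
scores <- c(1, 2, 3, 4, 100)      # made-up scores with one outlier
selected <- selectByScore(M, scores)
dim(selected)
# [1] 2 1
```

Here only the fifth column passes the threshold, and the result is still a 2 x 1 matrix rather than a plain vector.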