
Customized Kim feature selection function


The function extractFeatures from the NMF package can select features with the following method, in which only the features that fulfill both of the following criteria are retained:

score greater than \hat{\mu} + 3 \hat{\sigma}, where \hat{\mu} and \hat{\sigma} are the median and the median absolute deviation (MAD) of the scores respectively;

the maximum contribution to a basis component is greater than the median of all contributions (i.e. of all elements of W).

How can I write a function in R that applies only the first criterion to a data matrix?

Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." Bioinformatics (Oxford, England), 23(12), pp. 1495-502. ISSN 1460-2059.


Solution

  • Given a vector scores, the condition can be checked for each score as follows:

    scores <- rnorm(5)
    scores > (median(scores) + 3 * mad(scores))
    # e.g. [1] FALSE FALSE FALSE FALSE FALSE (the result depends on the random draw)
    

    There is no need to look for a separate MAD function, since mad from the stats package does exactly that. To select the corresponding columns from a matrix M, you can simply write

    M[, scores > (median(scores) + 3 * mad(scores))]
    

    If you prefer a function for that, you can use

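    # Keep the columns of M whose score exceeds median + 3 * MAD;
    # drop = FALSE keeps the result a matrix even if only one column is selected.
    featureCriterion <- function(M, scores)
      M[, scores > (median(scores) + 3 * mad(scores)), drop = FALSE]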
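
As a quick check of the steps above, here is a minimal sketch on toy data. The matrix, the column names f1 to f6 and the score values are made up for illustration, and the function is restated so the block runs on its own. It assumes one score per column of M. Note that mad() multiplies the raw median absolute deviation by 1.4826 by default; pass constant = 1 if you want the unscaled MAD.

```r
# Restated so the example is self-contained: keep the columns of M whose
# score exceeds median(scores) + 3 * mad(scores).
featureCriterion <- function(M, scores) {
  stopifnot(length(scores) == ncol(M))  # assumption: one score per column
  keep <- scores > (median(scores) + 3 * mad(scores))
  M[, keep, drop = FALSE]               # stay a matrix even for one column
}

# Toy data (hypothetical): 2 x 6 matrix, the last score is a clear outlier.
M <- matrix(1:12, nrow = 2, dimnames = list(NULL, paste0("f", 1:6)))
scores <- c(0.10, 0.12, 0.11, 0.09, 0.10, 0.95)

selected <- featureCriterion(M, scores)
colnames(selected)
# [1] "f6"
```

Here the median is 0.105 and the scaled MAD is about 0.0148, so the threshold is roughly 0.149 and only the last column passes. If your features are stored in rows rather than columns, index rows instead (M[keep, , drop = FALSE]).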