
Explain extractFeatures() from the NMF package in R


I am using the R package NMF to perform non-negative matrix factorization on microarray expression data. The NMF run itself completed fine, but I would like to extract the gene names (features) from the basis matrix. The basis matrix is one of the matrices produced by the factorization, with gene names as rows and metagenes (one per factorization rank) as columns.

The package has a function for doing this called extractFeatures() that will score the matrix and return the features (gene names) that fit my scoring criteria. Let's say that I had 4 metagene columns (rank = 4) for the basis matrix after running NMF (final NMF object called x). When I run s <- extractFeatures(x) I get an R "list" with 4 vectors containing integers:

> class(s)
[1] "list"

> str(s)
List of 4
 $ : int [1:575] 569 4857 4 51 91 9627 6359 2522 118 163 ...
 $ : int [1:243] 3 1 11834 106 2 52 3855 1103 6 1510 ...
 $ : int [1:37] 11922 11890 11521 11888 11648 11388 9340 11520 9854 11670 ...
 $ : int [1:808] 6123 9125 11918 10432 9674 2109 11802 8372 11746 6996 ...
 - attr(*, "method")= chr "kim"

(In the output below, some of the results were removed for brevity.)

> s
[[1]]
  [1]   569  4857     4    51 

[[2]]
  [1]     3     1 11834   106     2    52  3855  1103     6  1510    14    49

[[3]]
 [1] 11922 11890 11521

[[4]]
 [1]  6123  9125 11918 10432  9674  2109

QUESTION 1: What are these integers? They are supposed to be "features" (i.e. gene names) from my matrix. Why are they integers and not gene names? Do those integers correspond to my gene names in some way?

QUESTION 2: How do I isolate the gene names from each individual vector within the list s? For example, I want to get only the gene names for the first metagene (575 features), then only the gene names for the second metagene (243 features), and so on.

Any ideas would be appreciated. Thanks!


Solution

  • I think the integers are the row indices of your genes in the basis matrix.

    http://nmf.r-forge.r-project.org/scores.html

    extractFeatures returns the selected features as a list of indexes, a single integer vector or an object of the same class as object that only contains the selected features.
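
If the integers are row indices, then getting the gene names for each metagene should just be a matter of indexing the row names of the basis matrix. Here is a minimal sketch of that idea. It uses only base R, and the matrix W and the list s are made-up stand-ins (toy gene names gene1..gene8), not real output. With a real fit you would presumably use W <- basis(x) and s <- extractFeatures(x), assuming the basis matrix kept your gene names as row names.

```r
# Toy stand-in for the basis matrix (assumption: with a real fit this
# would be W <- basis(x), which requires the NMF package).
W <- matrix(runif(8 * 2), nrow = 8, ncol = 2,
            dimnames = list(paste0("gene", 1:8), NULL))

# Toy stand-in for s <- extractFeatures(x):
# one integer vector of row indices per metagene.
s <- list(c(5L, 2L), c(8L, 1L, 3L))

# Translate each vector of row indices into the matching gene names.
genes <- lapply(s, function(idx) rownames(W)[idx])

genes[[1]]  # gene names for the first metagene
genes[[2]]  # gene names for the second metagene
```

Here genes[[1]] plays the role of the 575 features of your first metagene, genes[[2]] the 243 features of the second, and so on. If rownames(W) comes back NULL, the gene names were not carried into the fit and you would have to index your original expression matrix's row names instead.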