
R: convert each row of a matrix into a character vector and save the results as a named list


I have a matrix containing biological pathways (rows) and corresponding genes (columns). If a gene is present in a pathway, the cell contains 1; otherwise it contains 0. See the example below:

    mat <- matrix(c(0,0,1,0,1,1,1,1,1), nrow = 3, ncol = 3)
    row.names(mat) <- c("pathwayX", "pathwayY", "pathwayZ")
    colnames(mat) <- c("Gene1", "Gene2", "Gene3")

    > mat
             Gene1 Gene2 Gene3
    pathwayX     0     0     1
    pathwayY     0     1     1
    pathwayZ     1     1     1
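For reference, here is a sketch that builds the same matrix with its row and column names in a single call through the `dimnames` argument of `matrix()`. `matrix()` fills column by column by default; this particular matrix happens to be symmetric, so `byrow = TRUE` would give the same layout. The name `genes_per_pathway` is illustrative only.

```r
# Build the example matrix with row and column names in one call.
# matrix() fills column by column by default (byrow = FALSE).
mat <- matrix(c(0, 0, 1, 0, 1, 1, 1, 1, 1), nrow = 3, ncol = 3,
              dimnames = list(c("pathwayX", "pathwayY", "pathwayZ"),
                              c("Gene1", "Gene2", "Gene3")))
print(mat)

# Each cell is 1 if the gene belongs to the pathway, so the row sums
# give the number of genes per pathway (illustrative variable name)
genes_per_pathway <- rowSums(mat)
print(genes_per_pathway)
```

The row sums (1, 2 and 3 genes here) show that each pathway contains a different number of genes, so the per-pathway gene vectors have different lengths.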

What I need is, for each pathway, a character vector of its constituent genes, with these vectors held in a list (e.g., named gene_sets). In this example this would be:

    > gene_sets
    $pathwayX
    [1] "Gene3"

    $pathwayY
    [1] "Gene2" "Gene3"

    $pathwayZ
    [1] "Gene1" "Gene2" "Gene3"

Additionally, I need a list (e.g., named description) containing, for each pathway, a character vector with the pathway name. In this example this would be:

    > description
    $pathwayX
    [1] "pathwayX"

    $pathwayY
    [1] "pathwayY"

    $pathwayZ
    [1] "pathwayZ"

Background: these lists are needed to run the pathfindR package with custom gene sets (https://github.com/egeulgen/pathfindR/wiki/Analysis-Using-Custom-Gene-Sets).


Solution

  • Well done giving us a reproducible example. You can use the apply family of functions: lapply always returns a list, sapply tries to simplify its result to a vector or matrix, and apply lets you choose whether the function is applied over the rows (MARGIN = 1) or the columns (MARGIN = 2) of a matrix. (apply also accepts a data.frame, which it converts to a matrix first. lapply and sapply, by contrast, iterate over the columns of a data.frame, because a data.frame is a list of columns.)

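    # MARGIN = 1: work row by row; each row x is a 0/1 vector, so
    # colnames(mat)[x == 1] keeps the genes present in that pathway.
    # apply() returns a named list here because the pathways contain
    # different numbers of genes.
    gene_sets <- apply(mat, 1, function(x) colnames(mat)[x == 1])
    # One list element per pathway, holding the pathway name itself
    description <- lapply(row.names(mat), function(x) x)
    names(description) <- row.names(mat)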
    > gene_sets
    $pathwayX
    [1] "Gene3"
    
    $pathwayY
    [1] "Gene2" "Gene3"
    
    $pathwayZ
    [1] "Gene1" "Gene2" "Gene3"
    
    > description
    $pathwayX
    [1] "pathwayX"
    
    $pathwayY
    [1] "pathwayY"
    
    $pathwayZ
    [1] "pathwayZ"
    

    Not sure I follow your logic regarding the description list, but this seems to give your expected result.
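One caveat with apply: it returns a list above only because the pathways have different numbers of genes. If every pathway had the same number of genes, apply would simplify the result to a matrix. The sketch below (not part of the original answer) uses lapply over row indices, which always returns a list, and builds description with setNames. The helper name to_gene_sets and the matrix mat_equal are hypothetical, chosen only for illustration. On R 4.1.0 or later, apply(..., simplify = FALSE) is another way to force a list.

```r
# Example matrix from the question: pathways in rows, genes in columns
mat <- matrix(c(0, 0, 1, 0, 1, 1, 1, 1, 1), nrow = 3, ncol = 3,
              dimnames = list(c("pathwayX", "pathwayY", "pathwayZ"),
                              c("Gene1", "Gene2", "Gene3")))

# Hypothetical helper: one character vector of present genes per pathway.
# lapply() over row indices always yields a list, whatever the row lengths.
to_gene_sets <- function(m) {
  sets <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ] == 1])
  names(sets) <- rownames(m)
  sets
}

gene_sets <- to_gene_sets(mat)

# Each description is simply the pathway's own name, in a named list
description <- setNames(as.list(rownames(mat)), rownames(mat))

# Hypothetical matrix in which every pathway has exactly two genes
mat_equal <- mat
mat_equal[] <- c(1, 1, 0, 1, 0, 1, 0, 1, 1)

# apply() now simplifies: the result is a 2 x 3 character matrix, not a list
simplified <- apply(mat_equal, 1, function(x) colnames(mat_equal)[x == 1])

# The lapply()-based helper still returns a named list of length 3
gene_sets_equal <- to_gene_sets(mat_equal)

str(gene_sets)
str(description)
```

Looping over row indices rather than rows keeps the output type predictable, which matters when the lists are passed on to another package.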