I have a list of files that contain specific genes, and I want to create a binary relation matrix in R that shows the presence of each gene in each file.
For example, here are my files aaa, bbb, ccc, and ddd, and the genes associated with them:
aaa=c("HERC1")
bbb=c("MYO9A", "PKHD1L1", "PQLC2", "SLC7A2")
ccc=c("HERC1")
ddd=c("MACC1","PKHD1L1")
I would like to know which command I could use in R to generate a binary relation table like the one in the following image:
where the value 1 means association, and the value 0 means non-association.
How can I do this operation in R?
I tried to use table(aaa,bbb,ccc,ddd), but it did not work. R said:
Error in table(aaa, bbb, ccc, ddd) : all arguments must have the same length
EDIT: Thanks @akrun for your useful reply! I'll take advantage of this question to ask for help with another issue, which I'm sure you guys can handle very quickly. For the second part of my analysis, I need to generate another table where, for each pair of genes, I assign the value 1 if both of them are present in the specific file, and 0 otherwise. Following the example that I gave earlier, this new table should look like the following one (I transposed it for clarity):
Does anybody know a quick way to obtain this new bigenic table in R, starting from the commands you guys already provided to me? Thanks!
An option would be to get the values of the object identifiers in a named list (mget), stack it to a two-column data.frame, and get the frequency with table:
table(stack( mget(strrep(letters[1:4], 3)))[2:1])
# values
#ind HERC1 MACC1 MYO9A PKHD1L1 PQLC2 SLC7A2
# aaa 1 0 0 0 0 0
# bbb 0 0 1 1 1 1
# ccc 1 0 0 0 0 0
# ddd 0 1 0 1 0 0
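To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same mget / stack / table idea. The intermediate names (gene_list, long, presence) are introduced here purely for illustration, and the final capping step is an added assumption for the case where a gene could be listed twice in one file.

```r
# Gene vectors from the question
aaa <- c("HERC1")
bbb <- c("MYO9A", "PKHD1L1", "PQLC2", "SLC7A2")
ccc <- c("HERC1")
ddd <- c("MACC1", "PKHD1L1")

# Step 1: collect the vectors into a named list
# (mget looks the objects up by name)
gene_list <- mget(c("aaa", "bbb", "ccc", "ddd"))

# Step 2: stack the list into a two-column data.frame:
# 'values' holds the genes, 'ind' holds the file each gene came from
long <- stack(gene_list)

# Step 3: cross-tabulate file (rows) against gene (columns)
presence <- table(long$ind, long$values)

# Assumption: cap counts at 1 so a gene repeated within a file
# still yields a binary table
presence[presence > 0] <- 1
presence
```

With the example data every gene appears at most once per file, so the capping step changes nothing here.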
Or an option with tidyverse
library(tidyverse)
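lst(aaa, bbb, ccc, ddd) %>%         # named list of the gene vectors
  enframe %>%                       # columns: name (file), value (list of genes)
  unnest(value) %>%                 # one row per file/gene pair
  count(name, value) %>%            # frequency of each pair
  spread(value, n, fill = 0)        # wide format, 0 where a gene is absent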
# A tibble: 4 x 7
# name HERC1 MACC1 MYO9A PKHD1L1 PQLC2 SLC7A2
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 aaa 1 0 0 0 0 0
#2 bbb 0 0 1 1 1 1
#3 ccc 1 0 0 0 0 0
#4 ddd 0 1 0 1 0 0
In the OP's code, table(aaa,bbb,ccc,ddd), the vectors need to have the same length for table to work. In addition, if we use more than 2 vectors, the frequency table will be multi-dimensional (> 2D). So, we need a framework to have table applied on two columns instead of multiple objects.
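For the follow-up in the question's EDIT (a table of gene pairs), one possible sketch builds on the same two-column approach. It assumes that "pair" means every unordered combination of two distinct genes; the names genes_by_file, presence, pairs, and pair_presence and the "GENE1_GENE2" label format are illustrative choices, not something specified in the question.

```r
# Same data as in the question, kept as a named list
genes_by_file <- list(
  aaa = c("HERC1"),
  bbb = c("MYO9A", "PKHD1L1", "PQLC2", "SLC7A2"),
  ccc = c("HERC1"),
  ddd = c("MACC1", "PKHD1L1")
)

# File-by-gene presence as a logical matrix
long <- stack(genes_by_file)
presence <- unclass(table(long$ind, long$values)) > 0

# All unordered gene pairs, one pair per column
pairs <- combn(colnames(presence), 2)

# For each pair, a file gets 1 only if it contains both genes
pair_presence <- apply(pairs, 2, function(p) {
  as.integer(presence[, p[1]] & presence[, p[2]])
})
dimnames(pair_presence) <- list(
  rownames(presence),
  paste(pairs[1, ], pairs[2, ], sep = "_")  # hypothetical label format
)

# Transposed so that gene pairs are rows and files are columns
t(pair_presence)
```

With six genes this gives 15 pairs. Only pairs drawn from bbb or ddd get a 1, since aaa and ccc each contain a single gene.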