Is there an inner_join equivalent for 3d arrays, and can it be used to join a 2d structure to a 3d structure?
Let's see if this makes sense:
I have a 3d array of integers (microbiome count data).
I have a 2d table of metadata.
There is one column in the 2d metadata table (sample names) that matches the labels of the second dimension in the array.
Can I somehow join these two, such that I preserve the array structure and add the metadata for each sample?
Do I have to just melt/stack the array into a super long 2d table?
Thanks for any help!
Edit:
Let's say I generate an array "a" and a table "b" with the following code:
a <- array(1:10,c(2,4,3))
b <- data.frame("thing" = c("stuff", "foo", "dodad"), "data" = c(10,20,30), "match" = c("first","second","third"))
dimnames(a) <- list(c("A", "B"), c("one", "two", "three", "four"), c("first", "second", "third"))
As you can see, I have a column "match" in table "b" that I would like to join/match to the dimension names in dimnames(a)[[3]].
So if we look at "a" and "b":
> a
, , first

  one two three four
A   1   3     5    7
B   2   4     6    8

, , second

  one two three four
A   9   1     3    5
B  10   2     4    6

, , third

  one two three four
A   7   9     1    3
B   8  10     2    4

> b
  thing data  match
1 stuff   10  first
2   foo   20 second
3 dodad   30  third
I would like, for example, the array
, , third

  one two three four
A   7   9     1    3
B   8  10     2    4

to have the elements "dodad" and "30" associated with it under the labels "thing" and "data".
For the real data set, I'll want to have "patient name" instead of "thing" and "dilution" instead of "data" and use these elements as a means to pull slices out of the array to run statistical analyses.
You don't show what output you intend, so I'll guess.
If you start with a (with dims AxBxC) and b (dims DxE), then you should get an array with dims AxBxD.
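# index the third dimension by name; as.character() guards against "match"
# being a factor (the data.frame default before R 4.0), which would index by code
a[, , as.character(b[, "match"])]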
# , , first
#
#   one two three four
# A   1   3     5    7
# B   2   4     6    8
#
# , , second
#
#   one two three four
# A   9   1     3    5
# B  10   2     4    6
#
# , , third
#
#   one two three four
# A   7   9     1    3
# B   8  10     2    4
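If the metadata could name samples that have no layer in a (or a could have layers with no metadata), indexing by name stops with a "subscript out of bounds" error. Here is a minimal sketch of inner-join-like behavior in base R. It assumes the a from the question and a hypothetical metadata frame b2 (not in the original) with one unmatched row:

```r
a <- array(1:10, c(2, 4, 3))
dimnames(a) <- list(c("A", "B"),
                    c("one", "two", "three", "four"),
                    c("first", "second", "third"))

# hypothetical metadata: "fourth" has no layer in a, and "second" has no row here
b2 <- data.frame(thing = c("stuff", "dodad", "extra"),
                 data  = c(10, 30, 40),
                 match = c("first", "third", "fourth"),
                 stringsAsFactors = FALSE)

# keep only the names present on both sides (the "inner" part of the join)
keys <- intersect(b2$match, dimnames(a)[[3]])

a_sub <- a[, , keys, drop = FALSE]      # layers of a that have metadata
b_sub <- b2[match(keys, b2$match), ]    # metadata rows, in the same order as the layers

dim(a_sub)      # 2 4 2
b_sub$match     # "first" "third"
```

After this, layer i of a_sub and row i of b_sub describe the same sample, so the two objects can be used side by side.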
As far as combined output goes, with the data you've provided it can't happen: the array a has the constraint that all of its data must be the same class, yet your b is a data frame with columns of different classes. So if you need to keep numbers in a and strings or factors in b, then you cannot just merge one into the other.
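To see the single-class constraint in action, here is a small base-R sketch (the matrix m is only an illustration, not part of the question's data):

```r
m <- matrix(1:4, nrow = 2)
typeof(m)            # "integer"

# assigning a single string coerces every element to character
m[1, 1] <- "stuff"
typeof(m)            # "character"
m[2, 2]              # "4" (now a string, no longer a number)
```

The same coercion would happen to the counts in a if strings from b were written into it.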
You have some options:
If your second frame really can be a matrix, then we can do this.
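### a naive conversion, your case may vary with real data
### (strings become integer factor codes; plain as.integer() on character
### columns, the default in R >= 4.0, would give NA)
bnum <- sapply(b, function(x) if (is.numeric(x)) as.integer(x) else as.integer(factor(x)))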
dim(bnum) <- c(dim(bnum), 1)
dimnames(bnum) <- list(rownames(b), colnames(b), NULL)
bnum
# , , 1
#
#   thing data match
# 1     3   10     1
# 2     2   20     2
# 3     1   30     3
### the solution
abind::abind(
  apply(bnum[,-3,1], 2:1, rep, times = dim(a)[1]),
  a[,,bnum[,"match",1]],
  along = 2
)
# , , first
#
#   thing data one two three four
# A     3   10   1   3     5    7
# B     3   10   2   4     6    8
#
# , , second
#
#   thing data one two three four
# A     2   20   9   1     3    5
# B     2   20  10   2     4    6
#
# , , third
#
#   thing data one two three four
# A     1   30   7   9     1    3
# B     1   30   8  10     2    4
If you need to keep b as-is, then you cannot make a 3-d array. An option is to nest each of the layers of a in a list-column fashion.
out <- within(b, { mtx = lapply(match, function(m) a[,,m]) })
out
#   thing data  match                     mtx
# 1 stuff   10  first  1, 2, 3, 4, 5, 6, 7, 8
# 2   foo   20 second 9, 10, 1, 2, 3, 4, 5, 6
# 3 dodad   30  third 7, 8, 9, 10, 1, 2, 3, 4
While that looks like it lost the layout of the z-layer of a, that's just a poor representation on the console. It's still good:
out$mtx[[1]]
#   one two three four
# A   1   3     5    7
# B   2   4     6    8
This can also be done with dplyr and data.table, if you're interested.
library(dplyr)
out <- b %>%
  mutate(mtx = lapply(match, function(m) a[,,m]))
# option to use purrr::map instead of lapply
library(data.table)
out <- as.data.table(b)[, mtx := lapply(match, function(m) a[,,m]) ]
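Since the stated goal is to pull slices out by patient or dilution, it may be enough to leave a and b as separate objects and use b purely as a lookup table. Here is a minimal base-R sketch, assuming the a and b from the question; the helper name slices_where is hypothetical:

```r
a <- array(1:10, c(2, 4, 3))
dimnames(a) <- list(c("A", "B"),
                    c("one", "two", "three", "four"),
                    c("first", "second", "third"))
b <- data.frame(thing = c("stuff", "foo", "dodad"),
                data  = c(10, 20, 30),
                match = c("first", "second", "third"),
                stringsAsFactors = FALSE)

# hypothetical helper: return the layers of `arr` whose metadata rows
# satisfy the logical vector `keep`
slices_where <- function(arr, meta, keep) {
  arr[, , as.character(meta$match[keep]), drop = FALSE]
}

high <- slices_where(a, b, b$data >= 20)
dimnames(high)[[3]]     # "second" "third"

# a per-layer summary, e.g. the total count in each selected sample
apply(high, 3, sum)     # second = 40, third = 44
```

drop = FALSE keeps the result three-dimensional even when only one sample matches, so downstream code such as apply(high, 3, ...) does not need a special case.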