
Is there an inner_join equivalent for multidimensional arrays in R?


Is there some kind of inner_join equivalent for 3d arrays, and can it be accomplished by joining a 2d structure to a 3d structure?

Let's see if this makes sense:

I have a 3d array of integers (microbiome count data).

  • Dimension 1: replicates 1:100
  • Dimension 2: Samples
  • Dimension 3: Taxa

I have a 2d table of metadata.

  • Dimension 1: Samples
  • Dimension 2: metadata type (dilution, sample date, etc)

There is one column in the 2d metadata table (sample names) that matches the labels of the second dimension in the array.

Can I somehow join these two, such that I preserve the array structure and add the metadata for each sample?

Do I have to just melt/stack the array into a super long 2d table?

Thanks for any help!

-edit

Let's say I generate an array "a" and a table "b" with the following code:

a <- array(1:10,c(2,4,3))
b <- data.frame("thing" = c("stuff", "foo", "dodad"), "data" = c(10,20,30), "match" = c("first","second","third"))
dimnames(a) <- list(c("A", "B"), c("one", "two", "three", "four"), c("first", "second", "third"))

As you can see, I have a column "match" in table "b" that I would like to join/match to the names of the third dimension of "a", i.e. dimnames(a)[[3]].

So if we look at "a" and "b"

> a
, , first

  one two three four
A   1   3     5    7
B   2   4     6    8

, , second

  one two three four
A   9   1     3    5
B  10   2     4    6

, , third

  one two three four
A   7   9     1    3
B   8  10     2    4

> b
  thing data  match
1 stuff   10  first
2   foo   20 second
3 dodad   30  third

I would like, for example, the array

, , third

      one two three four
    A   7   9     1    3
    B   8  10     2    4

to have the elements "dodad" and "30" associated with it under the labels "thing" and "data".

For the real data set, I'll want to have "patient name" instead of "thing" and "dilution" instead of "data" and use these elements as a means to pull slices out of the array to run statistical analyses.


Solution

  • You don't show what output you intend, so I'll guess.

    If you start with a (with dims AxBxC) and b (dims DxE), then indexing the third dimension of a by b's match column should give you an array with dims AxBxD, one layer per row of b.

    a[,,b[,"match"]]
    # , , first
    #   one two three four
    # A   1   3     5    7
    # B   2   4     6    8
    # , , second
    #   one two three four
    # A   9   1     3    5
    # B  10   2     4    6
    # , , third
    #   one two three four
    # A   7   9     1    3
    # B   8  10     2    4
    

    As for combined output, with the data you've provided it can't happen: an array such as a requires all of its elements to be the same type, yet your b is a data frame whose columns have different classes. So if you need to keep numbers in a and strings or factors in b, you cannot simply merge one into the other.
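    To see that single-type constraint in action, here is a minimal base-R sketch. The objects m and mixed are made up for illustration and are not part of your data.

    ```r
    # An array (or matrix) stores one atomic type for all of its cells.
    m <- matrix(1:4, nrow = 2)
    typeof(m)
    # [1] "integer"

    # Binding a character column onto it coerces every cell to character.
    mixed <- cbind(m, c("stuff", "foo"))
    typeof(mixed)
    # [1] "character"

    # A data frame, by contrast, keeps one type per column.
    col_classes <- sapply(
      data.frame(n = 1:2, s = c("stuff", "foo"), stringsAsFactors = FALSE),
      class
    )
    col_classes
    ```

    This is why the numeric counts in a would silently become strings if the text columns of b were forced into the same array.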

    You have some options:

    1. If your second frame really can be a matrix, then we can do this.

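      ### a naive conversion, your case may vary with real data;
      ### strings go through factor() first, because as.integer() on plain
      ### character columns (the data.frame default since R 4.0) returns NA
      bnum <- sapply(b, function(x) as.integer(if (is.character(x)) factor(x) else x))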
      dim(bnum) <- c(dim(bnum), 1)
      dimnames(bnum) <- list(rownames(b), colnames(b), NULL)
      bnum
      # , , 1
      #   thing data match
      # 1     3   10     1
      # 2     2   20     2
      # 3     1   30     3
      
      ### the solution
      abind::abind(
        apply(bnum[,-3,1], 2:1, rep, times = dim(a)[1]),
        a[,,bnum[,"match",1]],
        along = 2
      )
      # , , first
      #   thing data one two three four
      # A     3   10   1   3     5    7
      # B     3   10   2   4     6    8
      # , , second
      #   thing data one two three four
      # A     2   20   9   1     3    5
      # B     2   20  10   2     4    6
      # , , third
      #   thing data one two three four
      # A     1   30   7   9     1    3
      # B     1   30   8  10     2    4
      
    2. If you need to keep b as-is, then you cannot make a 3-d array. An option is to nest each of the layers of a in a list-column fashion.

      out <- within(b, { mtx = lapply(match, function(m) a[,,m]) })
      out
      #   thing data  match                     mtx
      # 1 stuff   10  first  1, 2, 3, 4, 5, 6, 7, 8
      # 2   foo   20 second 9, 10, 1, 2, 3, 4, 5, 6
      # 3 dodad   30  third 7, 8, 9, 10, 1, 2, 3, 4
      

      Although the printed frame looks as if each layer of a has been flattened, that is only how the console displays a list-column. Each element is still a 2-d matrix:

      out$mtx[[1]]
      #   one two three four
      # A   1   3     5    7
      # B   2   4     6    8
      

      This can also be done with dplyr and data.table, if you're interested.

      library(dplyr)
      out <- b %>%
        mutate(mtx = lapply(match, function(m) a[,,m]))
      # option to use purrr::map instead of lapply
      
      library(data.table)
      out <- as.data.table(b)[, mtx := lapply(match, function(m) a[,,m]) ]
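
    Since the stated goal is to pull slices out of the array by their metadata, the name-based indexing from the top of this answer may already be enough, with no join at all. The following is only a sketch under that assumption: it rebuilds a and b from the question, and the names keep and sub as well as the condition data >= 20 are invented for illustration.

    ```r
    # Rebuild the example objects from the question.
    a <- array(1:10, c(2, 4, 3))
    dimnames(a) <- list(c("A", "B"),
                        c("one", "two", "three", "four"),
                        c("first", "second", "third"))
    b <- data.frame(thing = c("stuff", "foo", "dodad"),
                    data  = c(10, 20, 30),
                    match = c("first", "second", "third"),
                    stringsAsFactors = FALSE)

    # Look up the layer names whose metadata meet a condition ...
    keep <- b$match[b$data >= 20]

    # ... and index the third dimension by name. drop = FALSE keeps the
    # result 3-d even when only one layer is selected.
    sub <- a[, , keep, drop = FALSE]
    dim(sub)
    # [1] 2 4 2

    # Summarise each selected layer, here with its mean.
    layer_means <- apply(sub, 3, mean)
    layer_means
    # second  third
    #    5.0    5.5
    ```

    Keeping the match column as character (not factor) matters here, because indexing with a factor uses its integer codes instead of its labels.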