
Include list of IRanges as column in a data.frame


I have some data structured a bit like this:

x01 <- c("94633X94644Y95423X96130", "124240X124494Y124571X124714", "135654X135660Y136226X136786")

I later convert it into a list of IRanges objects through steps like these:

x02 <- sapply(x01,
              function(x) do.call(rbind,
                                  strsplit(strsplit(x,
                                                    split = "Y",
                                                    fixed = TRUE)[[1]],
                                           split = "X",
                                           fixed = TRUE)),
              simplify = FALSE,
              USE.NAMES = FALSE)

x03 <- sapply(x02,
              function(x) IRanges(start = as.integer(x[, 1L]),
                                  end = as.integer(x[, 2L])),
              simplify = FALSE,
              USE.NAMES = FALSE)

> x03
[[1]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]     94633     94644        12
  [2]     95423     96130       708

[[2]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]    124240    124494       255
  [2]    124571    124714       144

[[3]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]    135654    135660         7
  [2]    136226    136786       561

Now I would like to store x03 as a column in a data.frame alongside some associated information, with something as simple as:

> x04 <- data.frame("col1" = 1:3,
                    "col2" = x01,
                    "col3" = x03)

Unsurprisingly, this fails with an error about differing numbers of rows. However, I feel like I've seen JSON imports into R produce the kind of structure I want, where a ragged list occupies a column of a data.frame. Is this possible?


Solution

  • It's a very good question, and I have seen it before with other data.frame-like objects. I think the above does not work because, whenever an as.data.frame method exists for the element (here a matrix or an IRanges object), data.frame() uses it to expand the element into columns instead of embedding it, which breaks the dimensions (I may well be wrong).

    One option is to use a tibble:

    x04 <- tibble::tibble(x01 = x01, x02 = x02, x03 = x03)
    x04
    # A tibble: 3 x 3
      x01                         x02               x03      
      <chr>                       <list>            <list>   
    1 94633X94644Y95423X96130     <chr[,2] [2 x 2]> <IRanges>
    2 124240X124494Y124571X124714 <chr[,2] [2 x 2]> <IRanges>
    3 135654X135660Y136226X136786 <chr[,2] [2 x 2]> <IRanges>
    
    x04$x03
    [[1]]
    IRanges object with 2 ranges and 0 metadata columns:
              start       end     width
          <integer> <integer> <integer>
      [1]     94633     94644        12
      [2]     95423     96130       708
    
    [[2]]
    IRanges object with 2 ranges and 0 metadata columns:
              start       end     width
          <integer> <integer> <integer>
      [1]    124240    124494       255
      [2]    124571    124714       144
    
    [[3]]
    IRanges object with 2 ranges and 0 metadata columns:
              start       end     width
          <integer> <integer> <integer>
      [1]    135654    135660         7
      [2]    136226    136786       561
    

    Another option is a DataFrame from S4Vectors:

    library(S4Vectors)
    DataFrame(x01=x01,x02=List(x02),x03=IRangesList(x03))
                              x01                             x02
                      <character>                          <List>
    1     94633X94644Y95423X96130     94633:94644,95423:96130,...
    2 124240X124494Y124571X124714 124240:124494,124571:124714,...
    3 135654X135660Y136226X136786 135654:135660,136226:136786,...
                              x03
                    <IRangesList>
    1     94633-94644,95423-96130
    2 124240-124494,124571-124714
    3 135654-135660,136226-136786
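
For completeness, a base R data.frame can hold a ragged list column too, as long as the list is added after construction or wrapped in I(). The sketch below is a minimal illustration, not part of the original answer: parse_ranges is a hypothetical helper, and plain integer matrices stand in for the IRanges objects so that it runs without Bioconductor. It assumes a list of IRanges such as x03 can be dropped in the same way, since it is also an ordinary list of length 3.

```r
x01 <- c("94633X94644Y95423X96130",
         "124240X124494Y124571X124714",
         "135654X135660Y136226X136786")

# Hypothetical helper: parse one string into an integer matrix
# with one row per range and columns "start" and "end".
parse_ranges <- function(s) {
  pairs <- strsplit(strsplit(s, "Y", fixed = TRUE)[[1]], "X", fixed = TRUE)
  m <- do.call(rbind, lapply(pairs, as.integer))
  colnames(m) <- c("start", "end")
  m
}

# Stand-in for x03: a plain list with one element per row of the data.frame.
ranges <- lapply(x01, parse_ranges)

# Option 1: build the data.frame first, then assign the list as a column.
x04 <- data.frame(col1 = 1:3, col2 = x01)
x04$col3 <- ranges

# Option 2: wrap the list in I() so data.frame() stores it as-is
# instead of trying to expand each element into columns.
x05 <- data.frame(col1 = 1:3, col2 = x01, col3 = I(ranges))

# Each cell of the list column is still the full object for that row.
first <- x04$col3[[1]]
widths <- first[, "end"] - first[, "start"] + 1L
```

Individual elements are retrieved with `[[`, for example `x04$col3[[2]]`, which is the same access pattern as the tibble list column above.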