Search code examples
rdata.tablecopyrow

copy data from one row to a new row in R data.table


I have a table like follows:

library(data.table)
dt <- data.table(t0.id=1:2,t0.V1=c("a","e"),t0.V2=c("b","f"),t1.id=3:4,t1.V1=c("c","g"),t1.V2=c("d","h"))

dt
   t0.id t0.V1 t0.V2 t1.id t1.V1 t1.V2
1:     1     a     b     3     c     d
2:     2     e     f     4     g     h

and I want to copy data from row one to a new row, like this:

   t0.id t0.V1 t0.V2 t1.id t1.V1 t1.V2
1:     1     a     b                  
2:                       3     c     d
3:     2     e     f     4     g     h

I know how to duplicate the row (I've seen this post), but I don't know how to clear columns by condition (eg. t0.id==1) since both rows would be equal.

I guess it could be done by row index, but my real table has thousands of rows and I don't think that's the best way to do it.

Thank you

EDIT:

  • The final order of rows doesn't matter, that is, final rows no 1 and 2 don't need to be next to each other.
  • I 'manually' (by looking at some variables) identify which rows need to be split. So, the only condition to be applied is based on 't0.id'.

Solution

  • library(data.table)
    splitids <- 1L # t0.id
    out <- rbindlist(list(
        dt[t0.id %in% splitids, .SD, .SDcols = patterns("^t0")],
        dt[t0.id %in% splitids, .SD, .SDcols = patterns("^t1")],
        dt[!t0.id %in% splitids,]),
      use.names = TRUE, fill = TRUE)
    out
    #    t0.id  t0.V1  t0.V2 t1.id  t1.V1  t1.V2
    #    <int> <char> <char> <int> <char> <char>
    # 1:     1      a      b    NA   <NA>   <NA>
    # 2:    NA   <NA>   <NA>     3      c      d
    # 3:     2      e      f     4      g      h
    

    It may make more sense if you look at each of them:

    dt[t0.id %in% splitids, .SD, .SDcols = patterns("^t0")]
    #    t0.id  t0.V1  t0.V2
    #    <int> <char> <char>
    # 1:     1      a      b
    dt[t0.id %in% splitids, .SD, .SDcols = patterns("^t1")]
    #    t1.id  t1.V1  t1.V2
    #    <int> <char> <char>
    # 1:     3      c      d
    dt[!t0.id %in% splitids,]
    #    t0.id  t0.V1  t0.V2 t1.id  t1.V1  t1.V2
    #    <int> <char> <char> <int> <char> <char>
    # 1:     2      e      f     4      g      h
    

    If you need the blank "" instead of NA, then that can be done to your character columns, but not to the t*.id columns since that would convert them from integer to character.

    ischr <- which(sapply(dt, inherits, "character"))
    ischr
    # t0.V1 t0.V2 t1.V1 t1.V2 
    #     2     3     5     6 
    out[, (ischr) := lapply(.SD, fcoalesce, ""), .SDcols = ischr][]
    #    t0.id  t0.V1  t0.V2 t1.id  t1.V1  t1.V2
    #    <int> <char> <char> <int> <char> <char>
    # 1:     1      a      b    NA              
    # 2:    NA                   3      c      d
    # 3:     2      e      f     4      g      h