Search code examples
rdataframeif-statementdata-manipulationzero

R: adding in rows of zero based on the values in multiple columns


I am trying to append rows to an R data.frame. Here is an example of a data.frame "foo":

A   B   C   D
1   1   1   200
1   1   2   50
1   1   3   15
1   2   1   150
1   2   4   50
1   3   1   300
2   1   2   40
2   1   4   90
2   3   2   80

For every A, there are 3 possible values of B, and for every B, there are 4 possible values of C. However, the initial df only contains non-zero values of D. I'd like to manipulate the df so that zeros are included for both B and C. Thus, the df would show 0's in D for any value of B/C that was 0. I have seen questions that address this with one column, but couldn't find a question addressing it with multiple columns. The final df would look like this:

A   B   C   D
1   1   1   200
1   1   2   50
1   1   3   15
1   1   4   0
1   2   1   150
1   2   2   0
1   2   3   0
1   2   4   50
1   3   1   300
1   3   2   0
1   3   3   0
1   3   4   0
2   1   1   0
2   1   2   40
2   1   3   0
2   1   4   90
2   2   1   0
2   2   2   0
2   2   3   0
2   2   4   0
2   3   1   0
2   3   2   80
2   3   3   0
2   3   4   0

I first tried creating a dummy data frame that then merged with the initial df, but something isn't working right. Here's the current code, which I know is wrong because this code only generates rows based on A. I think I want to make the dummy frame based on A and B but I don't know how - could an if/else function work here?:

# create dummy df
dummy <- as.data.frame(
  cbind(
    sort(rep(unique(foo$A), 12)),
    rep(1:3,length(unique(foo$A)))))
colnames(dummy) <- c("A","B")
foo$A <- as.numeric(foo$A)
foo$B <- as.numeric(foo$C)

# merge with foo
mergedummy <- merge(dummy,foo,all.x=T)

Any insight is greatly appreciated - thanks!


Solution

  • A one liner:

    merge(dat, data.frame(table(dat[1:3]))[-4],all.y=TRUE)
    
    #   A B C   D
    #1  1 1 1 200
    #2  1 1 2  50
    #3  1 1 3  15
    #4  1 1 4  NA
    #...
    

    Or maybe less complicated:

    out <- data.frame(xtabs(D ~ ., data=dat))
    out[do.call(order,out[1:3]),]
    
    #   A B C Freq
    #1  1 1 1  200
    #7  1 1 2   50
    #13 1 1 3   15
    #19 1 1 4    0
    #...
    

    Where dat is:

    dat <- structure(list(A = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(1L, 
    1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L), C = c(1L, 2L, 3L, 1L, 4L, 1L, 
    2L, 4L, 2L), D = c(200L, 50L, 15L, 150L, 50L, 300L, 40L, 90L, 
    80L)), .Names = c("A", "B", "C", "D"), class = "data.frame", row.names = c(NA, 
    -9L))