I am trying to append rows to an R data.frame. Here is an example of a data.frame "foo":
A B C D
1 1 1 200
1 1 2 50
1 1 3 15
1 2 1 150
1 2 4 50
1 3 1 300
2 1 2 40
2 1 4 90
2 3 2 80
For every A, there are 3 possible values of B, and for every B, there are 4 possible values of C. However, the initial df only contains non-zero values of D. I'd like to manipulate the df so that zeros are included for both B and C. Thus, the df would show 0's in D for any value of B/C that was 0. I have seen questions that address this with one column, but couldn't find a question addressing it with multiple columns. The final df would look like this:
A B C D
1 1 1 200
1 1 2 50
1 1 3 15
1 1 4 0
1 2 1 150
1 2 2 0
1 2 3 0
1 2 4 50
1 3 1 300
1 3 2 0
1 3 3 0
1 3 4 0
2 1 1 0
2 1 2 40
2 1 3 0
2 1 4 90
2 2 1 0
2 2 2 0
2 2 3 0
2 2 4 0
2 3 1 0
2 3 2 80
2 3 3 0
2 3 4 0
I first tried creating a dummy data frame that then merged with the initial df, but something isn't working right. Here's the current code, which I know is wrong because this code only generates rows based on A. I think I want to make the dummy frame based on A and B but I don't know how - could an if/else function work here?:
# create dummy df
dummy <- as.data.frame(
cbind(
sort(rep(unique(foo$A), 12)),
rep(1:3,length(unique(foo$A)))))
colnames(dummy) <- c("A","B")
foo$A <- as.numeric(foo$A)
foo$B <- as.numeric(foo$C)
# merge with foo
mergedummy <- merge(dummy,foo,all.x=T)
Any insight is greatly appreciated - thanks!
A one liner:
merge(dat, data.frame(table(dat[1:3]))[-4],all.y=TRUE)
# A B C D
#1 1 1 1 200
#2 1 1 2 50
#3 1 1 3 15
#4 1 1 4 NA
#...
Or maybe less complicated:
out <- data.frame(xtabs(D ~ ., data=dat))
out[do.call(order,out[1:3]),]
# A B C Freq
#1 1 1 1 200
#7 1 1 2 50
#13 1 1 3 15
#19 1 1 4 0
#...
Where dat
is:
dat <- structure(list(A = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(1L,
1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L), C = c(1L, 2L, 3L, 1L, 4L, 1L,
2L, 4L, 2L), D = c(200L, 50L, 15L, 150L, 50L, 300L, 40L, 90L,
80L)), .Names = c("A", "B", "C", "D"), class = "data.frame", row.names = c(NA,
-9L))