I have two data sets (data2012 and data2013) that share some identically named rows (ID) and columns (individual characteristics, numerical and metric). Data2012 has values in rows where data2013 has NAs and vice versa.
I want to replace the NAs in data2012 with the respecitve values from data2013 if the row name (ID) and the column name (individual characteristics) are the same. I only want to replace data2012 with data2013 if data2012 has NAs and data2013 has a value.
(Since the two data sets have over 200 identically named columns I cannot list every column seperately in the code. Also, the columns do not come in the same order. I need a code that takes advantage of the fact that the columns and rows - where values need to be replaced - have the exact same name.)
My ideal end-product: A data set that for every ID (row) has some characteristics (column) from data2012, some from data2013 and some NAs remaining (if both datasets did not have a respective value).
data2012
x2 x4 x5 x6 x7 x9
id1 NA NA 2 NA 4 NA
id2 1 NA NA NA NA 1
id5 NA 3 5 NA NA NA
id6 NA NA NA NA NA NA
data2013 (has extra information in id2 and id6)
x2 x4 x5 x7
id2 NA NA 4 NA
id3 1 NA 2 NA
id4 NA 9 NA 5
id6 1 NA 2 8
ideal end-product
x2 x4 x5 x6 x7 x9
id1 NA NA 2 NA 4 NA
id2 1 NA 4 NA NA 1
id5 NA 3 5 NA NA NA
id6 1 NA 2 NA 8 NA
I hope you guys can help me. Thanks!
We can use match
to get the column/row indexes of the corresponding datasets, subset the datasets and then assign the values of one to another
j1 <- match(colnames(data2012), colnames(data2013), nomatch=0)
j2 <- match(colnames(data2013), colnames(data2012), nomatch=0)
i1 <- match(rownames(data2012), rownames(data2013), nomatch=0)
i2 <- match(rownames(data2013), rownames(data2012), nomatch=0)
m1 <- do.call(pmax, c(list(as.matrix(data2012[i2,j2]),
as.matrix(data2013[i1,j1])), list(na.rm=TRUE)))
data2012[i2,j2] <- m1
data2012
# x2 x4 x5 x6 x7 x9
#id1 NA NA 2 NA 4 NA
#id2 1 NA 4 NA NA 1
#id5 NA 3 5 NA NA NA
#id6 1 NA 2 NA 8 NA
data2012 <- structure(list(x2 = c(NA, 1L, NA, NA),
x4 = c(NA, NA, 3L, NA),
x5 = c(2L, NA, 5L, NA), x6 = c(NA, NA, NA, NA), x7 = c(4L,
NA, NA, NA), x9 = c(NA, 1L, NA, NA)), .Names = c("x2", "x4",
"x5", "x6", "x7", "x9"), class = "data.frame",
row.names = c("id1", "id2", "id5", "id6"))
data2013 <- structure(list(x2 = c(NA, 1L, NA, 1L),
x4 = c(NA, NA, 9L, NA),
x5 = c(4L, 2L, NA, 2L), x7 = c(NA, NA, 5L, 8L)), .Names = c("x2",
"x4", "x5", "x7"), class = "data.frame",
row.names = c("id2", "id3", "id4", "id6"))