Search code examples
rnaeconomics

replace NAs in columns with values of new data set (that has identically named columns) in R


I have two data sets (data2012 and data2013) that share some identically named rows (ID) and columns (individual characteristics, numerical and metric). Data2012 has values in rows where data2013 has NAs and vice versa.

I want to replace the NAs in data2012 with the respecitve values from data2013 if the row name (ID) and the column name (individual characteristics) are the same. I only want to replace data2012 with data2013 if data2012 has NAs and data2013 has a value.

(Since the two data sets have over 200 identically named columns I cannot list every column seperately in the code. Also, the columns do not come in the same order. I need a code that takes advantage of the fact that the columns and rows - where values need to be replaced - have the exact same name.)

My ideal end-product: A data set that for every ID (row) has some characteristics (column) from data2012, some from data2013 and some NAs remaining (if both datasets did not have a respective value).

data2012

     x2  x4  x5  x6  x7  x9
id1  NA  NA  2   NA  4   NA
id2  1   NA  NA  NA  NA  1
id5  NA  3   5   NA  NA  NA
id6  NA  NA  NA  NA  NA  NA

data2013 (has extra information in id2 and id6)

     x2  x4  x5  x7 
id2  NA  NA  4  NA  
id3  1   NA  2   NA
id4  NA  9   NA  5
id6  1   NA  2   8

ideal end-product

     x2  x4  x5  x6  x7  x9
id1  NA  NA  2   NA  4   NA
id2  1   NA  4   NA  NA  1
id5  NA  3   5   NA  NA  NA
id6  1   NA  2   NA  8   NA

I hope you guys can help me. Thanks!


Solution

  • We can use match to get the column/row indexes of the corresponding datasets, subset the datasets and then assign the values of one to another

    j1 <- match(colnames(data2012), colnames(data2013), nomatch=0)
    j2 <- match(colnames(data2013), colnames(data2012), nomatch=0)
    i1 <- match(rownames(data2012), rownames(data2013), nomatch=0)
    i2 <- match(rownames(data2013), rownames(data2012), nomatch=0)
    
    m1 <- do.call(pmax, c(list(as.matrix(data2012[i2,j2]), 
            as.matrix(data2013[i1,j1])), list(na.rm=TRUE)))
    data2012[i2,j2] <- m1
    data2012
    #    x2 x4 x5 x6 x7 x9
    #id1 NA NA  2 NA  4 NA
    #id2  1 NA  4 NA NA  1
    #id5 NA  3  5 NA NA NA
    #id6  1 NA  2 NA  8 NA
    

    data

    data2012 <- structure(list(x2 = c(NA, 1L, NA, NA), 
    x4 = c(NA, NA, 3L, NA), 
    x5 = c(2L, NA, 5L, NA), x6 = c(NA, NA, NA, NA), x7 = c(4L, 
     NA, NA, NA), x9 = c(NA, 1L, NA, NA)), .Names = c("x2", "x4", 
    "x5", "x6", "x7", "x9"), class = "data.frame",
     row.names = c("id1", "id2", "id5", "id6"))
    
    data2013 <- structure(list(x2 = c(NA, 1L, NA, 1L),
     x4 = c(NA, NA, 9L, NA), 
    x5 = c(4L, 2L, NA, 2L), x7 = c(NA, NA, 5L, 8L)), .Names = c("x2", 
    "x4", "x5", "x7"), class = "data.frame", 
    row.names = c("id2", "id3", "id4", "id6"))