Search code examples
rreplacenaprefix

How to replace blanks for groups of columns based on column group prefix?


How do I replace NA's in a column with all 0's based on if a value exists in other columns that begin with the same prefix? For example, for column A1, I only want to replace the NA's with a 0 where columns A2 or A3 are NONBLANK. My real data has hundreds of groups of columns.

MY DATA:

ID<-c(1,2,3,4,5,6,7,8)
A1<-c(1,NA,1,NA,1,1,1,NA)
A2<-c(1,NA,NA,1,NA,1,NA,NA)
A3<-c(1,NA,NA,NA,1,NA,NA,NA)
B1<-c(1,1,1,1,1,1,NA,1)
B2<-c(1,1,1,1,NA,1,NA,NA)
B3<-c(1,1,NA,NA,1,NA,NA,NA)

mydata<-cbind.data.frame(ID,A1,A2,A3,B1,B2,B3)

HAVE:

enter image description here

WANTED:

A 0 should replace NA in column A1 if column A2 or A3 have a 1. A 0 should replace NA in column A2 if columns A1 or A3 have a 1, and so on, as below: enter image description here


Solution

  • A non-refined answer in base R, but seems to work:

    for(i in unique(gsub("\\d","",colnames(mydata)[-1]))){
      mydata[apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))),grepl(i,colnames(mydata))][is.na(mydata[apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))),grepl(i,colnames(mydata))])]<-0
    }
    

    Which gives:

      ID A1 A2 A3 B1 B2 B3
    1  1  1  1  1  1  1  1
    2  2 NA NA NA  1  1  1
    3  3  1  0  0  1  1  0
    4  4  0  1  0  1  1  0
    5  5  1  0  1  1  0  1
    6  6  1  1  0  1  1  0
    7  7  1  0  0 NA NA NA
    8  8 NA NA NA  1  0  0
    

    Edit:
    The idea is to extract the unique letters from colnames(mydata), A and B in the example, by replacing the digits (\\d) with a blank "".
    It then loop over these letters to select the columns that start with it. That's what grepl(i,colnames(mydata)) does.
    apply is used to get a vector of the rows in which there is at least (any()) one non-NA value (!is.na()) : apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))).
    Everything is then combined in what it essentially df[is.na(df))]<-0 but with df corresponding to the columns strating with the given letter, and the rows that should have NA's replaced.
    df would be : mydata[apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))),grepl(i,colnames(mydata))]