How do I replace NA's in a column with all 0's based on if a value exists in other columns that begin with the same prefix? For example, for column A1, I only want to replace the NA's with a 0 where columns A2 or A3 are NONBLANK. My real data has hundreds of groups of columns.
MY DATA:
ID<-c(1,2,3,4,5,6,7,8)
A1<-c(1,NA,1,NA,1,1,1,NA)
A2<-c(1,NA,NA,1,NA,1,NA,NA)
A3<-c(1,NA,NA,NA,1,NA,NA,NA)
B1<-c(1,1,1,1,1,1,NA,1)
B2<-c(1,1,1,1,NA,1,NA,NA)
B3<-c(1,1,NA,NA,1,NA,NA,NA)
mydata<-cbind.data.frame(ID,A1,A2,A3,B1,B2,B3)
HAVE:
WANTED:
A 0 should replace NA in column A1 if column A2 or A3 have a 1. A 0 should replace NA in column A2 if columns A1 or A3 have a 1, and so on, as below:
A non-refined answer in base R
, but seems to work:
for(i in unique(gsub("\\d","",colnames(mydata)[-1]))){
mydata[apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))),grepl(i,colnames(mydata))][is.na(mydata[apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))),grepl(i,colnames(mydata))])]<-0
}
Which gives:
ID A1 A2 A3 B1 B2 B3
1 1 1 1 1 1 1 1
2 2 NA NA NA 1 1 1
3 3 1 0 0 1 1 0
4 4 0 1 0 1 1 0
5 5 1 0 1 1 0 1
6 6 1 1 0 1 1 0
7 7 1 0 0 NA NA NA
8 8 NA NA NA 1 0 0
Edit:
The idea is to extract the unique
letters from colnames(mydata)
, A
and B
in the example, by replacing the digits (\\d
) with a blank ""
.
It then loop over these letters to select the columns that start with it. That's what grepl(i,colnames(mydata))
does.
apply
is used to get a vector of the rows in which there is at least (any()
) one non-NA value (!is.na()
) : apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x)))
.
Everything is then combined in what it essentially df[is.na(df))]<-0
but with df
corresponding to the columns strating with the given letter, and the rows that should have NA
's replaced.
df
would be : mydata[apply(mydata[,grepl(i,colnames(mydata))],1,function(x) any(!is.na(x))),grepl(i,colnames(mydata))]