if() statement with paste0() or grep() in R

I made a minimal reproducible example, but my real data is much larger.

id <- 1:4
ac_1 <- c(0.1, 0.3, 0.03, 0.03)
ac_2 <- c(0.2, 0.4, 0.1, 0.008)
ac_3 <- c(0.8, 0.043, 0.7, 0.01)
ac_4 <- c(0.2, 0.73, 0.1, 0.1)
c_2 <- c(1, 2, 5, 23)
check_1 <- c(0.01, 0.902, 0.02, 0.07)
check_2 <- c(0.03, 0.042, 0.002, 0.00001)
check_3 <- c(0.01, 0.02, 0.5, 0.001)
check_4 <- c(0.001, 0.042, 0.02, 0.2)

df <- data.frame(id, ac_1, ac_2, ac_3, ac_4, c_2, check_1, check_2, check_3, check_4)

So the data frame looks like this:

> df
  id ac_1  ac_2  ac_3 ac_4 c_2 check_1 check_2 check_3 check_4
1  1 0.10 0.200 0.800 0.20   1   0.010 0.03000   0.010   0.001
2  2 0.30 0.400 0.043 0.73   2   0.902 0.04200   0.020   0.042
3  3 0.03 0.100 0.700 0.10   5   0.020 0.00200   0.500   0.020
4  4 0.03 0.008 0.010 0.10  23   0.070 0.00001   0.001   0.200

What I want to do is this:

If check_1 is 0.02, the corresponding ac_1 should become missing data. If check_2 is 0.02, the corresponding ac_2 should become missing data. I want to do this for every pair of "check" and "ac" columns.

For example, in the check_1 column, the person with id 3 has 0.02, so this person's ac_1 score (0.03) should become missing data (NA).

In the check_3 column, the person with id 2 has 0.02, so this person's ac_3 score should become missing data.

In the check_4 column, the person with id 3 has 0.02, so this person's ac_4 score should become missing data.

So what I did is as follows:

for(i in 1:4){

But it did not work.


  • You're really close, but you're off on a few fundamentals.

    1. You can't (easily) use strings to refer to objects, so "df$check_1" won't work. You can use strings to refer to column names, but not with $, you need to use [ or [[, so df[["check_1"]] will work.

    2. if isn't vectorized, so it won't work on each value in a column. Use ifelse instead, or even better in this case we can skip the if entirely.

    3. Using == on non-integer numbers is risky because of floating-point precision: two values that print the same may differ in their last bits. We'll compare within a small tolerance instead.

    4. A minor issue: paste0("df$ac_",i)==NA doesn't assign anything, because == tests for equality. You need = or <- for assignment on that line.
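
    The first three points can be checked in isolation with a small sketch. The toy data frame and the variable names below are made up for illustration and are not part of the question's data:

    ```r
    # Hypothetical two-row data frame, only for this illustration
    toy <- data.frame(check_1 = c(0.01, 0.02), ac_1 = c(0.1, 0.3))

    # 1. A pasted string is just text; R does not look it up as a column
    col_as_string <- paste0("toy$check_", 1)     # the text "toy$check_1"
    col_values    <- toy[[paste0("check_", 1)]]  # the numeric column itself

    # 2. A comparison on a column gives one TRUE/FALSE per row,
    #    which is what row indexing needs (a bare if() expects a single value)
    hits <- abs(col_values - 0.02) < 1e-10       # FALSE TRUE

    # 3. Exact equality can fail for values that look identical when printed
    exact_equal <- (0.1 + 0.2) == 0.3                  # FALSE
    near_equal  <- abs((0.1 + 0.2) - 0.3) < 1e-10      # TRUE
    ```

    The tolerance 1e-10 is an arbitrary small number; pick one that is well below the smallest difference that matters in your data.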

    Addressing all of these issues:

    for(i in 1:4){
      df[
        ## rows to replace
        abs(df[[paste0("check_", i)]] - 0.02) < 1e-10,
        ## column to replace
        paste0("ac_", i)
      ] <- NA
    }
    df
    #   id ac_1  ac_2 ac_3 ac_4 c_2 check_1 check_2 check_3 check_4
    # 1  1 0.10 0.200 0.80 0.20   1   0.010 0.03000   0.010   0.001
    # 2  2 0.30 0.400   NA 0.73   2   0.902 0.04200   0.020   0.042
    # 3  3   NA 0.100 0.70   NA   5   0.020 0.00200   0.500   0.020
    # 4  4 0.03 0.008 0.01 0.10  23   0.070 0.00001   0.001   0.200
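
    Since the real data is much larger, a loop-free variant may also be worth trying. This is only a sketch: it assumes every check_N column has a matching ac_N column, and the helper names check_cols, ac_cols and hit are made up here. It uses grep() to find the check columns by name and a logical matrix to blank out all matching cells at once:

    ```r
    # Same data as in the question
    df <- data.frame(
      id      = 1:4,
      ac_1    = c(0.1, 0.3, 0.03, 0.03),
      ac_2    = c(0.2, 0.4, 0.1, 0.008),
      ac_3    = c(0.8, 0.043, 0.7, 0.01),
      ac_4    = c(0.2, 0.73, 0.1, 0.1),
      c_2     = c(1, 2, 5, 23),
      check_1 = c(0.01, 0.902, 0.02, 0.07),
      check_2 = c(0.03, 0.042, 0.002, 0.00001),
      check_3 = c(0.01, 0.02, 0.5, 0.001),
      check_4 = c(0.001, 0.042, 0.02, 0.2)
    )

    # Find the check columns by name, then derive the matching ac columns
    check_cols <- grep("^check_", names(df), value = TRUE)
    ac_cols    <- sub("^check_", "ac_", check_cols)

    # Assumption: each check column has an ac partner; stop early if not
    stopifnot(all(ac_cols %in% names(df)))

    # Logical matrix: TRUE where a check value is within tolerance of 0.02
    hit <- abs(as.matrix(df[check_cols]) - 0.02) < 1e-10

    # Set the ac values at the same row/column positions to NA
    df[ac_cols][hit] <- NA
    ```

    On the example data this should blank the same three cells as the loop: ac_1 and ac_4 for id 3, and ac_3 for id 2.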