
Using grepl to filter column names in a specific range of columns


I have a data frame (df) whose column names look like this:

[1] "lab_id"               "weeks"                "group"      
[4] "level"                "id_row"               "id"                  
[7] "number"               "tube"                 "dp"              
[10] "time"                "label"                "age"                 
[13] "gender"              "wtspike_ab1"         "wtrbd_ab1"          
[16] "wts1_ab1"            "wts2_ab1"             "wtntd_ab2"          
[19] "wtn_ab2"             "alphaspike_ab2"       "alpharbd_ab2"       
[22] "betaspike_ab2"       "betarbd_ab2"          "gammaspike_ab2"     

My goal is to filter this df by column name, using the following approach:

test1 <- test[, test <- grepl('wtspike|wtrbd|wts1|wts2|wtntd|wtn', colnames(test))]

This works: it drops the columns whose names do not match the pattern, which is fine. However, it also removes the metadata columns 1:14, so I tried something like this to exclude columns 1:14 from the operation:

filtered_test <- test[, 1:14(test) & grepl('wtspike|wtrbd|wts1|wts2|wtntd|wtn', colnames(test))]

Which gives me the following error:

Error: attempt to apply non-function

My question is what alternative I can use to perform the same operation without dropping columns 1:14.

  • Is there a better way to perform this operation?
  • What is wrong with my line that causes this error?

Solution

  • I think you're trying to keep all columns matching your grepl as well as columns 1-14 (whether or not they match). To illustrate on a small example, plain grepl keeps only the matching columns:

    mt <- mtcars[1:3,]
    mt[, grepl("d|w", colnames(mt))]
    #               disp drat    wt
    # Mazda RX4      160 3.90 2.620
    # Mazda RX4 Wag  160 3.90 2.875
    # Datsun 710     108 3.85 2.320
    

    If we also want to keep the first two columns (akin to your 1:14), then

    mt[, replace(grepl("d|w", colnames(mt)), 1:2, TRUE)]
    #                mpg cyl disp drat    wt
    # Mazda RX4     21.0   6  160 3.90 2.620
    # Mazda RX4 Wag 21.0   6  160 3.90 2.875
    # Datsun 710    22.8   4  108 3.85 2.320
    

    Or another approach (same results):

    mt[, seq_along(mt) %in% 1:2 | grepl("d|w", colnames(mt))]
    #                mpg cyl disp drat    wt
    # Mazda RX4     21.0   6  160 3.90 2.620
    # Mazda RX4 Wag 21.0   6  160 3.90 2.875
    # Datsun 710    22.8   4  108 3.85 2.320
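
As for the error itself: in `1:14(test)`, R reads `14(test)` as an attempt to call the number 14 as a function, which is why it reports "attempt to apply non-function". Note also that `&` requires both conditions to hold, while here either one should be enough to keep a column, so `|` is the operator to use. Below is a minimal sketch of both points on a made-up data frame. The `test` here is only a hypothetical, shortened stand-in for your data (3 metadata columns instead of 14), and `n_meta`, `pattern`, and `keep` are names chosen for illustration.

```r
# Hypothetical stand-in for the asker's data: three metadata columns
# followed by a few measurement columns (names borrowed from the question).
test <- data.frame(
  lab_id = 1, weeks = 2, group = "a",
  wtspike_ab1 = 0.1, wtrbd_ab1 = 0.2,
  alphaspike_ab2 = 0.3, betarbd_ab2 = 0.4
)

# `1:14(test)` parses as 1:(14(test)); calling the constant 14 fails.
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(1:14(test), error = function(e) conditionMessage(e))
msg

# Keep a column if it is one of the first n_meta columns OR its name matches.
n_meta  <- 3  # assumption for this toy data; this would be 14 in the question
pattern <- "wtspike|wtrbd|wts1|wts2|wtntd|wtn"
keep    <- seq_along(test) <= n_meta | grepl(pattern, colnames(test))

# drop = FALSE keeps the result a data frame even if only one column survives.
filtered_test <- test[, keep, drop = FALSE]
colnames(filtered_test)
```

With your real data, setting `n_meta <- 14` should keep the first 14 columns regardless of their names, plus any later column matching the pattern.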