
R dplyr ends_with multiple string matches


Can I use dplyr::select(ends_with) to select column names that fit any of multiple conditions? Given my column names, I want to use ends_with instead of contains or matches, because the strings I want to select on matter at the end of the column name but may also appear in the middle of other names. For instance,

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

I want to select columns that end with 1 or 2, to have only columns a11 and a12. Is select(ends_with) the best way to do this?

Thanks!


Solution

  • You can also do this using regular expressions. I know you did not want to use matches initially, but it works quite well if you anchor each pattern with the "end of string" symbol $. Separate the alternative endings with |, giving each one its own $.

    library(dplyr)

    df <- data.frame(a10 = 1:4,
                     a11 = 5:8,
                     a20 = 1:4,
                     a12 = 5:8)
    
    df %>% select(matches('1$|2$'))
      a11 a12
    1   5   5
    2   6   6
    3   7   7
    4   8   8
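
    To see why the pattern picks only a11 and a12, it can help to test it against the column names directly. The following is a minimal base R sketch, not part of the dplyr approach itself; the variable names keep and selected are made up for illustration. Note that matches() ignores case by default while grepl() does not, which makes no difference for digits.

    ```r
    # Same toy data frame as in the question.
    df <- data.frame(a10 = 1:4,
                     a11 = 5:8,
                     a20 = 1:4,
                     a12 = 5:8)

    # TRUE where a column name ends in 1 or 2. The `$` anchors each
    # alternative to the end of the name, so the 1 in "a10" and the
    # 2 in "a20" do not count.
    keep <- grepl("1$|2$", names(df))

    # Base R counterpart of select(matches('1$|2$')) for this example.
    selected <- df[, keep, drop = FALSE]
    names(selected)
    ```

    Printing keep shows one logical value per column, which is a quick way to check a pattern before passing it to matches().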
    

    If you have a more complex example with a long list, use paste0 with collapse = '|'.

    dff <- data.frame(a11 = 1:3,
                      a12 = 2:4,
                      a13 = 3:5,
                      a16 = 5:7,
                      my_cat = LETTERS[1:3],
                      my_dog = LETTERS[5:7],
                      my_snake = LETTERS[9:11])
    
    my_cols <- paste0(c(1,2,6,'dog','cat'), 
                      '$', 
                      collapse = '|')
    
    dff %>% select(matches(my_cols))
    
      a11 a12 a16 my_cat my_dog
    1   1   2   5      A      E
    2   2   3   6      B      F
    3   3   4   7      C      G
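
    As for the original question: in the dplyr/tidyselect versions I am aware of, ends_with() accepts a character vector and selects the union of the matches, so the regular expression is optional. A minimal sketch, assuming such a version is installed; the names suffixes and res are made up for illustration.

    ```r
    library(dplyr)

    dff <- data.frame(a11 = 1:3,
                      a12 = 2:4,
                      a13 = 3:5,
                      a16 = 5:7,
                      my_cat = LETTERS[1:3],
                      my_dog = LETTERS[5:7],
                      my_snake = LETTERS[9:11])

    # Plain suffixes: ends_with() matches them literally, so no `$`
    # and no `|` are needed.
    suffixes <- c("1", "2", "6", "dog", "cat")

    # Assumption: ends_with() takes the union when given several suffixes.
    res <- dff %>% select(ends_with(suffixes))
    names(res)
    ```

    This keeps the same five columns as the matches() version, although depending on the tidyselect version the columns may come back in the order of the suffixes rather than in their order in the data frame. Because the suffixes are treated literally, this form is also safer when an ending contains characters that are special in regular expressions, such as a dot.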