r, regex, dplyr

Select columns based on string match - dplyr::select


I have a data frame ("data") with lots and lots of columns. Some of the column names contain a certain string ("search_string").

How can I use dplyr::select() to give me a subset including only the columns whose names contain the string?

I tried:

# columns as boolean vector
select(data, grepl("search_string",colnames(data)))

# columns as vector of column names
select(data, colnames(data)[grepl("search_string",colnames(data))]) 

Neither of them works.

I know that select() accepts numeric vectors of column positions in place of column names, e.g.:

select(data,5,7,9:20)

But I don't know how to get a numeric vector of column indices from my grepl() expression.


Solution

  • Within the dplyr world, try:

    select(iris,contains("Sepal"))
    

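    See the Selection section in ?select for numerous other helpers such as starts_with(), ends_with(), and matches(). Note that contains() matches a literal substring, while matches() takes a regular expression.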
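
To tie this back to the question's own variables, here is a minimal sketch. It assumes dplyr 1.0 or later (for all_of()), and it uses iris and "Sepal" as hypothetical stand-ins for the asker's data and search_string. It also shows one way to get the numeric vector the question asks about: grep() returns the positions that grepl() only flags as TRUE or FALSE.

```r
library(dplyr)

# Hypothetical stand-ins for the asker's objects (not from the original post).
data <- iris
search_string <- "Sepal"

# Option 1: a selection helper. matches() treats its argument as a regular
# expression; contains() would treat it as a literal substring.
by_helper <- select(data, matches(search_string))

# Option 2: numeric column positions. grep() returns integer indices,
# equivalent to which(grepl(search_string, colnames(data))).
idx <- grep(search_string, colnames(data))

# all_of() marks idx as an external vector of column locations.
by_index <- select(data, all_of(idx))

names(by_helper)  # "Sepal.Length" "Sepal.Width"
idx               # 1 2
```

Both options select the same two columns here. The helper version is usually easier to read, while the grep() version is handy when the indices are needed elsewhere.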