I have a SparkDataFrame
with hundreds of columns and I would like to select only those with the word "time" in their names.
Instead of having to write all the column names manually, like this:
df_selection <- SparkR::select(df, "TRT_time", "Flight_time", "SS_time", ...)
I would like to do something like:
df_selection <- SparkR::select(df, like('^time.*?'))
Is there a nice way of doing this in SparkR? An equivalent of the SQL like operator? I'm a total beginner in R, sorry if this is quite a simple question, but I couldn't get my brain around it.
Thank you!
You can access the column names with names(), even for a SparkDataFrame. You can then use grepl() to find matches to a pattern.
cols <- names(df)
head(select(df, cols[grepl('^time.*?', names(df))]))
Which returns:
time_hour
1 2013-01-01 06:00:00
2 2013-01-01 06:00:00
3 2013-01-01 06:00:00
4 2013-01-01 06:00:00
5 2013-01-01 07:00:00
6 2013-01-01 06:00:00
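Note that the pattern '^time.*?' only matches names that start with "time", so columns such as "TRT_time" from the question would be missed. Dropping the ^ anchor matches "time" anywhere in the name; a minimal sketch, assuming the same df and cols as above:

# match "time" anywhere in the column name, not only at the start
time_cols <- cols[grepl("time", cols, ignore.case = TRUE)]
head(SparkR::select(df, time_cols))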
Update after comment:
The best thing would be to include it with an OR, i.e. |, where you subset cols.
head(SparkR::select(df, cols[grepl('^time.*?', names(df)) | cols == "airline_iata"]))
Note: You could also create a character vector like c(cols[grepl('^time.*?', names(df))], 'airline_iata'), but Spark throws an error if a column name is not found, i.e. if 'airline_iata' is not in names(df).
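If you want to combine a pattern with extra column names without risking that error, one option (a sketch, under the assumption that silently dropping missing names is acceptable) is to keep only the names that actually exist in df using intersect():

wanted <- c(cols[grepl('^time.*?', cols)], 'airline_iata')
existing <- intersect(wanted, names(df))  # drops any name not present in df
head(SparkR::select(df, existing))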