This is based upon this question. The question said:
Is there a way for me to subset data based on column names starting with a particular string? I have some columns which are like ABC_1 ABC_2 ABC_3 and some like XYZ_1, XYZ_2,XYZ_3 let's say.
How can I subset my df based only on columns containing the above portions of text (lets say, ABC or XYZ)? I can use indices, but the columns are too scattered in data and it becomes too much of hard coding.
Also, I want to only include rows from each of these columns where any of their value is >0 so if either of the 6 columns above has a 1 in the row, it makes a cut into my final data frame.
One of the answers used dplyr to select multiple columns for the new data:
df <- df %>% dplyr:: select(grep("ABC", names(df)), grep("XYZ", names(df)))
However, I'd like a data.table solution similar to the above. I know that (as per one of the answers) that data.table can do one condition, but unsure how to do multiple:
df[, grep("ABC", names(df)), with = FALSE]
You can select multiple columns that match certain patterns in data.table using patterns
in its .SDcols
argument:
# turn df into data.table
setDT(df)
# select columns that contain ABD or XYZ
df[, .SD, .SDcols=patterns("ABC|XYZ")]
# or
df[, grep("ABC|XYZ", names(df)), with=FALSE]
subsetting rows and columns at the same time
cols = grep("ABC|XYZ", names(df))
df[rowSums(df[, ..cols]>0)>0, .SD, .SDcols=cols]