
R: Index data frame columns by ranges of their names


I work with a large number of very large data frames. They often contain groups of similarly named columns that appear consecutively. The following is a simplified version of such a data frame:

> tmp <- data.frame(ID = 1:25,
    Item1 = sample(x = 1:4, size = 25, replace = TRUE),
    Item2 = sample(x = 1:4, size = 25, replace = TRUE),
    Item3 = sample(x = 1:4, size = 25, replace = TRUE),
    Item4 = sample(x = 1:4, size = 25, replace = TRUE),
    Item5 = sample(x = 1:4, size = 25, replace = TRUE),
    Item6 = sample(x = 1:4, size = 25, replace = TRUE),
    Item7 = sample(x = 1:4, size = 25, replace = TRUE),
    Quest = rep(x = 20, times = 25))

I need to find a way to index these columns by ranges of their names, not by their positions. Say I need to index columns from Item4 to Item7. I could do the following:

> tmp[ , c("Item4", "Item5", "Item6", "Item7")]

This is impractical when there are hundreds of columns with similar names. I would like to do something like:

> tmp[ , c("Item4":"Item7")]

But it throws an error:

Error in "Item4":"Item7" : NA/NaN argument
In addition: Warning messages:
1: In `[.data.frame`(tmp, , c("Item4":"Item7")) :
  NAs introduced by coercion
2: In `[.data.frame`(tmp, , c("Item4":"Item7")) :
  NAs introduced by coercion

I would also like to use this kind of indexing to modify the columns, for example to convert them to factors with value labels. Using the first approach, which lists every column name, that looks like this:

> labels.Item4to7 <- c("Disagree", "Somewhat disagree",
  "Somewhat agree", "Agree")
> tmp[ , c("Item4", "Item5", "Item6", "Item7")] <- lapply(tmp[ , c("Item4",
  "Item5", "Item6", "Item7")], factor, labels = labels.Item4to7)

Instead, I would like to define the range of column names as Item4:Item7.

Thank you in advance.


Solution

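  • Use the which() function to look up the positions of the first and last column, then index by that range of positions: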

    tmp[, which(names(tmp) == "Item4"):which(names(tmp) == "Item7")]
    

    The same range works on the left-hand side of an assignment, so converting Item4 to Item7 to labelled factors can be done as follows:

    labels.Item4to7 <- c("Disagree", "Somewhat disagree",
      "Somewhat agree", "Agree")
    tmp[, which(names(tmp) == "Item4"):which(names(tmp) == "Item7")] <-
      lapply(tmp[, which(names(tmp) == "Item4"):which(names(tmp) == "Item7")],
        factor, labels = labels.Item4to7)
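
The repeated which() lookups can be wrapped in a small helper so the range is written only once. The following is a minimal sketch, not part of the original answer. col_range() is a hypothetical name. The data frame is rebuilt with set.seed() only so the sketch is reproducible. levels = 1:4 is passed to factor() as an added assumption, so that the four labels still line up if some value happens not to occur in a column. The helper uses match() rather than which() to find the two positions. The last line shows base R's subset(), whose select argument accepts a range of column names directly when only selection is needed.

```r
# Rebuild the example data from the question (seed added for reproducibility).
set.seed(42)
items <- replicate(7, sample(1:4, size = 25, replace = TRUE), simplify = FALSE)
names(items) <- paste0("Item", 1:7)
tmp <- data.frame(ID = 1:25, as.data.frame(items), Quest = rep(20, times = 25))

# Hypothetical helper: return the names of all columns positioned
# between `from` and `to` (inclusive) in the data frame `df`.
col_range <- function(df, from, to) {
  pos <- match(c(from, to), names(df))
  if (anyNA(pos)) {
    stop("column not found: ", paste(c(from, to)[is.na(pos)], collapse = ", "))
  }
  names(df)[pos[1]:pos[2]]
}

# Compute the range once and reuse it for selection and assignment.
items4to7 <- col_range(tmp, "Item4", "Item7")

labels.Item4to7 <- c("Disagree", "Somewhat disagree",
                     "Somewhat agree", "Agree")

# levels = 1:4 is an assumption added here; it keeps the labels aligned
# even if one of the values 1..4 is absent from a column.
tmp[items4to7] <- lapply(tmp[items4to7], factor,
                         levels = 1:4, labels = labels.Item4to7)

# For selection only, subset() understands name ranges in `select`.
sel <- subset(tmp, select = Item4:Item7)
```

Because col_range() returns column names rather than positions, the stored range stays valid if columns outside Item4 to Item7 are later added or removed.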