Search code examples
rmutate

Select column based on indicator in another column


I have a dataframe in R where I have a column with a numeric indicator, and I want to create a new column picking from other columns in the dataframe based on that indicator. Probably easiest if I explain via an example...

data <- data.frame(
    id = 1:6,
    selction = c(1,4,2,3,2,8),
    column1 = 1:6,
    column2 = 11:16,
    column3 = 21:26,
    column4 = 31:36,
    column5 = 41:46,
    column6 = 51:56,
    column7 = 61:66,
    column8 = 71:76
)
> data
  id selction column1 column2 column3 column4 column5 column6 column7 column8
1  1        1       1      11      21      31      41      51      61      71
2  2        4       2      12      22      32      42      52      62      72
3  3        2       3      13      23      33      43      53      63      73
4  4        3       4      14      24      34      44      54      64      74
5  5        2       5      15      25      35      45      55      65      75
6  6        8       6      16      26      36      46      56      66      76

I would like to create a new column chosing from the columnx columns based on the number in the indicator, i.e. end up with

> data
  id selction column1 column2 column3 column4 column5 column6 column7 column8 newcolumn
1  1        1       1      11      21      31      41      51      61      71         1
2  2        4       2      12      22      32      42      52      62      72        32
3  3        2       3      13      23      33      43      53      63      73        13
4  4        3       4      14      24      34      44      54      64      74        24
5  5        2       5      15      25      35      45      55      65      75        15
6  6        8       6      16      26      36      46      56      66      76        76

I am trying to avoid doing it with a long ifelse as in reality I could have a lot more than 8 columns. Any help would be appreciated


Solution

  • The "[" function will accept a multi-column numeric matrix or array with the same number of columns as there are dimensions in the data object. For dataframes this is always a two column matrix. Subset with row and column index:

    data[, -c(1:2) ][ cbind(seq(nrow(data)), data$selction) ]
    # [1]  1 32 13 24 15 76
    

    Here cbind gives me rows, 1 to nrows, and data$selection as the column index, but before subsetting, I am excluding id and selection columns - -c(1:2).

    Could also have added 2 to make a proper offset without the exclusion (Thanks @IRTFM).

    data[ cbind(seq(nrow(data)), data$selction + 2) ]
    # [1]  1 32 13 24 15 76