Using apply() to select specific variables by name

Ok, basically I have a dataset of households that looks like this:



household_data <- data.frame(
  id                 = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  bread_winner       = c(1, 1, 2, 1)
)

I want to construct a variable ('gender_bread_winner') that reports the sex of the breadwinner in each household. The breadwinner is either component 1 or component 2, and which one it is is recorded as a number in a separate variable ('bread_winner').

I've come up with the following loop:

var_max <- paste("gender_component", household_data$bread_winner, sep = "_")

for (i in 1:nrow(household_data)) {
  household_data$gender_bread_winner[i] <- select(household_data[i,], var_max[i])
}

Unfortunately, the real dataset is huge and this loop is far too slow. Is it possible to do the same thing using apply or something similar? I have not managed to make that work.

Thanks in advance

EDIT: Thank you all for your answers! In the end I found it easier to use a series of ifelse() calls like this:


dataset$sesso_max <- NA
dataset$sesso_max <- ifelse(dataset$max_percettore == 1, dataset$sesso_1, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 2, dataset$sesso_2, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 3, dataset$sesso_3, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 4, dataset$sesso_4, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 5, dataset$sesso_5, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 6, dataset$sesso_6, dataset$sesso_max)

Solution

If there are only two gender_component columns, a simple ifelse() will do.

household_data <- transform(household_data, gender_bread_winner =
  ifelse(bread_winner == 1, gender_component_1, gender_component_2))

This says: where bread_winner is 1, take the value from gender_component_1; otherwise take it from gender_component_2.
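
To see how ifelse() works element by element, here is a minimal sketch on standalone vectors holding the same values as the example data. The vectors are created here only for illustration and are not part of the original code.

```r
# Same values as the columns of household_data, as plain vectors
bread_winner       <- c(1, 1, 2, 1)
gender_component_1 <- c(1, 2, 2, 2)
gender_component_2 <- c(2, 1, 1, 2)

# ifelse() tests every element at once and picks from the matching vector
picked <- ifelse(bread_winner == 1, gender_component_1, gender_component_2)
picked
# [1] 1 2 1 2
```

Only the third household has bread_winner equal to 2, so only its value comes from gender_component_2.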

For more than two columns, we can index the data frame with a matrix of row and column numbers:

gender_cols <- grep('gender_component', names(household_data), value = TRUE)
# Keep the opening bracket on the first line so R parses this as one expression
household_data$gender_bread_winner <- household_data[gender_cols][
  cbind(1:nrow(household_data), household_data$bread_winner)]
household_data

#  id gender_component_1 gender_component_2 bread_winner gender_bread_winner
#1  1                  1                  2            1                   1
#2  2                  2                  1            1                   2
#3  3                  2                  1            2                   1
#4  4                  2                  2            1                   2

Explanation:

gender_cols holds the names of all columns that contain "gender_component".

gender_cols
#[1] "gender_component_1" "gender_component_2"

We build a two-column matrix of row and column indices and use it to subset the data frame household_data.

cbind(1:nrow(household_data), household_data$bread_winner)
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    1
#[3,]    3    2
#[4,]    4    1

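Each row of this matrix is a (row, column) pair: take the value in column 1 from row 1, column 1 from row 2, column 2 from row 3, and so on. The column numbers refer to positions within gender_cols, so a bread_winner of 1 picks gender_component_1. Indexing the data frame with this matrix returns one value per row.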
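
As a rough sketch of how the same idea extends beyond two components, the example below uses a made-up data frame with three gender_component columns. The data frame toy and its values are hypothetical, and the sketch assumes the component columns appear in the data frame in the same order as the numbers stored in bread_winner. The explicit as.matrix() call is a choice made here to make the matrix indexing visible; it is not part of the original answer.

```r
# Hypothetical data: three components instead of two (values invented)
toy <- data.frame(
  id                 = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  gender_component_3 = c(1, 1, 2, 1),
  bread_winner       = c(3, 1, 2, 3)
)

# Names of the component columns, in the order they appear in the data frame
gender_cols <- grep("^gender_component_", names(toy), value = TRUE)

# One (row, column) pair per household
idx <- cbind(seq_len(nrow(toy)), toy$bread_winner)

# Index the component columns as a matrix, one value per row
toy$gender_bread_winner <- as.matrix(toy[gender_cols])[idx]
toy$gender_bread_winner
# [1] 1 2 1 1
```

This replaces a chain of one ifelse() call per component with a single vectorized lookup, so no explicit loop is needed.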