Ok, basically I have a dataset of households that looks like this:
household_data <- data.frame(
id = 1:4,
gender_component_1 = c(1,2,2,2),
gender_component_2 = c(2,1,1,2),
bread_winner = c(1,1,2,1)
)
I want to construct a variable ('gender_bread_winner') that reports the sex of the breadwinner in each household. Which component (1 or 2) is the breadwinner is recorded as a number in a separate variable, bread_winner.
I've come up with the following loop:
var_max <- paste("gender_component", household_data$bread_winner, sep = "_")
for (i in 1:nrow(household_data)) {
household_data$gender_bread_winner[i] <- select(household_data[i,], var_max[i])
}
Unfortunately, the real dataset is huge and this row-by-row loop is far too slow. Is it possible to do the same thing with apply or something similar? I haven't managed to so far.
Thanks in advance
EDIT: Thank you all for your answers! In the end I found it easier to use a series of ifelse calls like this:
dataset$sesso_max <- NA
dataset$sesso_max <- ifelse(dataset$max_percettore == 1, dataset$sesso_1, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 2, dataset$sesso_2, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 3, dataset$sesso_3, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 4, dataset$sesso_4, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 5, dataset$sesso_5, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 6, dataset$sesso_6, dataset$sesso_max)
If there are only 2 gender_component columns, a simple ifelse would do.
household_data <- transform(household_data, gender_bread_winner =
ifelse(bread_winner == 1, gender_component_1, gender_component_2))
This says that when bread_winner has value 1, take the value from the gender_component_1 column, or else take it from the gender_component_2 column.
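As a quick check of the description above, here is a minimal sketch that runs the same ifelse on the sample data from the question and keeps the result in a separate vector (the name picked is only for illustration):

```r
household_data <- data.frame(
  id = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  bread_winner = c(1, 1, 2, 1)
)

# ifelse() is vectorised: it tests every row at once and, row by row,
# takes the second argument where the test is TRUE and the third otherwise
picked <- with(household_data,
               ifelse(bread_winner == 1, gender_component_1, gender_component_2))
picked
#[1] 1 2 1 2
```

Note that any non-missing value of bread_winner other than 1 falls through to the gender_component_2 branch, so this form is only safe when bread_winner really takes just the values 1 and 2.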
For more than 2 columns we can subset with a matrix of row and column indices, as follows -
gender_cols <- grep('gender_component', names(household_data), value = TRUE)
household_data$gender_bread_winner <- household_data[gender_cols][cbind(1:nrow(household_data), household_data$bread_winner)]
household_data
#  id gender_component_1 gender_component_2 bread_winner gender_bread_winner
#1  1                  1                  2            1                   1
#2  2                  2                  1            1                   2
#3  3                  2                  1            2                   1
#4  4                  2                  2            1                   2
Explanation for the answer -
gender_cols has all the columns that have "gender_component" in them.
gender_cols
#[1] "gender_component_1" "gender_component_2"
We create a matrix with row and column index to subset from the dataframe household_data.
cbind(1:nrow(household_data), household_data$bread_winner)
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    1
#[3,]    3    2
#[4,]    4    1
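Each row of this matrix is a (row, column) pair: take the value in column 1 of row 1, column 1 of row 2, column 2 of row 3, and so on. The column number counts within household_data[gender_cols], so column 1 is gender_component_1. This matrix is then used to subset the data from the dataframe.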
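The same idea can be applied to the column names from the question's edit. The sketch below is only an illustration under stated assumptions: the toy values are made up, only three sesso_* columns are shown instead of six, and the lookup goes through match() on the column names so that the result does not depend on the order of the columns in the data frame.

```r
# Hypothetical toy data reusing the column names from the question's edit
dataset <- data.frame(
  sesso_1 = c(1, 2, 2, 1),
  sesso_2 = c(2, 1, 1, 2),
  sesso_3 = c(1, 1, 2, NA),
  max_percettore = c(3, 1, NA, 2)
)

# Find the target column by name rather than by position
sesso_cols <- grep("^sesso_[0-9]+$", names(dataset), value = TRUE)
col_idx <- match(paste0("sesso_", dataset$max_percettore), sesso_cols)

# One (row, column) pair per household; a pair containing NA returns NA
sesso_mat <- as.matrix(dataset[sesso_cols])
dataset$sesso_max <- sesso_mat[cbind(seq_len(nrow(dataset)), col_idx)]
dataset$sesso_max
#[1]  1  2 NA  2
```

This replaces the chain of ifelse calls with a single vectorised lookup, and a missing max_percettore simply gives NA in sesso_max, as the ifelse version does.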