I have the following dataset:
hairdf=data.frame(
id=c(1:2),
typedad=c("straight*","curly"),
colourdad=c("brown","black"),
typemom=c("curly","wavy*"),
colourmom=c("blonde","red"),
typekid1=c("wavy","mixed*"),
colourkid1=c("black","blonde"))
I want to create new columns, one per hair type, that take the value 1 if that hair type appears in any of the hair "type" columns without an asterisk, 2 if it appears with an asterisk, and blank if it doesn't appear in that row. It should look like this:
id | typedad | colourdad | typemom | colourmom | typekid1 | colourkid1 | straight | curly | wavy | mixed |
---|---|---|---|---|---|---|---|---|---|---|
1 | straight* | brown | curly | blonde | wavy | black | 2 | 1 | 1 | |
2 | curly | black | wavy* | red | mixed* | blonde | | 1 | 2 | 2 |
My two issues are that the other examples I have found use numeric values, and that they assume the columns of interest sit next to each other. I need code that matches strings in columns that can be located anywhere in the data frame. I have tried the following:
straight<- hairdf %>% mutate(across(c("hairtypedad", "hairtypemom", "hairtypekid1"),
ifelse(.=="straight", 1
ifelse(.=="straight*",2, ""
))))
curly<- hairdf %>% mutate(across(c("hairtypedad", "hairtypemom", "hairtypekid1"),
ifelse(.=="curly", 1
ifelse(.=="curly*",2, ""
wavy<- hairdf %>% mutate(across(c("hairtypedad", "hairtypemom", "hairtypekid1"),
ifelse(.=="wavy", 1
ifelse(.=="wavy*",2, ""
))))
mixed<- hairdf %>% mutate(across(c("hairtypedad", "hairtypemom", "hairtypekid1"),
ifelse(.=="mixed", 1
ifelse(.=="mixed*",2, ""
))))
But I'm not sure if this code even makes sense. Also, this will be tedious because I have many more hair types, so any suggestions to make it easier would be appreciated as well. Thank you!
This is neither the most efficient nor the most general solution, but it should do the job:
# create the new columns, filled with NA
st <- rep(NA, nrow(hairdf))
cur <- rep(NA, nrow(hairdf))
wav <- rep(NA, nrow(hairdf))
mix <- rep(NA, nrow(hairdf))
# append them (they become columns 8 to 11) and define the words to look for
hairdf <- cbind(hairdf, st, cur, wav, mix)
words <- c("straight", "curly", "wavy", "mixed")
words_ast <- paste(words, "*", sep = "")  # the same words with a trailing "*"
# positions of the hair-type columns: typedad (2), typemom (4), typekid1 (6)
type_cols <- c(2, 4, 6)
# first pass: words with an asterisk get the value 2
for (j in seq_along(words_ast)) {
  for (i in type_cols) {  # look only in the hair-type columns
    # rows where column i matches the starred word
    # (this can also be written as hairdf %>% subset(.[, i] == words_ast[j]))
    a <- subset(hairdf, hairdf[, i] == words_ast[j])
    hairdf[row.names(a), 7 + j] <- 2  # column 7 + j belongs to word j (8 = st, ..., 11 = mix)
  }
}
# second pass: repeat the process for the plain words, which get the value 1
for (j in seq_along(words)) {
  for (i in type_cols) {
    a <- subset(hairdf, hairdf[, i] == words[j])
    hairdf[row.names(a), 7 + j] <- 1
  }
}
This should give you the expected result. Alternatively, you can use the assign() function, i.e.
assign(x, value=1)
where x is each element of words.
So, in a loop over n:
assign(words[n], value=1); assign(words_ast[n], value=2)
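Since the question mentions many more hair types, the two passes above could also be generalised so that neither the column positions nor the list of words has to be typed out. The following is only a sketch under some assumptions that are not in the original post: that every hair-type column name starts with "type", that the new columns may be named after the hair types themselves, and that `type_cols` and `hair_types` are just illustrative helper names. As in the loops above, a plain match (1) takes precedence over a starred match (2) if both occur in the same row, and rows without a match are left as NA.

```r
# Example data from the question (two rows)
hairdf <- data.frame(
  id = 1:2,
  typedad = c("straight*", "curly"),
  colourdad = c("brown", "black"),
  typemom = c("curly", "wavy*"),
  colourmom = c("blonde", "red"),
  typekid1 = c("wavy", "mixed*"),
  colourkid1 = c("black", "blonde"),
  stringsAsFactors = FALSE
)

# Assumption: hair-type columns are the ones whose names start with "type";
# they can sit anywhere in the data frame
type_cols <- grep("^type", names(hairdf), value = TRUE)

# Collect every hair type that occurs, with any trailing "*" removed
hair_types <- unique(sub("\\*$", "", unlist(hairdf[type_cols])))

for (ht in hair_types) {
  # TRUE for rows where the plain / starred word appears in any type column
  plain   <- unname(rowSums(hairdf[type_cols] == ht, na.rm = TRUE) > 0)
  starred <- unname(rowSums(hairdf[type_cols] == paste0(ht, "*"), na.rm = TRUE) > 0)
  # 1 for a plain match, 2 for a starred match, NA otherwise
  hairdf[[ht]] <- ifelse(plain, 1L, ifelse(starred, 2L, NA_integer_))
}

print(hairdf)
```

If blank strings are preferred over NA for display, the new columns can be converted afterwards, at the cost of turning them into character columns.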