I have a very large dataset that I'm trying to wrangle. Here are the first few rows of the variables relevant to this question:
id | stressor |
---|---|
1 | Homelessness |
1 | Inadequate Housing |
5 | Emotional Abuse |
5 | Extreme Poverty/Low Income |
5 | Physical Abuse |
6 | Chaotic atmosphere/stressful home environment |
The stressor variable is a factor with 61 levels. Here is code that recreates the data shown in that table:
structure(list(id = c(1, 1, 5, 5, 5, 6), stressor = structure(c(4L,
5L, 2L, 3L, 6L, 1L), .Label = c("Chaotic atmosphere/stressful home environment",
"Emotional Abuse", "Extreme Poverty/Low Income", "Homelessness",
"Inadequate Housing", "Physical Abuse"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
I'm trying to reshape the data so that there is just one row per id, with a column for every stressor. Ideally, the cell for a given id and stressor would hold a 1 if the person has that stressor and a 0 if not. I've gotten as far as casting the data. Here's the code I used for that:
data_cast<-dcast(data, id ~ stressor)
Afterward, I have a dataframe that looks like this:
id | Homelessness | Inadequate Housing | Emotional Abuse | Extreme Poverty/Low Income | Physical Abuse | Chaotic atmosphere/stressful home environment |
---|---|---|---|---|---|---|
1 | Homelessness | Inadequate Housing | NA | NA | NA | NA |
5 | NA | NA | Emotional Abuse | Extreme Poverty/Low Income | Physical Abuse | NA |
6 | NA | NA | NA | NA | NA | Chaotic atmosphere/stressful home environment |
Now this is in the correct format, but the values are not what I need. I want my final result to look like this:
id | Homelessness | Inadequate Housing | Emotional Abuse | Extreme Poverty/Low Income | Physical Abuse | Chaotic atmosphere/stressful home environment |
---|---|---|---|---|---|---|
1 | 1 | 1 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 1 | 1 | 1 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 1 |
I know that, for each individual column, I could then do something like this to get what I want:
data_cast$Homelessness<-ifelse(data_cast$Homelessness == "Homelessness", 1, 0)
I know several ways to do this for an individual column, but I'd have to repeat it for every variable, and there are 61 in the actual data. I'd like to avoid that much repetitive code.
Is there a way to recode values in the data to equal one if the value is found in any of the column names? I was trying something with ifelse() and names() but can't figure out what would go in the left side of the test argument. I'm guessing, if it's possible to do it this way, it would be something like:
data_cast<-ifelse(__________ %in% names(data_cast) == TRUE, 1, 0)
I tried just data_cast %in% names(data_cast), as well as as.list(data_cast)[-1] %in% names(data_cast) and unlist(data_cast) %in% names(data_cast), but none of those works.
Can anyone help me with this? Let me know if there's any more information I need to provide, and I'd be very happy to do that. I'm relatively new to R so I tried looking at other questions here on SO, but if there are applicable answers already then I must not know enough about R to have spotted them. Sorry if that's the case.
A possible solution:
library(tidyverse)
# Recode every column except id: NA becomes 0, any stressor label becomes 1
data_cast %>%
  mutate(across(!id, ~ ifelse(is.na(.x), 0, 1)))
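If you would rather skip the cast step altogether, here is a rough base R sketch that goes straight from the long data to the 0/1 layout. It assumes the long data frame is called data, as in the question; the names counts, flags and indicator are made up for illustration. It is just one way to do it, not necessarily the best one for a very large dataset.

```r
# Example data from the question (long format: one row per id/stressor pair)
data <- data.frame(
  id = c(1, 1, 5, 5, 5, 6),
  stressor = factor(c("Homelessness", "Inadequate Housing", "Emotional Abuse",
                      "Extreme Poverty/Low Income", "Physical Abuse",
                      "Chaotic atmosphere/stressful home environment"))
)

# Cross-tabulate id by stressor; each cell counts how often the pair occurs
counts <- table(data$id, data$stressor)

# Convert counts to 0/1 flags, so a duplicated id/stressor row still gives 1
flags <- as.data.frame.matrix((counts > 0) * 1L)

# The ids end up as row names (character); restore them as a numeric column
indicator <- cbind(id = as.numeric(rownames(flags)), flags)
rownames(indicator) <- NULL

indicator
```

Note that the stressor columns come out in factor-level order (alphabetical here) rather than in order of first appearance, so reorder them afterwards if the column order matters.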