My dataset looks something like this
ID YOB ATT94 GRADE94 ATT96 GRADE96 ATT 96 .....
1 1975 1 12 0 NA
2 1985 1 3 1 5
3 1977 0 NA 0 NA
4 ......
(with ATTXX a dummy var. denoting attendance at school in year XX, GRADEXX denoting the school grade)
I'm trying to create a dummy variable that = 1 if an individual is attending school when they are 19/20 years old. e.g. if YOB = 1988 and ATT98 = 1 then the new variable = 1 etc. I've been attempting this using mutate in dplyr but I'm new to R (and coding in general!) so struggle to get anything other than an error any code I write.
Any help would be appreciated, thanks.
Edit:
So, I've just noticed that something has gone wrong, I changed your code a bit just to add another column to the long format data table. Here is what I did in the end:
df %>%
melt(id = c("ID", "DOB") %>%
tbl_df() %>%
mutate(dummy = ifelse(value - DOB %in% c(19,20), 1, 0))
so it looks something like e.g.
ID YOB VARIABLE VALUE dummy
1 1979 ATT94 1994 1
1 1979 ATT96 1996 1
1 1979 ATT98 0 0
2 1976 ATT94 0 0
2 1976 ATT96 1996 1
2 1976 ATT98 1998 1
i.e. whenever the ATT variables take a value other than 0 the dummy = 1, even if they're not 19/20 years old. Any ideas what could be going wrong?
Welcome to the world of code! R's syntax can be tricky (even for experienced coders) and dplyr
adds its own quirks. First off, it's useful when you ask questions to provide code that other people can run in order to be able to reproduce your data. You can learn more about that here.
Are you trying to create code that works for all possible values of DOB
and ATTx
? In other words, do you have a whole bunch of variables that start with ATT and you want to look at all of them? That format is called wide data, and R works much better with long data. Fortunately the reshape2
package does exactly that. The code below creates a dummy
variable with a value of 1 for people who were in school when they were either 19 or 20 years old.
# Load libraries
library(dplyr)
library(reshape2)
# Create a sample dataset
ATT94 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT96 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT98 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
DOB <- rnorm(500, mean = 1977, sd = 5) %>% round(digits = 0)
df <- cbind(DOB, ATT94, ATT96, ATT98) %>% data.frame()
# Recode ATTx variables with the actual year
df$ATT94[df$ATT94==1] <- 1994
df$ATT96[df$ATT96==1] <- 1996
df$ATT98[df$ATT98==1] <- 1998
# Melt the data into a long format and perform requested analysis
df %>%
melt(id = "DOB") %>%
tbl_df() %>%
mutate(dummy = ifelse(value - DOB %in% c(19,20), 1, 0))