
How to create a categorical variable in R from 2 binary variables, taking 1 of 4 possible values (one for each combination of the 2 binary variables)?


I'm trying to construct a multinomial logit regression model using this categorical variable as my dependent variable.

In my data, the two binary variables represent whether an individual lives in a metropolitan area (RESMETRO) and whether an individual works in a metropolitan area (JOBMETRO).

There are four possible location outcomes when combining the two binary variables.

I'm struggling to get these four possible combinations into one variable:

  • (RESMETRO == TRUE & JOBMETRO == TRUE)
  • (RESMETRO == TRUE & JOBMETRO == FALSE)
  • (RESMETRO == FALSE & JOBMETRO == TRUE)
  • (RESMETRO == FALSE & JOBMETRO == FALSE)

I've tried creating a new variable, but I've only managed to create another binary variable.


Solution

  • You mean something like this? I simulated your data, added a predictor variable, and used dplyr's case_when(), a vectorised if-else with several branches, on your two binaries to create all four outcomes in one column.

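    library(dplyr)  # provides tibble(), %>%, mutate() and case_when()
    
    #### Simulate Data ####
    # no seed is set, so your draws will differ from the output shown below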
    resmetro <- rbinom(n=100,
                       size=1,
                       prob=.5)
    
    jobmetro <- rbinom(n=100,
                       size=1,
                       prob=.5)
    
    predictor <- rnorm(n=100,
                       mean=50,
                       sd=10)
    
    tib <- tibble(resmetro,
                  jobmetro,
                  predictor)
    

    You can then use case_when to make the new variable.

    #### Use Case When ####
    tib_2 <- tib %>% 
      mutate(metro_type = case_when(
        (resmetro == 0) & (jobmetro == 0) ~ "No Metro",
        (resmetro == 0) & (jobmetro == 1) ~ "Only Job",
        (resmetro == 1) & (jobmetro == 0) ~ "Only Res",
        (resmetro == 1) & (jobmetro == 1) ~ "Full Metro"
      ))
    

    Which looks like this:

    # A tibble: 100 × 4
       resmetro jobmetro predictor metro_type
          <int>    <int>     <dbl> <chr>     
     1        1        1      58.3 Full Metro
     2        0        0      54.2 No Metro  
     3        0        1      39.9 Only Job  
     4        1        1      54.1 Full Metro
     5        0        0      31.5 No Metro  
     6        1        0      43.3 Only Res  
     7        0        0      30.1 No Metro  
     8        1        1      53.3 Full Metro
     9        1        0      46.4 Only Res  
    10        0        1      51.3 Only Job  
    # … with 90 more rows
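If your columns are logical TRUE/FALSE as in the question, or you'd rather not use dplyr, base R's interaction() can build the same four-level factor. A minimal sketch, assuming a data frame called df with the question's RESMETRO and JOBMETRO columns (df and location are hypothetical names):

```r
# Hypothetical data frame with the question's logical columns, one row per combination
df <- data.frame(
  RESMETRO = c(TRUE, TRUE, FALSE, FALSE),
  JOBMETRO = c(TRUE, FALSE, TRUE, FALSE)
)

# interaction() crosses the two variables into a single factor;
# each raw label has the form "<RESMETRO>_<JOBMETRO>", e.g. "TRUE_FALSE"
df$location <- interaction(df$RESMETRO, df$JOBMETRO, sep = "_")

# Map the raw labels to readable names explicitly, so the mapping
# does not depend on the order interaction() produces
df$location <- factor(
  df$location,
  levels = c("FALSE_FALSE", "TRUE_FALSE", "FALSE_TRUE", "TRUE_TRUE"),
  labels = c("No Metro", "Only Res", "Only Job", "Full Metro")
)

df
```

Setting the levels explicitly also fixes their order, and the first level ("No Metro" here) is the one nnet::multinom() treats as the baseline.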
    

    Then just fit the model:

    fit <- nnet::multinom(metro_type ~ predictor, tib_2)
    summary(fit)
    

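    Because metro_type is a character column, multinom() converts it to a factor with alphabetically sorted levels, so "Full Metro" is the reference category and each row below gives the log-odds of that outcome relative to Full Metro: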

    Call:
    nnet::multinom(formula = metro_type ~ predictor, data = tib_2)
    
    Coefficients:
             (Intercept)    predictor
    No Metro  0.05991963  0.004357301
    Only Job -0.97875054  0.021891747
    Only Res  0.39298053 -0.006230505
    
    Std. Errors:
             (Intercept)  predictor
    No Metro    1.416491 0.02797334
    Only Job    1.493587 0.02901240
    Only Res    1.467119 0.02925577
    
    Residual Deviance: 275.1406 
    AIC: 287.1406
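If a different reference category makes more sense, for example "No Metro", relevel the factor before fitting. A self-contained sketch in base R that re-simulates data shaped like tib_2 (the seed, dat and fit_2 are arbitrary, hypothetical choices, so the numbers will not match the output above):

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Re-create data shaped like tib_2 above, without dplyr
resmetro  <- rbinom(100, size = 1, prob = 0.5)
jobmetro  <- rbinom(100, size = 1, prob = 0.5)
predictor <- rnorm(100, mean = 50, sd = 10)
metro_type <- ifelse(resmetro == 1 & jobmetro == 1, "Full Metro",
              ifelse(resmetro == 1, "Only Res",
              ifelse(jobmetro == 1, "Only Job", "No Metro")))
dat <- data.frame(metro_type, predictor)

# Factor the outcome and move "No Metro" to the front as the reference level
dat$metro_type <- relevel(factor(dat$metro_type), ref = "No Metro")

# trace = FALSE is passed through to nnet() and silences the iteration log
fit_2 <- nnet::multinom(metro_type ~ predictor, data = dat, trace = FALSE)
summary(fit_2)

# Exponentiated coefficients: relative risk ratios versus "No Metro"
exp(coef(fit_2))
```

The coefficient rows are now Full Metro, Only Job and Only Res, each compared with No Metro.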