
R: combine two variables with the same character values for use in logistic regression


I've searched for how to handle this case and haven't found much that I could use, so any advice would be greatly appreciated.

I have a dataset that separates males and females for certain variables. I would like to combine them and use the combined variable in logistic regression.

Example of how the data looks:

male <- c("weekly", "monthly", "", "never", "", "", "weekly")
female <- c("", "", "never", "", "daily", "weekly", "")
df <- data.frame(male, female)

My code looks like this:

df$combined<- paste(df$male,df$female)
model_00_<- glm(formula= df$outcome ~ df$main_predictor + df$combined, data=df, family=binomial(link="logit"))
exp(cbind(OR=coef(model_00_),confint(model_00_)))

But when I run it, the output looks like this (arbitrary numbers for simplicity):

                     OR      2.5%     97.5%
intercept            9       6        11
daily                4       3        7
weekly               3       2        6
monthly              2.5     1.5      4
never                0.75    0.6      0.9
daily                4       3        7
weekly               3       2        6
monthly              2.5     1.5      4
never                NA      NA       NA

I think this is happening because of the "paste" function, but I am unsure how to combine the two variables without it.


Solution

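  • As others have mentioned, paste is a poor fit here because its default separator (sep = " ") inserts a space between the pieces being pasted: paste("weekly", "") returns "weekly " and paste("", "never") returns " never". The same category therefore ends up as two different strings, which would explain why each level appears twice in your output. I do not like using paste0 either, because it does not treat the original variables as data; it just concatenates them as characters.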

    As Limey's comment above mentions, I think coalesce is a better solution than either. coalesce(x, y) works element by element: it takes the value of x unless that value is NA, in which case it uses the value of y. Thus:

    male <- c("weekly", "monthly", NA, "never", NA, NA, "weekly")
    female <- c(NA, NA, "never", NA, "daily", "weekly", NA)
    
    df <- data.frame(male, female)
    df
    #>      male female
    #> 1  weekly   <NA>
    #> 2 monthly   <NA>
    #> 3    <NA>  never
    #> 4   never   <NA>
    #> 5    <NA>  daily
    #> 6    <NA> weekly
    #> 7  weekly   <NA>
    
    library(dplyr)  # provides coalesce()
    desired_output <- coalesce(male, female)
    desired_output
    #> [1] "weekly"  "monthly" "never"   "never"   "daily"   "weekly"  "weekly"
    

    However, note that if the empty cells in your original data contain whitespace or are empty strings (""), as in the example in your question, coalesce will not skip them. An empty string is a value, not a missing value, so such cells need to be converted to NA first.
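
    For that case, here is a minimal sketch of one way to do the conversion before coalescing. It assumes dplyr is installed and that blank cells are either empty strings or whitespace only. The toy data is copied from the question. The outcome and main_predictor columns in the commented-out glm() call are the question's names and do not exist in this toy data, so that line is illustrative only.

    ```r
    library(dplyr)

    # Data as in the question: blanks are empty strings, not NA
    male   <- c("weekly", "monthly", "", "never", "", "", "weekly")
    female <- c("", "", "never", "", "daily", "weekly", "")
    df <- data.frame(male, female, stringsAsFactors = FALSE)

    df <- df %>%
      mutate(
        # Strip stray whitespace, then turn "" into NA so coalesce() can skip it
        male     = na_if(trimws(male), ""),
        female   = na_if(trimws(female), ""),
        # Take male where present, otherwise female; store as a factor for glm()
        combined = factor(coalesce(male, female))
      )

    levels(df$combined)
    #> [1] "daily"   "monthly" "never"   "weekly"

    # With the real data (assumed to contain outcome and main_predictor):
    # glm(outcome ~ main_predictor + combined, data = df,
    #     family = binomial(link = "logit"))
    ```

    Each category now appears once among the factor levels, so the model should estimate one coefficient per non-reference level instead of duplicated rows.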