R - Creating variables with mutual exclusion


I have a dataset with some variables that indicate whether an older person can or cannot do an activity (take the bus, bathing...). I have to create variables such as "In group C, the older person needs assistance to perform 2 activities including bathing" and "In group D, the older person needs assistance to perform 3 activities including bathing and dressing".

So an observation cannot be in both groups. My dataset looks like this:

    bathing    take_bus   dressing   eating  
1     4          4          4          3
2     2          1          3          2
3     4          2          4          2
4     5          4          1          2
5     2          4          4          1

The numbers indicate the level of difficulty in doing the activity. I am only interested in level 4 or higher (the person cannot do the activity alone at all).

So for example, here, individuals 3 and 4 are in group C. Individual 1 is in group D BUT should not be in group C. Individual 5 is not in group C because he can bathe alone.
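To make the example reproducible, here is a minimal sketch that rebuilds the five sample rows and counts, per person, how many activities are at level 4 or higher. The names `needs_help` and `n_help` are illustrative and not part of the original data; the real dataset has many more rows.

```r
# Sample rows from the table above (only these five observations)
df <- data.frame(
  bathing  = c(4, 2, 4, 5, 2),
  take_bus = c(4, 1, 2, 4, 4),
  dressing = c(4, 3, 4, 1, 4),
  eating   = c(3, 2, 2, 2, 1)
)

# Logical matrix: TRUE where the person cannot do the activity alone
needs_help <- df >= 4

# Number of activities requiring assistance, per person
n_help <- rowSums(needs_help)
n_help
# 1 2 3 4 5
# 3 0 2 2 2

# Who needs assistance with bathing
needs_help[, "bathing"]
```

Individual 5 has two activities at level 4 but bathing is not one of them, which is why he is not in group C; individual 1 has three, including bathing and dressing, which puts him in group D.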

I did something like this:

df$is_C <- ifelse(df$bathing >= 4 & (df$dressing >= 4 | df$eating >= 4 | df$take_bus >= 4), 1, 0)
df$is_C <- factor(x = df$is_C, levels = c(1, 0), labels = c("Group_C", "Not_Group_C"))

df$is_D <- ifelse(df$bathing >= 4 & df$dressing >= 4 & (df$eating >= 4 | df$take_bus >= 4), 1, 0)
df$is_D <- factor(x = df$is_D, levels = c(1, 0), labels = c("Group_D", "Not_Group_D"))

However, when I tabulate the result:

> table(df$is_C, df$is_D)
          
           Group_D Not_Group_D
  Group_C      683      290
  Not_Group_C    0     9650

So 683 people are in group C but should only be in group D. (It is fine to have people in neither group C nor group D, because I have other variables.)

What should I do?

Thank you all for your kindness and your answers!


Solution

  • Here is a solution.
    In order to make it more readable, two functions are defined, both returning logical values. Then the logical values are used for mutual exclusion of groups C and D. When this is done, the values are coerced to integer and then to factor.

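    acts <- c("bathing", "take_bus", "dressing", "eating")  # activity columns

    # C: bathing plus at least one other activity at `level` or higher
    f_is_C <- function(x, level = 4)
      x[["bathing"]] >= level && any(x[names(x) != "bathing"] >= level)
    # D: bathing and dressing, plus at least one of the remaining activities
    f_is_D <- function(x, level = 4)
      all(x[c("bathing", "dressing")] >= level) && any(x[c("take_bus", "eating")] >= level)

    # Select columns by name so the result does not depend on column order
    is_D <- apply(df[acts], 1, f_is_D)
    is_C <- apply(df[acts], 1, f_is_C) & !is_D  # mutual exclusion

    df$is_C <- factor(as.integer(is_C), levels = 1:0, labels = c("Group_C", "Not_Group_C"))
    df$is_D <- factor(as.integer(is_D), levels = 1:0, labels = c("Group_D", "Not_Group_D"))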
    
    with(df, table(is_C, is_D))
    #             is_D
    #is_C          Group_D Not_Group_D
    #  Group_C           0           2
    #  Not_Group_C       1           2
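
The same mutual exclusion can also be written without `apply`, using vectorized logical operations. This is a sketch, assuming the column names from the question and the five sample rows; the names `help`, `group` and the label `"Other"` are illustrative and not from the original post.

```r
# Sample rows from the question
df <- data.frame(
  bathing  = c(4, 2, 4, 5, 2),
  take_bus = c(4, 1, 2, 4, 4),
  dressing = c(4, 3, 4, 1, 4),
  eating   = c(3, 2, 2, 2, 1)
)

level <- 4
acts  <- c("bathing", "take_bus", "dressing", "eating")

# Logical matrix: TRUE where assistance is needed
help <- df[acts] >= level

# D: bathing and dressing, plus at least one of the other two activities
is_D <- help[, "bathing"] & help[, "dressing"] &
  (help[, "take_bus"] | help[, "eating"])

# C: bathing plus at least one other activity, excluding anyone already in D
is_C <- help[, "bathing"] & (rowSums(help) >= 2) & !is_D

# Optional: one column holding the (mutually exclusive) group
group <- ifelse(is_D, "Group_D", ifelse(is_C, "Group_C", "Other"))
table(group)
# group
# Group_C Group_D   Other
#       2       1       2
```

The key step is the same as above: compute `is_D` first, then add `& !is_D` to the condition for C. Every row that satisfies the D condition also satisfies the original C condition, which is why the 683 observations were counted in both groups.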