
Create a categorical variable with 4 categories


I used a survey with 20 questions and, for each of 400 participants, calculated the mean of the 20 answers as "Total". Now I need to categorise Total into 4 groups: Total < 2 is Limited, Total >= 2 is Basic, Total < 3 is Good, and Total >= 3 is Full.

I was able to create three categories, but not four, as follows:

    level <- ifelse(df$Total < 2, "Limited", ifelse((df$Total >= 2) & (df$Total < 3), "Basic", "Good"))

Then I want to see the percentage of each category, either as numbers or as a graph.


Solution

  • I may be misunderstanding something, but your categories appear to overlap: Total >= 2 is Basic, but Total < 3 is Good, so a value such as 2.5 fits both. You may want to confirm the bounds for your groupings. Once that's sorted, you were actually pretty close to a working solution. You can nest ifelse calls, and for each value the conditions are checked in order: if a condition is TRUE "early" in the chain, the label for that condition is returned; otherwise, the next ifelse is evaluated. Note that I've used 1, 2, and 3 as the breaks for the categories, so the logic reads: "If it's less than 1, it's Limited. If it's less than 2, it's Basic. If it's less than 3, it's Good. Otherwise, it's Full."

    set.seed(123)
    df <- data.frame(total = runif(n = 15, min = 0, max = 4))
    df

    # Conditions are checked in order, so the first one that is TRUE decides the label
    df$level <- ifelse(df$total < 1, "Limited",
                       ifelse(df$total < 2, "Basic",
                              ifelse(df$total < 3, "Good", "Full")))
    > df
           total   level
    1  0.5691772 Limited
    2  2.1971386    Good
    3  3.8163650    Full
    4  2.3419334    Good
    5  1.6180411   Basic
    6  2.5915739    Good
    7  1.2792825   Basic
    8  1.2308800   Basic
    9  0.8790705 Limited
    10 1.4779555   Basic
    11 3.9368768    Full
    12 0.6168092 Limited
    13 0.3641760 Limited
    14 0.5676276 Limited
    15 2.7600284    Good
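
The question also asks for the percentage of each category. A minimal sketch using base R's table(), prop.table() and barplot() is below. It assumes the same breaks (1, 2, 3) as the code above; the totals are the 15 values printed above, rounded to two decimals, and the plot labels are just placeholders.

```r
# Totals copied (rounded to two decimals) from the data frame printed above
df <- data.frame(total = c(0.57, 2.20, 3.82, 2.34, 1.62, 2.59, 1.28, 1.23,
                           0.88, 1.48, 3.94, 0.62, 0.36, 0.57, 2.76))

# Same nested ifelse as above (breaks at 1, 2, 3 are an assumption)
df$level <- ifelse(df$total < 1, "Limited",
                   ifelse(df$total < 2, "Basic",
                          ifelse(df$total < 3, "Good", "Full")))

# Convert to a factor with explicit levels; otherwise table() sorts
# the labels alphabetically (Basic, Full, Good, Limited)
df$level <- factor(df$level, levels = c("Limited", "Basic", "Good", "Full"))

# Counts per category, then percentages rounded to one decimal
counts <- table(df$level)
percentages <- round(100 * prop.table(counts), 1)
print(percentages)

# Bar chart of the percentages (labels are placeholders)
barplot(percentages,
        ylab = "Percent of participants",
        main = "Share of each level")
```

With these 15 values the counts are 5, 4, 4 and 2, so the percentages come out as 33.3, 26.7, 26.7 and 13.3.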
    

    With just four categories, a nested ifelse block is probably fine. If I were using many more bounds, I'd likely use a different approach. Edit: like thelatemail's, which is far cleaner.
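
For many more bounds, one common base R alternative is cut(), which bins a numeric vector at a set of break points. The sketch below is only a guess at what such an approach could look like, not a reproduction of thelatemail's answer (which is not shown here). The totals are made up for illustration and the breaks (1, 2, 3) are the same assumption as above.

```r
# Made-up totals for illustration, including values exactly on the breaks
total <- c(0.5, 1.0, 1.9, 2.0, 2.7, 3.0, 3.9)

# right = FALSE makes each interval closed on the left, i.e. [a, b),
# which matches the "< 1", "< 2", "< 3" logic of the nested ifelse
level <- cut(total,
             breaks = c(-Inf, 1, 2, 3, Inf),
             labels = c("Limited", "Basic", "Good", "Full"),
             right = FALSE)

print(data.frame(total, level))
```

Adding a category then only means adding one break and one label, rather than another level of nesting. cut() also returns a factor whose levels follow the order of the labels, so table() reports the categories in that order.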