
Create function to categorize BMI in multiple dataframes in R


This is my first post here and I'm new to R, so I apologize if this post is worded awkwardly.

I am working on an analysis of a large dataset for a single year. I want to categorize continuous BMI data into categories ranging from "underweight" to "obese". To do the same across multiple years of this dataset, I want to write a function that can be reused for each year, even though the datasets are named slightly differently.

Is there a way to write this function so that I can apply it to different years of the dataset without rewriting my code?

bmi_categories <- function(df) {
  # Categorize BMI2 for rows with AGE2 > 6; all other rows get "".
  as.factor(
    ifelse(df$BMI2 < 18.5 & df$AGE2 > 6, "Underweight",
    ifelse(18.5 <= df$BMI2 & df$BMI2 < 25 & df$AGE2 > 6, "Normal Weight",
    ifelse(25 <= df$BMI2 & df$BMI2 < 30 & df$AGE2 > 6, "Overweight",
    ifelse(30 <= df$BMI2 & df$AGE2 > 6, "Obese", ""))))
  )
}

The first 6 observations of the dataframe look like this:

  AGE2     BMI2
1   15 22.50087
2   17 24.88647
3   14 22.70773
4    9 23.49076
5    7 22.14871
6   16 23.10811

Thanks in advance to anyone who responds!


Solution

  • Since the column names differ from year to year, I would pass the function the specific columns it needs rather than the entire data frame.

    example data

    df1 <- data.frame(AGE1 = c(6, 12, 24, 56, 32), BMI1 = c(20, 18, 27, 31, 29))
    
    > df1
      AGE1 BMI1
    1    6   20
    2   12   18
    3   24   27
    4   56   31
    5   32   29
    

    function

    bmi_categories <- function(bmi, age) {
      # NA is the default value. You could use "" as the default instead,
      # but then "" must also be added to the vector of levels.
      category <- factor(rep(NA, length(bmi)), levels = c("Underweight", "Normal Weight", "Overweight", "Obese"))
      
      category[bmi<18.5 & age>6] <- "Underweight"
      category[18.5<=bmi & bmi<25 & age>6] <- "Normal Weight"
      category[25<=bmi & bmi<30 & age>6] <- "Overweight"
      category[30<=bmi & age>6] <- "Obese"
    
      return(category)
    }
    

    (You could also use the code by JaredS and turn that into a function. I personally try to avoid using external libraries where possible, so the code is easier to run on another computer.)
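
    If you want something more compact while staying in base R, one possible variant uses cut(). This is only a sketch and not necessarily what the other answer does; the name bmi_categories_cut is made up for illustration. It assumes the same cut-offs as above and that anyone aged 6 or younger (or with a missing age) should get NA.

    ```r
    # Hypothetical alternative to bmi_categories(), built on base R's cut().
    bmi_categories_cut <- function(bmi, age) {
      # right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
      category <- cut(bmi,
                      breaks = c(-Inf, 18.5, 25, 30, Inf),
                      labels = c("Underweight", "Normal Weight", "Overweight", "Obese"),
                      right = FALSE)
      # Rows with age <= 6 or a missing age get no category.
      category[is.na(age) | age <= 6] <- NA
      category
    }

    # Same values as the example data above.
    res <- bmi_categories_cut(c(20, 18, 27, 31, 29), c(6, 12, 24, 56, 32))
    ```

    The breaks live in one vector here, so changing a threshold means editing a single line.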

    call the function and assign the return value to a new column

    df1$class <- bmi_categories(df1$BMI1, df1$AGE1)
    
    > df1
      AGE1 BMI1       class
    1    6   20        <NA>
    2   12   18 Underweight
    3   24   27  Overweight
    4   56   31       Obese
    5   32   29  Overweight
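
    To reuse this across several years without retyping the column names each time, a small wrapper that takes the column names as strings might help. The following is a rough sketch: add_bmi_class and the second data frame df2 are hypothetical, with df2 built from the first rows shown in the question. bmi_categories() is repeated so the snippet runs on its own.

    ```r
    # bmi_categories() as defined above, repeated so this sketch is self-contained.
    bmi_categories <- function(bmi, age) {
      category <- factor(rep(NA, length(bmi)),
                         levels = c("Underweight", "Normal Weight", "Overweight", "Obese"))
      category[bmi < 18.5 & age > 6] <- "Underweight"
      category[18.5 <= bmi & bmi < 25 & age > 6] <- "Normal Weight"
      category[25 <= bmi & bmi < 30 & age > 6] <- "Overweight"
      category[30 <= bmi & age > 6] <- "Obese"
      category
    }

    # Illustrative yearly data; the column suffix changes from year to year.
    df1 <- data.frame(AGE1 = c(6, 12, 24, 56, 32), BMI1 = c(20, 18, 27, 31, 29))
    df2 <- data.frame(AGE2 = c(15, 17, 14, 9, 7),
                      BMI2 = c(22.50087, 24.88647, 22.70773, 23.49076, 22.14871))

    # Hypothetical wrapper: look the columns up by name with [[ ]] and
    # return the data frame with a new "class" column.
    add_bmi_class <- function(df, bmi_col, age_col) {
      df$class <- bmi_categories(df[[bmi_col]], df[[age_col]])
      df
    }

    df1 <- add_bmi_class(df1, "BMI1", "AGE1")
    df2 <- add_bmi_class(df2, "BMI2", "AGE2")
    ```

    Because the column names are plain strings, each new year only needs one more add_bmi_class() call with that year's names.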