Tags: r, categorical-data, continuous

Creating categorical variable from continuous variable w/ scaled data


I've lurked in this community for a while but this is my first question...

Background: I'm working with the breast cancer data from UCI.

What I'm trying to do is a Latent Class Analysis (technically Latent Profile as these are continuous variables), but I must first scale my values for each variable.

After scaling, I have 32 variables, each ranging from negative to positive values (I believe the poLCA function cannot use negatives or zeros). See below for a summary of one of my scaled features.

    > summary(scaled.dat.1)
           V1
     Min.   :-2.0279
     1st Qu.:-0.6888
     Median :-0.2149
     Mean   : 0.0000
     3rd Qu.: 0.4690
     Max.   : 3.9678

Question: How do I convert these scaled continuous values into categorical values, say 1 to 5?


Solution

  • To categorize into 5 groups by quantile, I would do something like this:

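    # Example data: 50 integers drawn from -20..20, plus one missing value
    var1 <- c(NA, sample(-20:20, replace = TRUE, size = 50))
    # Upper bounds of the five groups: the 20%, 40%, 60%, 80% and 100% quantiles
    thresholds <- quantile(var1, probs = seq(0, 1, length.out = 6)[-1], na.rm = TRUE)
    # Each value gets the index of the first threshold it does not exceed; NA stays NA
    cat.var <- sapply(var1, function(i) {
      if (is.na(i)) NA else min(which(i <= thresholds))
    })
    plot(cat.var ~ var1)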
    

    If you want to apply this to every column of a data frame (here called df):

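    categorize <- function(var1){
      thresholds <- quantile(var1, probs = seq(0, 1, length.out = 6)[-1], na.rm = TRUE)
      # Return the group index (1 to 5) for each value; NA stays NA
      sapply(var1, function(i) {
        if (is.na(i)) NA else min(which(i <= thresholds))
      })
    }
    # apply() returns a matrix and leaves df unchanged, so store the result
    # (df.cat is just an illustrative name)
    df.cat <- apply(df, 2, categorize)
    # alternatively, overwrite the columns of df in place
    for (j in seq_len(ncol(df))) {
      df[,j] <- categorize(df[,j])
    }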
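
    As a possible vectorized alternative to the per-element `sapply()` loop above, base R's `cut()` can do the same quantile binning in one call. This is only a sketch: the helper name `categorize_cut`, its `n_groups` argument, and the simulated `rnorm()` data are illustrative assumptions, not part of the code above.

```r
# Hypothetical helper (not from the answer above): quantile binning with cut()
categorize_cut <- function(x, n_groups = 5) {
  # n_groups + 1 break points, running from the minimum to the maximum of x
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_groups + 1), na.rm = TRUE)
  # include.lowest = TRUE puts the minimum into group 1;
  # labels = FALSE returns integer codes instead of a factor
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Simulated stand-in for one scaled column, with one missing value
set.seed(1)
x <- c(NA, rnorm(50))
groups <- categorize_cut(x)
table(groups, useNA = "ifany")
```

    Note that `cut()` stops with an error if two quantile breaks coincide, which can happen with heavily tied data. For a data frame of numeric columns, something like `df[] <- lapply(df, categorize_cut)` should apply it column by column while keeping the data frame structure.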