
Categorize numeric variable with mutate


I would like to categorize a numeric variable in my data.frame using dplyr (and have no idea how to do it).

Without dplyr, I would probably do something like:

    df <- data.frame(a = rnorm(1e3), b = rnorm(1e3))
    df$a <- cut(df$a, breaks = quantile(df$a, probs = seq(0, 1, 0.2)))

and it would be done. However, I would much prefer to do it with a dplyr function (mutate, I suppose) as part of the chain of other operations I perform on my data.frame.


Solution

    library(dplyr)

    set.seed(123)
    df <- data.frame(a = rnorm(10), b = rnorm(10))

    # inside mutate(), `a` refers to the column, so quantile() sees the column's values
    df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))

giving:

                     a          b
    1  (-0.586,-0.316]  1.2240818
    2   (-0.316,0.094]  0.3598138
    3      (0.68,1.72]  0.4007715
    4   (-0.316,0.094]  0.1106827
    5     (0.094,0.68] -0.5558411
    6      (0.68,1.72]  1.7869131
    7     (0.094,0.68]  0.4978505
    8             <NA> -1.9666172
    9   (-1.27,-0.586]  0.7013559
    10 (-0.586,-0.316] -0.4727914
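Note that row 8 is <NA>. That row holds the minimum of a (about -1.27), and cut() builds right-closed intervals (x, y] by default, so the lowest break, which is the minimum itself, falls outside every interval. Passing include.lowest = TRUE closes the first interval on the left and fixes this. The sketch below assumes the same data as above; the new column names a_cut and a_bin are only illustrative. It also shows dplyr::ntile() as an alternative for when integer bucket numbers 1 to 5 are enough. ntile() assigns groups by rank into (nearly) equal-size buckets rather than using quantile() cut points, so the two columns are not guaranteed to agree exactly, for example when there are ties.

```r
library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

res <- df %>%
  mutate(
    # include.lowest = TRUE closes the first interval on the left,
    # so the minimum of `a` gets a label instead of NA
    a_cut = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2)),
                include.lowest = TRUE),
    # alternative (hypothetical column name): quintile number 1..5 assigned by rank
    a_bin = ntile(a, 5)
  )

res
```

Writing the result to new columns instead of overwriting a keeps the original values available for later steps in the chain. If you do want to replace a, as in the question, use mutate(a = cut(..., include.lowest = TRUE)).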