
R converting continuous variable to categorical


I have a column of continuous numeric values (NO2) which I need to convert into categorical values. Can someone explain how the following code accomplishes that:

cutpoints <- quantile(dataframe%NO2, seq(0,1,length=4),na.rm=TRUE)  
dataframe%newcol <- cut(dataframe%NO2, cutpoints)  
levels(dataframe%newcol) returns (0.3781,1.2] (1.2,1.42] (1.42,2.55]  

Solution

  • I think you meant to use $ instead of % to refer to column names.
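
    As a rough sketch of what the corrected code might look like: the data frame and its NO2 values below are made up for illustration and are not the asker's data.

    ```r
    # Hypothetical data frame standing in for the one in the question
    dataframe <- data.frame(NO2 = c(0.4, 0.9, 1.2, 1.3, 1.42, 1.8, 2.55, NA))

    # Same two steps as in the question, with $ instead of %
    cutpoints <- quantile(dataframe$NO2, seq(0, 1, length = 4), na.rm = TRUE)
    dataframe$newcol <- cut(dataframe$NO2, cutpoints)

    # One interval label per group
    levels(dataframe$newcol)
    ```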

    Running the code step by step will help you understand it.

    seq creates a sequence of 4 evenly spaced values from 0 to 1.

    seq(0,1,length=4)
    #[1] 0.000 0.333 0.667 1.000
    

    quantile computes the sample quantiles of the data at the given probabilities (here seq(0,1,length=4)), i.e. the minimum, the two tercile boundaries and the maximum.

    set.seed(123)
    x <- runif(10)
    cutpoints <- quantile(x, seq(0,1,length=4),na.rm=TRUE)
    cutpoints
    #    0%  33.3%  66.7%   100% 
    #0.0456 0.4566 0.7883 0.9405 
    

    These breaks are then used to cut the data.

    cut(x, cutpoints)
    

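    This divides x into groups: values in (cutpoints[1], cutpoints[2]] form one group, values in (cutpoints[2], cutpoints[3]] another, and so on. The intervals are right-closed by default, so the minimum of x, which equals cutpoints[1], becomes NA unless you pass include.lowest = TRUE.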
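
    A small sketch of how the interval boundaries behave, using made-up values (the names x, breaks, default_groups and closed_groups are only for illustration). With the default settings the smallest value sits exactly on the lowest break and gets no group; include.lowest = TRUE closes the first interval on the left.

    ```r
    x <- c(1, 2, 3, 4, 5, 6)                          # made-up values
    breaks <- quantile(x, seq(0, 1, length.out = 4))  # min, two tercile boundaries, max

    # Default: intervals are (a, b], so min(x) is not in any interval
    default_groups <- cut(x, breaks)

    # include.lowest = TRUE turns the first interval into [a, b]
    closed_groups <- cut(x, breaks, include.lowest = TRUE)

    is.na(default_groups[1])  # TRUE: the minimum was dropped
    table(closed_groups)      # two values in each of the three groups
    ```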

    You can also use findInterval instead of cut.
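
    A minimal sketch of the findInterval route, again on made-up values. Unlike cut, it returns integer group indices rather than a factor, and its intervals are left-closed by default, so a value lying exactly on a break can land in a different group than with cut. rightmost.closed = TRUE keeps the maximum in the last group.

    ```r
    x <- c(1, 2, 3, 4, 5, 6)                          # made-up values
    breaks <- quantile(x, seq(0, 1, length.out = 4))

    # Integer index of the interval [breaks[i], breaks[i + 1]) containing each value
    groups <- findInterval(x, breaks, rightmost.closed = TRUE)
    groups
    # [1] 1 1 2 2 3 3
    ```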