I have a column of continuous numeric values (NO2) which I need to convert into categorical values. Can someone explain how the following code accomplishes that:
cutpoints <- quantile(dataframe%NO2, seq(0,1,length=4),na.rm=TRUE)
dataframe%newcol <- cut(dataframe%NO2, cutpoints)
levels(dataframe%newcol) returns (0.3781,1.2] (1.2,1.42] (1.42,2.55]
I think you meant to use $ instead of % to refer to column names.
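For reference, here is a minimal sketch of the question's code with $ in place of %. The data frame and its NO2 values are invented purely for illustration, so the resulting levels will differ from the ones in the question.

```r
# Hypothetical data: these NO2 values are made up for illustration only.
dataframe <- data.frame(NO2 = c(0.38, 0.95, 1.20, 1.31, 1.42, 1.87, 2.10, 2.55, NA))

# Boundaries at the 0%, 33.3%, 66.7% and 100% quantiles, ignoring NA values.
cutpoints <- quantile(dataframe$NO2, seq(0, 1, length.out = 4), na.rm = TRUE)

# Assign each value to the interval it falls in; the result is a factor.
dataframe$newcol <- cut(dataframe$NO2, cutpoints)

# Inspect the three interval labels.
levels(dataframe$newcol)
```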
Running the code step by step will help you understand it.
seq creates a sequence of 4 evenly spaced values from 0 to 1.
seq(0,1,length=4)
#[1] 0.000 0.333 0.667 1.000
quantile returns the sample quantiles of the vector at the given probabilities (here seq(0,1,length=4)), that is, the values that split the data into three groups of roughly equal size.
set.seed(123)
x <- runif(10)
cutpoints <- quantile(x, seq(0,1,length=4),na.rm=TRUE)
# 0% 33.3% 66.7% 100%
#0.0456 0.4566 0.7883 0.9405
These breaks are then passed to cut to bin the data.
cut(x, cutpoints)
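This divides x into groups: values in (cutpoints[1], cutpoints[2]] form one group, values in (cutpoints[2], cutpoints[3]] another, and so on. By default each interval is open on the left and closed on the right, so the minimum of x, which equals cutpoints[1], is returned as NA unless you pass include.lowest = TRUE.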
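As a small sketch of the boundary behaviour, using the same simulated x as above, the following compares the default call with include.lowest = TRUE. The variable names default_groups and groups are only illustrative.

```r
set.seed(123)
x <- runif(10)
cutpoints <- quantile(x, seq(0, 1, length.out = 4), na.rm = TRUE)

# Default: intervals are (a, b], so the minimum of x (equal to cutpoints[1])
# falls outside the first interval and becomes NA.
default_groups <- cut(x, cutpoints)
sum(is.na(default_groups))

# include.lowest = TRUE also closes the first interval on the left,
# so every value is assigned to a group.
groups <- cut(x, cutpoints, include.lowest = TRUE)
table(groups)
```

If you want every observation binned, include.lowest = TRUE is usually the option to reach for when the breaks come from quantile.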
You can also use findInterval instead of cut. It returns the integer index of the interval each value falls in rather than a factor.
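A minimal sketch of the findInterval alternative, again on the simulated x from above. The name bins is just illustrative. Note that findInterval uses left-closed intervals [a, b) by default, the opposite of cut, so values sitting exactly on a break can land in a different group. Passing left.open = TRUE should give cut-style (a, b] intervals.

```r
set.seed(123)
x <- runif(10)
cutpoints <- quantile(x, seq(0, 1, length.out = 4), na.rm = TRUE)

# findInterval() returns integer bin indices (1, 2, 3) instead of a factor.
# rightmost.closed = TRUE keeps the maximum of x in the last bin
# instead of assigning it an index one past the last interval.
bins <- findInterval(x, cutpoints, rightmost.closed = TRUE)

# Count how many values fall in each bin.
table(bins)
```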