I would like to format a variable in R using round, floor, or ceiling. However, I would like to use floor for some values and ceiling for others within the same variable. Is that possible?
My data frame is data and the variable I want to format is var. These are its values (with frequencies):
Value | Freq.
---------|-----------
1 | 1504
1.333333 | 397
1.5 | 9
1.666667 | 612
2 | 2096
2.333333 | 1057
2.5 | 18
2.666667 | 1270
3 | 2913
3.333333 | 1487
3.5 | 35
3.666667 | 1374
4 | 2007
4.333333 | 779
4.5 | 16
4.666667 | 522
5 | 1913
NaN | 553
My desired result is a variable var2 that looks like this:
Value | Freq.
------|-----------
1 | 1910
2 | 3783
3 | 5670
4 | 4195
5 | 2451
NaN | 553
So, 1.5 and 2.5 are adjusted downward (floor), but 3.5 and 4.5 are adjusted upward (ceiling). The other values are rounded the usual way.
My attempt is this, but it does not work yet:
data$var2 <- format(round(data$var, 1))
if (data$var2 == 1.7||2.7||3.5||3.7||4.5||4.7) {
  data$var2 <- format(ceiling(data$var2))
} else {
  data$var2 <- format(floor(data$var2))
}
I know that there are probably several mistakes in my attempt and would appreciate any help.
PS: What I'm actually looking for is an equivalent of Stata's egen function cut. With that, it is very easy to achieve the desired result:
egen var2 = cut(var), at(1, 1.6, 2.6, 3.5, 4.4, 5.1)
recode var2 (1 = 1) (1.6 = 2) (2.6 = 3) (3.5 = 4) (4.4 = 5)
You can use the case_when function from the dplyr package for this:
library(dplyr)

data %>%
  mutate(var2 = case_when(
    var %in% c(1.5, 2.5) ~ floor(var),    # these halves go down
    var %in% c(3.5, 4.5) ~ ceiling(var),  # these halves go up
    TRUE ~ round(var)                     # everything else: usual rounding
  ))
This returns the following data.frame:
var var2
1 1.000000 1
2 1.333333 1
3 1.500000 1
4 1.666667 2
5 2.000000 2
6 2.333333 2
7 2.500000 2
8 2.666667 3
9 3.000000 3
10 3.333333 3
11 3.500000 4
12 3.666667 4
13 4.000000 4
14 4.333333 4
15 4.500000 5
16 4.666667 5
17 5.000000 5
18 NaN NaN
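The explicit floor and ceiling branches are needed because round on its own does not produce the desired split for the .5 values: in R, halves are rounded to the nearest even integer. A small check, using only the four half values from the question:

```r
# The four ".5" values that appear in the question's frequency table.
halves <- c(1.5, 2.5, 3.5, 4.5)

# round() sends halves to the nearest even integer, so 1.5 goes up
# and 4.5 goes down, which is not the split the question asks for.
rounded <- round(halves)
print(rounded)
```

With plain round, 1.5 becomes 2 and 4.5 becomes 4, whereas the desired result maps them to 1 and 5.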
You can customize the conditions as needed.
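If you would rather stay close to the Stata cut approach, base R's findInterval can probably do the same job. The following is only a sketch: the break points are copied from the at() list in the question, the name breaks is my own, and the example vector assumes that values such as 1.333333 are really thirds (4/3, 5/3).

```r
# Lower bounds of the bins, copied from the Stata at() list.
breaks <- c(1, 1.6, 2.6, 3.5, 4.4, 5.1)

# Illustrative values (assumed to match those in the frequency table).
var <- c(1, 4/3, 1.5, 5/3, 2, 2.5, 3, 3.5, 4, 4.5, 5, NaN)

# findInterval() returns i when breaks[i] <= x < breaks[i + 1],
# so the bin index is directly the recoded value 1 to 5.
var2 <- findInterval(var, breaks)
print(var2)
```

On the actual data this would be data$var2 <- findInterval(data$var, breaks). Note that NaN comes back as NA rather than NaN, and that values below 1 or at 5.1 and above would get 0 or 6, so this only fits if var stays within the range shown in the question.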