Using floor and ceiling on different values of one variable in R


I would like to format a variable in R using round, floor or ceiling. However, for different values of the same variable I would like to use floor in some cases and ceiling in others. Is that possible?

My data frame is data and the variable I want to format is var. These are its values, with their frequencies:

Value    |      Freq.
---------|-----------
1        |       1504
1.333333 |        397
1.5      |          9
1.666667 |        612
2        |       2096
2.333333 |       1057
2.5      |         18
2.666667 |       1270
3        |       2913
3.333333 |       1487
3.5      |         35
3.666667 |       1374
4        |       2007
4.333333 |        779
4.5      |         16
4.666667 |        522
5        |       1913
NaN      |        553

My desired result is a variable var2 that looks like this:

Value |      Freq.
------|-----------
1     |       1910
2     |       3783
3     |       5670
4     |       4195
5     |       2451     
NaN   |        553

So 1.5 and 2.5 are adjusted downward (floor), but 3.5 and 4.5 are adjusted upward (ceiling). All other values are rounded to the nearest integer.
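As a quick arithmetic check of the desired table, summing the original frequencies within each target value reproduces the counts above (NaN stays at 553). The vectors below are copied from the two tables; target is a hypothetical helper vector that encodes the rule just described, one entry per non-missing value.

```r
# Frequencies of the 17 non-missing values, in the order of the first table
freqs  <- c(1504, 397, 9, 612, 2096, 1057, 18, 1270, 2913, 1487, 35, 1374,
            2007, 779, 16, 522, 1913)
# Target value for each of those rows under the floor/ceiling/round rule
target <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5)
# Sum the frequencies within each target value
tapply(freqs, target, sum)
```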

My attempt is this, but it does not work yet:

data$var2 <- format(round(data$var, 1))
if (data$var2 == 1.7||2.7||3.5||3.7||4.5||4.7) {
  data$var2 <- format(ceiling(data$var2))
} else {
  data$var2 <- format(floor(data$var2))
}

I know that there are probably several mistakes in my attempt and would appreciate any help.
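A minimal base-R sketch of the attempt with its main problems fixed, meant as an illustration rather than the only way to do it. Three things go wrong in the original: format() turns the numbers into character strings; x == 1.7||2.7||... is parsed as (x == 1.7) || 2.7 || ..., and because any nonzero number counts as TRUE, that condition is TRUE even for a single value, so %in% is needed to test membership; and if () expects a single TRUE/FALSE, so the element-wise ifelse() is needed for a whole column. The data frame is a hypothetical reconstruction from the frequency table, assuming values such as 1.333333 are exact thirds (4/3).

```r
# Hypothetical reconstruction of `data` from the frequency table above,
# assuming values such as 1.333333 are exact thirds (4/3, 5/3, ...).
values <- c(1, 4/3, 1.5, 5/3, 2, 7/3, 2.5, 8/3, 3, 10/3, 3.5, 11/3,
            4, 13/3, 4.5, 14/3, 5, NaN)
freqs  <- c(1504, 397, 9, 612, 2096, 1057, 18, 1270, 2913, 1487, 35, 1374,
            2007, 779, 16, 522, 1913, 553)
data <- data.frame(var = rep(values, times = freqs))

# Round to one decimal but keep the result numeric
# (format() would turn it into character strings).
v <- round(data$var, 1)

# ifelse() decides element by element (if () needs a single TRUE/FALSE),
# and %in% tests membership in a set of values.
data$var2 <- ifelse(v %in% c(1.7, 2.7, 3.5, 3.7, 4.5, 4.7), ceiling(v), floor(v))

# Frequencies of the new variable (table() leaves out NaN by default)
table(data$var2)
```

Matching rounded decimals with %in% relies on exact floating-point equality. It works for these values, but comparing exact halves such as 1.5, as the answer below does, is sturdier.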

PS: What I'm actually looking for is an R equivalent of Stata's egen cut function. With it, the desired result is easy to achieve:

egen var2 = cut(var), at(1, 1.6, 2.6, 3.5, 4.4, 5.1)
recode var2 (1 = 1) (1.6 = 2) (2.6 = 3) (3.5 = 4) (4.4 = 5)
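One way to mirror egen cut in base R is cut() with the same break points. This is a sketch that assumes the values are the distinct ones from the question's table, with 1.333333 and similar values taken as exact thirds. right = FALSE makes the intervals closed on the left, [a, b), which matches how the Stata call puts 3.5 in the group coded 3.5, and labels = FALSE returns the interval number 1 to 5 directly, so no separate recode step is needed.

```r
# Example values: the distinct values from the question's table
# (assumption: values like 1.333333 are exact thirds such as 4/3).
var <- c(1, 4/3, 1.5, 5/3, 2, 7/3, 2.5, 8/3, 3, 10/3, 3.5, 11/3,
         4, 13/3, 4.5, 14/3, 5, NaN)

# Same break points as the Stata at() list
breaks <- c(1, 1.6, 2.6, 3.5, 4.4, 5.1)

# right = FALSE gives left-closed intervals [a, b);
# labels = FALSE returns the interval number (1 to 5) as an integer code.
# Values outside the breaks, and NaN, become NA.
var2 <- cut(var, breaks = breaks, right = FALSE, labels = FALSE)

data.frame(var, var2)
```

The missing value comes back as NA rather than NaN. On the real data the call would be cut(data$var, ...) with the same arguments.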

Solution

  • You can use the case_when function from the dplyr package for this:

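    library(dplyr)
    
    # Conditions are checked in order; the first TRUE one is used.
    # Assign the result back to data so that var2 is kept.
    data <- data %>% 
      mutate(var2 = case_when(var %in% c(1.5, 2.5) ~ floor(var),    # round down
                              var %in% c(3.5, 4.5) ~ ceiling(var),  # round up
                              TRUE ~ round(var)))                   # all others: nearest integer
    

    Applied to a data frame with one row per distinct value, data then looks like this: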

            var var2
    1  1.000000    1
    2  1.333333    1
    3  1.500000    1
    4  1.666667    2
    5  2.000000    2
    6  2.333333    2
    7  2.500000    2
    8  2.666667    3
    9  3.000000    3
    10 3.333333    3
    11 3.500000    4
    12 3.666667    4
    13 4.000000    4
    14 4.333333    4
    15 4.500000    5
    16 4.666667    5
    17 5.000000    5
    18      NaN  NaN
    

    To treat other values differently, adjust the sets of values in the %in% conditions.
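For readers who prefer not to load dplyr, the same rules can be sketched in base R with nested ifelse() calls. The one-row-per-value data frame is a hypothetical stand-in for the real data, assuming values such as 1.333333 are exact thirds.

```r
# Hypothetical example data: one row per distinct value in the question,
# assuming values such as 1.333333 are exact thirds.
data <- data.frame(var = c(1, 4/3, 1.5, 5/3, 2, 7/3, 2.5, 8/3, 3, 10/3, 3.5,
                           11/3, 4, 13/3, 4.5, 14/3, 5, NaN))

# Same rules as the case_when() call, checked in the same order:
# 1.5 and 2.5 go down, 3.5 and 4.5 go up, everything else is rounded.
data$var2 <- ifelse(data$var %in% c(1.5, 2.5), floor(data$var),
             ifelse(data$var %in% c(3.5, 4.5), ceiling(data$var),
                    round(data$var)))
data
```

The explicit floor() and ceiling() branches matter because R's round() follows the IEC 60559 "go to the even digit" rule for exact halves: round(1.5) and round(2.5) both give 2, and round(3.5) and round(4.5) both give 4. With round() alone, 1.5 would land in group 2 and 4.5 in group 4.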