
Rounding numbers to the nearest decimal places in R


Consider the following data set:

df <- data.frame(id=1:10,
                 v1=c(2.35456185,1.44501001,2.98712312,0.12345123,0.96781234,
                      1.23934551,5.00212233,4.34120000,1.23443213,0.00112233),
                 v2=c(0.22222222,0.00123456,2.19024869,0.00012000,0.00029848,
                      0.12348888,0.46236577,0.85757000,0.05479729,0.00001202))

I want to round the values in v1 and v2 to a randomly assigned number of decimal places: one decimal place for 10% of observations, two for 40%, and three for 50%. I know round() can round every value to the same number of decimal places, but in my case the number of places varies by observation. Thank you in advance!

Example of output needed (of course mine is not random):

id   v1    v2
 1   2.4   0.2
 2   1.45  0
 3   2.99  2.19
 4   0.12  0
 5   0.97  0
 6   1.239 0.123
 7   5.002 0.462
 8   4.341 0.858
 9   1.234 0.055
10   0.001 0

Solution

  • We may create a grouping variable with sample(), using the given probabilities, and then round the v1 column to the number of digits stored in each group:

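    library(dplyr)
    df %>%
      # grp holds the number of decimal places (1, 2 or 3) drawn for each row
      group_by(grp = sample(1:3, size = n(), replace = TRUE,
         prob = c(0.10, 0.4, 0.5))) %>% 
      # within a group every row shares the same digit count
      mutate(v1 = round(v1, first(grp))) %>%
      ungroup() %>% 
      select(-grp)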
    

    -output

    # A tibble: 10 × 2
          id    v1
       <int> <dbl>
     1     1 2.36 
     2     2 1.44 
     3     3 2.99 
     4     4 0.123
     5     5 0.97 
     6     6 1.24 
     7     7 5.00 
     8     8 4.3  
     9     9 1.23 
    10    10 0    
    

    For multiple columns, use across() to loop over them; each column gets its own random draw of digits:

    df %>%
       mutate(across(v1:v2, ~ round(.x, sample(1:3, size = n(),
        replace = TRUE, prob = c(0.10, 0.40, 0.50)))))
    

    Or, in base R, pass the sampled values directly as the digits argument of round():

    df$v1 <- with(df, round(v1, sample(1:3, size = nrow(df), 
        replace = TRUE, prob = c(0.10, 0.4, 0.5))))
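
The one-liner above works because round() pairs each value with the corresponding element of its digits argument, so every observation can get its own precision. A minimal sketch with a fixed, non-random digits vector; the names x, digits and rounded are illustrative and not part of the answer:

```r
# Illustrative values taken from the first three rows of v1.
x <- c(2.35456185, 1.44501001, 2.98712312)

# One digit count per element (fixed here, sampled in the answer).
digits <- c(1, 2, 3)

# round() uses digits[i] for x[i], giving a different precision per element.
rounded <- round(x, digits)
print(rounded)
```

Here the three values come out as 2.4, 1.45 and 2.987; swapping the fixed digits vector for a sample() call gives the randomised version.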
    

    Update

    To check the rounded values against the sampled digits, keep both as extra columns:

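    library(stringr)
    df %>%
       # step 1: store the sampled digit count for each column
       mutate(across(v1:v2, ~ sample(1:3, size = n(),
        replace = TRUE, prob = c(0.10, 0.40, 0.50)), 
        .names = "{.col}_sample_ind"),
        # step 2: round each column by its own *_sample_ind column
        # (cur_data() is deprecated as of dplyr 1.1.0 in favour of pick())
        across(v1:v2, ~  round(.x, digits = cur_data()[[str_c(cur_column(),
          "_sample_ind")]]), 
        .names = "{.col}_rounded")) %>%
       as_tibble()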
    

    -output

      # A tibble: 10 × 7
          id      v1        v2 v1_sample_ind v2_sample_ind v1_rounded v2_rounded
       <int>   <dbl>     <dbl>         <int>         <int>      <dbl>      <dbl>
     1     1 2.35    0.222                 3             2      2.36       0.22 
     2     2 1.45    0.00123               3             3      1.44       0.001
     3     3 2.99    2.19                  1             2      3          2.19 
     4     4 0.123   0.00012               3             2      0.123      0    
     5     5 0.968   0.000298              3             1      0.968      0    
     6     6 1.24    0.123                 3             3      1.24       0.123
     7     7 5.00    0.462                 2             3      5          0.462
     8     8 4.34    0.858                 2             1      4.34       0.9  
     9     9 1.23    0.0548                2             2      1.23       0.05 
    10    10 0.00112 0.0000120             2             3      0          0
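
One caveat about the approaches above: sample() with prob draws each row's digit count independently, so in a ten-row data set the realised shares only approximate 10/40/50%. If exact shares are wanted, one option is to build the digit vector with rep() and shuffle it. A minimal sketch, assuming the shares multiply out to (nearly) whole row counts; exact_digits is a hypothetical helper name, not part of the answer:

```r
# Hypothetical helper: one digit count per row, with fixed shares per level.
exact_digits <- function(n, shares = c(0.1, 0.4, 0.5)) {
  k <- length(shares)
  counts <- round(n * shares)              # rows per digit level
  counts[k] <- n - sum(counts[-k])         # last level absorbs rounding drift
  stopifnot(all(counts >= 0))
  levels <- rep(seq_along(shares), times = counts)
  levels[sample.int(length(levels))]       # shuffle the assignment
}

df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233),
                 v2 = c(0.22222222, 0.00123456, 2.19024869, 0.00012000, 0.00029848,
                        0.12348888, 0.46236577, 0.85757000, 0.05479729, 0.00001202))

set.seed(42)  # only so the sketch is reproducible
d1 <- exact_digits(nrow(df))  # digits for v1: one 1, four 2s, five 3s
d2 <- exact_digits(nrow(df))  # independent shuffle for v2
df$v1 <- round(df$v1, d1)
df$v2 <- round(df$v2, d2)
print(table(d1))
```

The shuffle uses sample.int() on the index rather than sample() on the values, because sample() treats a single number as the range 1:n, which would misbehave for a one-row data frame.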