Tags: r, dataframe, dplyr, magrittr

How to write readable code for arithmetic operations on a data frame using pipe/dplyr?


I want to subtract a value from my entire dataset, excluding the first column from the operation. There are many ways to do it, but I'm looking for very readable code.

I've come across subtract() from the magrittr package, but I can't incorporate it within the pipe in a reasonable way.

My Data

set.seed(12)
df <- data.frame(replicate(10, sample(1:100, 10, replace = TRUE)))
df[1] <- 1:10
colnames(df) <- c("ID", "A", "B", "C", "D", "E", "F", "G", "H", "I")

> df
#    ID  A  B  C  D  E  F  G  H  I
# 1   1 91 57 26 91 83 73 14 75 16
# 2   2 82 72 32 37 18 52 80 22 59
# 3   3 82 43 84 87 85 56 74 67 38
# 4   4 38 46 20 48 55 53 66 12 18
# 5   5 90 30 64 71 58 39 12  5 66
# 6   6 48 37 19 27 88 28 42 76 83
# 7   7 13 34 84 77 13 40 40 67 10
# 8   8 56 39  4 84 32 59 37  5 50
# 9   9 68 78 13 91 40 15 80 86 79
# 10 10 24 71 77  5 88  7  5 42  6

Attempts to subtract 5 from the entire dataset, except for the first column

library(magrittr)
library(dplyr)

## first attempt
df %>%
  mutate_at(vars(-ID), funs(subtract(5)))
#    ID  A  B  C  D  E  F  G  H  I   ## while first column remains intact,
# 1   1 -5 -5 -5 -5 -5 -5 -5 -5 -5   ## the rest just gets assigned with -5.
# 2   2 -5 -5 -5 -5 -5 -5 -5 -5 -5   ## not good.
# 3   3 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 4   4 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 5   5 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 6   6 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 7   7 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 8   8 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 9   9 -5 -5 -5 -5 -5 -5 -5 -5 -5
# 10 10 -5 -5 -5 -5 -5 -5 -5 -5 -5

## second attempt
df %>%
  subtract(5)
#    ID  A  B  C  D  E  F  G  H  I   ## subtracts correctly, simple and sweet.
# 1  -4 86 52 21 86 78 68  9 70 11   ## however, there's no specification to 
# 2  -3 77 67 27 32 13 47 75 17 54   ## skip the first column.
# 3  -2 77 38 79 82 80 51 69 62 33
# 4  -1 33 41 15 43 50 48 61  7 13
# 5   0 85 25 59 66 53 34  7  0 61
# 6   1 43 32 14 22 83 23 37 71 78
# 7   2  8 29 79 72  8 35 35 62  5
# 8   3 51 34 -1 79 27 54 32  0 45
# 9   4 63 73  8 86 35 10 75 81 74
# 10  5 19 66 72  0 83  2  0 37  1

## third attempt
b2i_minus_five <- df[, -1] -5 
cbind(df[1], b2i_minus_five)
#    ID  A  B  C  D  E  F  G  H  I   ## gets the job done, but ugly code,
# 1   1 86 52 21 86 78 68  9 70 11   ## at least in my opinion.
# 2   2 77 67 27 32 13 47 75 17 54
# 3   3 77 38 79 82 80 51 69 62 33
# 4   4 33 41 15 43 50 48 61  7 13
# 5   5 85 25 59 66 53 34  7  0 61
# 6   6 43 32 14 22 83 23 37 71 78
# 7   7  8 29 79 72  8 35 35 62  5
# 8   8 51 34 -1 79 27 54 32  0 45
# 9   9 63 73  8 86 35 10 75 81 74
# 10 10 19 66 72  0 83  2  0 37  1

Is there a way to get the job done in the spirit of the second attempt, ideally by adding just a small touch to it?

Again, the motivation here is to write simple, clear code, which is also why I insist on using subtract() rather than -5.

Thanks!


Solution

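  • I think the issue is in the way you call subtract(): judging by your output, the column is never passed to it, so subtract(5) evaluates to -5 for every cell. In any case, since version 0.8.0 dplyr has a new way to handle these calls, with list() and a formula (~) instead of funs(). With a recent version, you get what you are after by passing the column explicitly as the dot: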

    set.seed(12)
    df <- data.frame(replicate(10, sample(1:100, 10, replace = TRUE)))
    df[1] <- 1:10
    colnames(df) <- c("ID", "A", "B", "C", "D", "E", "F", "G", "H", "I")
    
    library(magrittr)
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    packageVersion("dplyr")
    #> [1] '0.8.3'
    
    ## first attempt, with the column passed to subtract() explicitly as `.`
    df %>%
      mutate_at(vars(-ID), list(~subtract(., 5)))
    #>    ID  A  B  C  D  E  F  G  H  I
    #> 1   1 86 52 21 86 78 68  9 70 11
    #> 2   2 77 67 27 32 13 47 75 17 54
    #> 3   3 77 38 79 82 80 51 69 62 33
    #> 4   4 33 41 15 43 50 48 61  7 13
    #> 5   5 85 25 59 66 53 34  7  0 61
    #> 6   6 43 32 14 22 83 23 37 71 78
    #> 7   7  8 29 79 72  8 35 35 62  5
    #> 8   8 51 34 -1 79 27 54 32  0 45
    #> 9   9 63 73  8 86 35 10 75 81 74
    #> 10 10 19 66 72  0 83  2  0 37  1
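
The same idea can also be written with across(). This is only a sketch, and it assumes dplyr 1.0.0 or later, where across() is available and the scoped verbs such as mutate_at() are superseded. The names result and expected are made up for the illustration, and the final check simply compares the piped version with the base R approach from the third attempt.

```r
library(magrittr)  # provides subtract()
library(dplyr)     # assumption: dplyr >= 1.0.0, which provides across()

# Same data as in the question
set.seed(12)
df <- data.frame(replicate(10, sample(1:100, 10, replace = TRUE)))
df[1] <- 1:10
colnames(df) <- c("ID", "A", "B", "C", "D", "E", "F", "G", "H", "I")

# Subtract 5 from every column except ID;
# .x stands for the current column inside the formula
result <- df %>%
  mutate(across(-ID, ~ subtract(.x, 5)))

# Sanity check against the base R version (third attempt in the question)
expected <- cbind(df[1], df[, -1] - 5)
stopifnot(isTRUE(all.equal(result, expected)))
```

The selection (-ID) and the operation (subtract 5) stay on one line, which is close to the second attempt with a single extra step. On dplyr versions older than 1.0.0, across() does not exist and the mutate_at() form above is the one to use.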