R: mutate() with apply after select() using chaining


library(dplyr)
set.seed(8)
df <- data.frame(v1=rnorm(5),
                 v2=rnorm(5),
                 v3=rnorm(5))

If I wish to count the number of values in each row that are above, say, 0 and put the result in a new column, I would do:

mutate(df, n=apply(df,1,function(x)sum(x>0)))

This would give:

           v1         v2          v3 n
1 -0.08458607 -0.1078814 -0.75979380 0
2  0.84040013 -0.1702891  0.29204986 2
3 -0.46348277 -1.0883317  0.42139859 1
4 -0.55083500 -3.0110517 -1.29448908 0
5  0.73604043 -0.5931743  0.06928509 2

Now I want to use dplyr with chaining and do the same thing on a subset of columns, v1 and v2, but I cannot figure out how to give apply the right data. If I just do this (after creating df again, of course):

df %>%
   select(v1, v2) %>%
   mutate(n=apply(df,1,function(x)sum(x>0)))

...it gives the same result as above (the same n, i.e. it counts across all three columns), while passing the data with . or leaving the argument blank does not work:

df %>%
   select(v1, v2) %>%
   mutate(n=apply(.,1,function(x)sum(x>0)))

or:

df %>%
   select(v1, v2) %>%
   mutate(n=apply(1,function(x)sum(x>0)))

What's wrong?


Solution

  • After using select to keep only the columns that are needed, apply the rowwise() function and then use do. Here . refers to the data coming out of the select step, and because of rowwise(), sum(.>0) is evaluated once for each row of that new dataset. Lastly, data.frame(., n=..) returns all the previous columns along with the newly created n.

    df %>% 
       select(v1, v2) %>% 
       rowwise() %>% 
       do(data.frame(., n=sum(.>0)))
    #           v1         v2 n
    #1 -0.08458607 -0.1078814 0
    #2  0.84040013 -0.1702891 1
    #3 -0.46348277 -1.0883317 0
    #4 -0.55083500 -3.0110517 0
    #5  0.73604043 -0.5931743 1
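
The first attempt in the question counts across all three columns because apply(df, ...) refers to the original df by name, not to the subset coming out of select(). As a possible alternative to rowwise() and do(), here is a minimal sketch that uses the base R function rowSums() on the piped subset. It assumes the same df as in the question; the name res is only introduced for illustration.

```r
library(dplyr)

# Same example data as in the question
set.seed(8)
df <- data.frame(v1 = rnorm(5),
                 v2 = rnorm(5),
                 v3 = rnorm(5))

# Inside the pipe, `.` is the data frame returned by select(), so
# `. > 0` is a logical matrix over v1 and v2 only, and rowSums()
# counts the TRUE values in each row.
res <- df %>%
  select(v1, v2) %>%
  mutate(n = rowSums(. > 0))

print(res)
```

Since the comparison and rowSums() are vectorised, this handles all rows in one call instead of evaluating an expression once per row, which may matter on larger data frames.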