require('dplyr')
set.seed(8)
df <- data.frame(v1=rnorm(5),
                 v2=rnorm(5),
                 v3=rnorm(5))
If I wish to count the number of values above, say, 0 in each row and put this in a new column, I would do:
mutate(df, n=apply(df,1,function(x)sum(x>0)))
This would give:
           v1         v2          v3 n
1 -0.08458607 -0.1078814 -0.75979380 0
2 0.84040013 -0.1702891 0.29204986 2
3 -0.46348277 -1.0883317 0.42139859 1
4 -0.55083500 -3.0110517 -1.29448908 0
5 0.73604043 -0.5931743 0.06928509 2
Now I want to use dplyr with chaining and do the same thing on a subset of columns, v1 and v2, but I cannot figure out how to give apply the right data. If I just do this (after making df again, of course):
df %>%
select(v1, v2) %>%
mutate(n=apply(df,1,function(x)sum(x>0)))
...it gives the same as above (the same n, i.e. it counts across all three columns), while passing the data as . or leaving the argument blank does not work:
df %>%
select(v1, v2) %>%
mutate(n=apply(.,1,function(x)sum(x>0)))
or:
df %>%
select(v1, v2) %>%
mutate(n=apply(1,function(x)sum(x>0)))
What's wrong?
After using select to subset the columns that are needed, apply the rowwise() function and then use do. Inside do, . refers to the current row of the data frame we got from the select step, so sum(.>0) is evaluated on each row of the new dataset. Lastly, data.frame(., n=..) returns all the previous columns along with the newly created n.
df %>%
select(v1, v2) %>%
rowwise() %>%
do(data.frame(., n=sum(.>0)))
#           v1         v2 n
#1 -0.08458607 -0.1078814 0
#2 0.84040013 -0.1702891 1
#3 -0.46348277 -1.0883317 0
#4 -0.55083500 -3.0110517 0
#5 0.73604043 -0.5931743 1
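For comparison, here is a minimal base R sketch of the same row-wise count that does not use the dplyr chain at all. It is a swapped-in alternative rather than the approach above: it assumes the same seeded df as in the question, and the name sub is only illustrative. Comparing a data frame with 0 gives a logical matrix, and rowSums then counts the TRUE values in each row.

```r
# Recreate the example data from the question.
set.seed(8)
df <- data.frame(v1 = rnorm(5),
                 v2 = rnorm(5),
                 v3 = rnorm(5))

# Keep only the columns of interest ('sub' is a hypothetical name).
sub <- df[, c("v1", "v2")]

# sub > 0 is a logical matrix; rowSums() counts the TRUE values per row.
sub$n <- rowSums(sub > 0)

sub
```

Since the data are the same, the n column should agree with the rowwise()/do() result above. Because rowSums is vectorised over the whole matrix, this is likely to scale better than a per-row do() call on larger data, though that has not been measured here.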