Tags: r, dplyr, window-functions

Using multiple columns in dplyr window functions?


Coming from SQL, I would expect to be able to do something like the following in dplyr. Is this possible?

# R
tbl %>% mutate(n = dense_rank(Name, Email))

-- SQL
SELECT Name, Email, DENSE_RANK() OVER (ORDER BY Name, Email) AS n FROM tbl

Also, is there an equivalent of PARTITION BY?


Solution

  • I struggled with this problem too, and here is my solution:

    If you can't find a function that ranks by multiple variables, you can concatenate them with paste(), in priority order from left to right, and rank the resulting string.

    Below is the code sample:

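    library(dplyr)

    # Rank on the concatenated "Name Email" string; Name takes priority over Email
    tbl %>%
      mutate(n = dense_rank(paste(Name, Email))) %>%
      arrange(Name, Email) %>%
      print()  # or tibble::view() for an interactive viewer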
    

    As for PARTITION BY, group_by() is the dplyr equivalent: window functions such as dense_rank() used in a following mutate() are computed separately within each group.

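    The shortcoming of this solution is that all the variables must be sorted in the same direction. Because paste() turns everything into text, it can also misorder rows: numbers are compared as strings ("10" sorts before "9"), and values containing the space separator can be placed out of order. If you need to order by columns with different directions, say one ascending and one descending, try the approach in this question: Calculate rank with ties based on more than one variable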
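For readers on a recent dplyr, here is a minimal sketch of the same rankings without paste(). It assumes dplyr 1.1.0 or later, where the ranking functions accept a data frame (one column per sort key) and pick() selects columns inside mutate(). The sample data and the Dept column are hypothetical stand-ins for the question's tbl. desc() flips one key for the mixed-direction case, group_by() plays the role of PARTITION BY, and the last pipeline is an alternative for dplyr 1.0.x using cur_group_id(), whose IDs are numbered in the sorted order of the grouping keys.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 (pick() and data-frame ranking)

# Hypothetical sample data; Dept is an invented partition column
tbl <- tibble(
  Dept  = c("A", "A", "A", "B", "B"),
  Name  = c("Ann", "Ann", "Bob", "Ann", "Cid"),
  Email = c("a@x.org", "b@x.org", "a@x.org", "a@x.org", "c@x.org")
)

ranked <- tbl %>%
  mutate(
    # DENSE_RANK() OVER (ORDER BY Name, Email)
    n = dense_rank(pick(Name, Email)),
    # DENSE_RANK() OVER (ORDER BY Name ASC, Email DESC)
    n_mixed = dense_rank(tibble(Name, desc(Email)))
  ) %>%
  # DENSE_RANK() OVER (PARTITION BY Dept ORDER BY Name, Email)
  group_by(Dept) %>%
  mutate(n_part = dense_rank(pick(Name, Email))) %>%
  ungroup()

print(ranked)

# Alternative for dplyr 1.0.x: group IDs follow the sorted order of the
# grouping keys, which matches DENSE_RANK over those keys
via_groups <- tbl %>%
  group_by(Name, Email) %>%
  mutate(n = cur_group_id()) %>%
  ungroup()

print(via_groups)
```

Because each key stays in its own column, numbers compare as numbers and no separator is needed, which sidesteps the paste() pitfalls.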