Search code examples
rdbplyr

Use of dbplyr for in-database processing vs directly writing SQL code


We are planning to explore use of dbplyr for Snowflake and would like get more insight.

  1. dbplyr converting SQL code for in-database processing will be equally efficient than directly writing SQL code ?
  2. does dbplyr supports complex multi-level (at least 3 level) sub-query ?

Over question is that ....Does it make sense to use dbplyr for enterprise level application for big data analysis ?


Solution

  • dbplyr works by translating dplyr code into SQL. This has some advantages:

    • elegance of R tools
    • access to other parts of R programming language (e.g. loops, plots, packages)
    • the same code can be translated to a different database type

    and some disadvantages:

    • not all R commands have translations defined
    • some restrictions on how R code can be written for translation
    • presentation of translated commands is less elegant than human written SQL code
    • error messages in R are not always sufficient to debug problems that occur in the database

    Whether it is right for your application is a practical question you will need to test.

    Regarding sub-queries:

    • dbplyr uses sub-queries throughout its translation, but not in the same way developers write sub-queries
    • SQL has an upper limit on the number of sub-queries it can handle, if you write your dplyr code badly then you can encounter this limit

    Other answers you might find relevant: