We are planning to explore use of dbplyr for Snowflake and would like get more insight.
- dbplyr converting SQL code for in-database processing will be equally efficient than directly writing SQL code ?
- does dbplyr supports complex multi-level (at least 3 level) sub-query ?
Over question is that ....Does it make sense to use dbplyr for enterprise level application for big data analysis ?
dbplyr works by translating dplyr code into SQL. This has some advantages:
- elegance of R tools
- access to other parts of R programming language (e.g. loops, plots, packages)
- the same code can be translated to a different database type
and some disadvantages:
- not all R commands have translations defined
- some restrictions on how R code can be written for translation
- presentation of translated commands is less elegant than human written SQL code
- error messages in R are not always sufficient to debug problems that occur in the database
Whether it is right for your application is a practical question you will need to test.
Regarding sub-queries:
- dbplyr uses sub-queries throughout its translation, but not in the same way developers write sub-queries
- SQL has an upper limit on the number of sub-queries it can handle, if you write your dplyr code badly then you can encounter this limit
Other answers you might find relevant: