Are there any disadvantages to using tidyverse?


For anything related to processing data in R, I've recently been seeing tidyverse recommended as almost essential. This raises a question - if it is all that it's hyped up to be, is there any reason not to use it? For example, are the frameworks in tidyverse restrictive in any way that is worthy of mention?


Solution

  • First drawback: stability

    One drawback is that tidyverse functions change more rapidly than, say, base R functions. So if you want stability over a long time, I would go for base R. That said, the tidyverse developers are open about their different approach. See e.g. the Welcome to the Tidyverse vignette:

    the biggest difference [between base R and tidyverse] is in priorities: base R is highly focussed on stability, whereas the tidyverse will make breaking changes in the search for better interfaces.

    ...and Hadley's answer to "Do you expect the tidyverse to be the part of core R packages some day?":

    It’s extremely unlikely because the core packages are extremely conservative so that base R code is stable, and backward compatible. I prefer to have a more utopian approach where I can be quite aggressive about making backward incompatible changes while trying to figure out a better API.
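
    As a small illustration of that churn, the scoped verbs such as mutate_all (which I use further down) were superseded by across() in dplyr 1.0.0. A minimal sketch, with a made-up two-column data frame, of the older and newer spelling of the same operation:

```r
library(dplyr)

# Toy data, invented for this illustration
df <- data.frame(a = c(1, 2), b = c(3, 4))

# Older spelling: scoped verb (superseded, but still available)
old_style <- df %>% mutate_all(function(i) i * 2)

# Newer spelling: across() inside mutate() (needs dplyr >= 1.0.0)
new_style <- df %>% mutate(across(everything(), function(i) i * 2))

# Both spellings should produce the same data frame
all.equal(old_style, new_style)
```

    Code written with the older spelling keeps running for now, but it is the kind of thing you may end up rewriting a few years later.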

    Second drawback: flexibility

    The tidy data concept is great, but the limitation that a transformation must return the same number of rows as before (see mutate) cannot always be satisfied. See for example

    library(tidyverse)
    data.frame(matrix(rnorm(1000), ncol = 10)) %>%
      mutate_all(function(i) density(i)$x)
    

    which gives an error because the number of rows changes (density returns 512 points by default, not 100). Sometimes I run into situations like that, where mutate complains that the number of rows is not the same. It is similar with summarise, which expects a length-one result per column, which is not the case for range, for instance. There are workarounds, for sure, but I prefer base R, which here would simply be

    apply(data.frame(matrix(rnorm(1000), ncol = 10)),
          2,
          function(i) density(i)$x)
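
    One of the workarounds, sketched under the assumption that dplyr 1.1.0 or later is installed: reframe() accepts results of any length, as long as every column in the call returns the same length. The seed and the names dens and rng are my own additions for the illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(matrix(rnorm(1000), ncol = 10))

# density() returns 512 grid points by default, so the result has
# 512 rows instead of the original 100
dens <- df %>% reframe(across(everything(), function(i) density(i)$x))
dim(dens)

# range() returns two values per column, which summarise() was not
# designed for but reframe() accepts
rng <- df %>% reframe(across(everything(), range))
dim(rng)
```

    Note that this returns a data frame, whereas the apply call returns a matrix, and it is still one more function to remember.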
    

    Third drawback: complexity

    There are situations where the tidyverse works but is much more cumbersome. Some time ago I asked how to write this code

    df[df$age > 90, ] <- NA
    

    ... within the tidyverse and the two answers suggested using

    df %>% select(x, y, age) %>%
      mutate_all(~replace(.x, age > 90, NA))
    # or
    df %>%
      mutate_all(function(i) replace(i, .$age > 90, NA))
    

    Both answers work but are obviously not as quick to code as with base R.
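
    For reference, here is what the base R one-liner does on a small made-up data frame (the column names x, y and age follow the answers above; the values are invented). One caveat I am fairly sure of: if age itself contains NA, the logical index contains NA and the data frame assignment fails, so wrapping the condition in which() is a safer variant.

```r
# Toy data, invented for this illustration
df <- data.frame(x = 1:4,
                 y = c("a", "b", "c", "d"),
                 age = c(25, 95, 40, 91))

# Blank out every column in the rows where age exceeds 90
df[df$age > 90, ] <- NA

# Variant that tolerates missing ages: which() drops the NA positions
df2 <- data.frame(x = 1:3, age = c(95, NA, 30))
df2[which(df2$age > 90), ] <- NA
```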

    Fourth drawback: limitation

    If you want to define your own function you do something like my_fun <- function(x) ..., where function itself is part of base R and, to my knowledge, has no tidyverse counterpart. There are many examples where there is no tidyverse equivalent for a base R function and probably never will be, e.g. rnorm, eval, c, and so on. In fact, this is not so much a drawback of the tidyverse, but it shows that the tidyverse and base R are good at different things, which is why you should learn both.
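
    A small sketch of the two working together (the function name square is made up): a function defined with base R's function is passed to purrr::map_dbl. purrr's formula shorthand ~ .x^2 is the nearest thing to a shortcut that I know of, but it only works inside functions that understand it.

```r
library(purrr)

# A function defined with base R's `function` keyword
square <- function(x) x^2

# Passed to a tidyverse (purrr) iterator
a <- map_dbl(1:3, square)

# The same call using purrr's formula shorthand
b <- map_dbl(1:3, ~ .x^2)

identical(a, b)
```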

    Why this question should not be closed

    The question was closed as a duplicate and linked to another question about tidyverse vs. data.table. In my opinion, if someone asks about the disadvantages of tidyverse (or any other package), this does not mean the person is asking for a comparison with the data.table package. Instead, it is more natural to describe the disadvantages of tidyverse by comparing it with base R, which is not done in the linked question, i.e. this question is not a duplicate.