r, dplyr, reshape, reshape2

Is it possible to call the dcast function from dplyr pipelines?


Is it possible to use the dcast function within dplyr pipelines? How should I define the first argument of dcast? How can I first filter the data using dplyr and then transform it to wide format using reshape2?

set.seed(45)
df <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

I want to do this:

library(dplyr)
library(reshape2)

df <- df %>% 
  filter(numbers != 4) %>% 
  dcast(...)

Is it possible to use dcast in dplyr transformations? If so, what is the first argument here?


Solution

  • The data argument of dcast does not need to be written out in a %>% pipeline, because the pipe passes the result of the previous step as the first argument. We only need to specify the formula and the 'value.var' column

    library(dplyr)   
    df %>% 
        filter(numbers != 4) %>%
        reshape2::dcast(name ~ numbers, value.var = 'value')
    #      name          1          2          3
    #1  firstName  0.3407997 -0.7033403 -0.3795377
    #2 secondName -0.8981073 -0.3347941 -0.5013782
    

    If we need to specify the data explicitly, the . placeholder stands for whatever the pipe passes in

    df %>% 
        filter(numbers != 4) %>%
        reshape2::dcast(., name ~ numbers, value.var = 'value')
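
    Both forms should give the same result as calling dcast directly on the filtered data, since %>% supplies its left-hand side as the first argument unless . is used as an argument. A minimal check, assuming the df from the question (the names implicit, explicit and direct are only illustrative):

    ```r
    library(dplyr)
    library(reshape2)

    # Rebuild the example data from the question
    set.seed(45)
    df <- data.frame(
        name = rep(c("firstName", "secondName"), each = 4),
        numbers = rep(1:4, 2),
        value = rnorm(8)
    )

    # Data passed implicitly by the pipe
    implicit <- df %>%
        filter(numbers != 4) %>%
        dcast(name ~ numbers, value.var = "value")

    # Data passed explicitly through the . placeholder
    explicit <- df %>%
        filter(numbers != 4) %>%
        dcast(., name ~ numbers, value.var = "value")

    # No pipe at all: the filtered data is the first argument
    direct <- dcast(filter(df, numbers != 4), name ~ numbers, value.var = "value")

    # Compare the three results
    identical(implicit, explicit)
    identical(implicit, direct)
    ```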
    

    In the tidyverse, there is pivot_wider (from tidyr, which succeeds the reshape2 functions). It does reshaping similar to reshape2::dcast (and more) and returns a tibble

    library(tidyr)
    df %>% 
        filter(numbers != 4) %>% 
        pivot_wider(names_from = numbers, values_from = value)
    # A tibble: 2 x 4
    #  name          `1`    `2`    `3`
    #  <chr>       <dbl>  <dbl>  <dbl>
    #1 firstName   0.341 -0.703 -0.380
    #2 secondName -0.898 -0.335 -0.501
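
    To confirm that the two approaches agree on this data, the wide tables can be compared column by column. This is only a sketch, assuming the same df as in the question; filtered, wide_dcast and wide_pivot are illustrative names. The comparison goes through as.character and unlist so that it does not depend on the class of the result or on whether name is stored as a factor:

    ```r
    library(dplyr)
    library(tidyr)

    # Rebuild the example data from the question
    set.seed(45)
    df <- data.frame(
        name = rep(c("firstName", "secondName"), each = 4),
        numbers = rep(1:4, 2),
        value = rnorm(8)
    )

    filtered <- df %>% filter(numbers != 4)

    # Wide format via reshape2
    wide_dcast <- filtered %>%
        reshape2::dcast(name ~ numbers, value.var = "value")

    # Wide format via tidyr
    wide_pivot <- filtered %>%
        pivot_wider(names_from = numbers, values_from = value)

    # Same column names, same row labels, same values
    identical(names(wide_pivot), names(wide_dcast))
    all(as.character(wide_pivot$name) == as.character(wide_dcast$name))
    all.equal(unlist(wide_pivot[-1], use.names = FALSE),
              unlist(wide_dcast[-1], use.names = FALSE))

    # If a plain data.frame is needed downstream, convert explicitly
    wide_df <- as.data.frame(wide_pivot)
    ```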