Tags: r, dplyr, histogram

Piping histograms in dplyr (R)


Is it possible to pipe into multiple graphs with dplyr?

This works:

library(dplyr)  # provides %>%

birdsss <- data.frame(x1 = 1:10, x2 = 21:30, x3 = 41:50)
birdsss %>%
  with(hist(x1, breaks = 50))

But this does not:

birdsss %>%
  with(hist(x1, breaks = 50)) %>%
  with(hist(x2, breaks = 50)) %>%
  with(hist(x3, breaks = 50))
Error in hist(x2, breaks = 50) : object 'x2' not found
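For context on that error: hist() returns an object of class "histogram" (a list with components such as breaks, counts and mids), and %>% passes that object, not birdsss, on to the second with(). A minimal sketch that checks this, using plot = FALSE so nothing is drawn:

```r
library(dplyr)

birdsss <- data.frame(x1 = 1:10, x2 = 21:30, x3 = 41:50)

# plot = FALSE computes the histogram object without drawing it
h <- birdsss %>% with(hist(x1, breaks = 50, plot = FALSE))

class(h)            # "histogram"
names(h)            # "breaks" "counts" "density" "mids" "xname" "equidist"
"x2" %in% names(h)  # FALSE: the next with() looks for x2 inside h, not birdsss
```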

I've also tried:

birdsss %>%
  with(hist(x1, breaks = 50)) &
  with(hist(x2, breaks = 50)) &
  with(hist(x3, breaks = 50))

and

birdsss %>%
  with(hist(x1, breaks = 50)) ;
  with(hist(x2, breaks = 50)) ;
  with(hist(x3, breaks = 50))

How can I plot histograms of multiple columns in a single pipe?

Something like:

birdsss %>%
  with(hist(x1:x3, breaks = 50))

I'm using a longer pipe (filter(), select(), etc.) and want to finish it with multiple graphs. I simplified the code here.


Solution

  • lapply

    To put some of my comments from above into an answer, the simplest way to make a histogram of each variable is

    # let's put them in a single plot
    par(mfrow = c(1, 3))
    
    lapply(birdsss, hist, breaks = 50)    # or chain into it: birdsss %>% lapply(hist, breaks = 50)
    
    # set back to normal
    par(mfrow = c(1, 1))
    

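    This does mess up the labels, though: each title reads "Histogram of X[[i]]" and each x-axis label reads "X[[i]]" instead of the column name: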

    [Image: lapply plot]
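Why the labels break: hist() builds its default title and x-axis label from the expression passed as x (its xname argument defaults to deparse(substitute(x))), and lapply() calls the function as FUN(X[[i]], ...). A minimal sketch that inspects the recorded name without drawing anything:

```r
birdsss <- data.frame(x1 = 1:10, x2 = 21:30, x3 = 41:50)

# plot = FALSE: compute the histograms only, so their recorded names can be inspected
res <- lapply(birdsss, hist, breaks = 50, plot = FALSE)

res$x1$xname  # "X[[i]]", which is what ends up in "Histogram of X[[i]]" when plotted
```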

    Map/mapply

    To fix this in base R, we need to iterate in parallel over the data and the labels, which can be done with Map or mapply. Since we only care about the side effects (the plots) and not the returned values, the difference between the two doesn't matter here:

    par(mfrow = c(1, 3))
    
    Map(function(x, y){hist(x, breaks = 50, main = y, xlab = y)}, 
        birdsss, 
        names(birdsss))
    
    par(mfrow = c(1, 1))
    

    [Image: Map plot]

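    Much prettier. However, if you want to chain into it, you'll need to use `.` to show where the data is supposed to go, because Map's first argument is the function, not the data (when `.` appears as an argument of its own, %>% does not also insert the data as the first argument):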

    birdsss %>% 
        Map(function(x, y){hist(x, breaks = 50, main = y, xlab = y)}, 
            ., 
            names(.))
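If you'd rather keep explicit hist() calls as in the question, one more base option (a sketch, not one of the approaches above) is to give with() a braced block: with() evaluates the whole block inside the data frame, so every hist() can see the columns, and the labels come out as x1, x2 and x3. It assumes dplyr is loaded for %>%:

```r
library(dplyr)

birdsss <- data.frame(x1 = 1:10, x2 = 21:30, x3 = 41:50)

par(mfrow = c(1, 3))

# %>% inserts birdsss as with()'s first argument; with() then evaluates the
# whole braced block inside the data frame, so x1, x2 and x3 are all visible
res <- birdsss %>%
  with({
    hist(x1, breaks = 50)
    hist(x2, breaks = 50)
    hist(x3, breaks = 50)  # the block's value is this last histogram object
  })

par(mfrow = c(1, 1))
```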
    

    purrr

    Hadley Wickham's purrr package makes *apply-style loops easier to chain (and, though unrelated here, makes working with lists easier) without having to worry about where to put `.`. Here, since you're iterating for the side effects and over two variables, use walk2:

    library(purrr)
    
    walk2(birdsss, names(birdsss), ~hist(.x, breaks = 50, main = .y, xlab = .y))
    

    which draws the same plots as the previous Map call (if you set mfrow the same way), but without printing useless output to the console, because walk2 returns its input invisibly. (If you want the return values, use map2 instead.) Note that the arguments to iterate over come first, so you can easily chain:

    birdsss %>% walk2(names(.), ~hist(.x, breaks = 50, main = .y, xlab = .y))
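Because walk2 returns its input invisibly, the data frame keeps flowing down the pipe after the plots are drawn, which fits the longer filter()/select() pipeline described in the question. A sketch, in which the trailing summarise() step is only a hypothetical stand-in for whatever comes next:

```r
library(dplyr)
library(purrr)

birdsss <- data.frame(x1 = 1:10, x2 = 21:30, x3 = 41:50)

par(mfrow = c(1, 3))

out <- birdsss %>%
  # draws one histogram per column, then passes birdsss on unchanged
  walk2(names(.), ~hist(.x, breaks = 50, main = .y, xlab = .y)) %>%
  # hypothetical next step: column means of the data that was plotted
  summarise(across(everything(), mean))

par(mfrow = c(1, 1))

out
```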
    

    ggplot

    On a completely different tack, if you're planning on putting everything in a single plot anyway, ggplot2 makes related plots easy to build with its facet_* functions:

    library(ggplot2)
    
    # gather to long form, so there is a variable of variables to split facets by
    birdsss %>% 
        tidyr::gather(variable, value) %>% 
        ggplot(aes(value)) + 
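            # geom_histogram() takes a number of bins, while hist() treats a single number
            # for breaks as a suggested number of cells, so 50 bins is the rough equivalent
            geom_histogram(bins = 50) + 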
            # make a new "facet" for each value of `variable` (formerly column names), and 
            # use a convenient x-scale instead of the same for all 3
            facet_wrap(~variable, scales = 'free_x')
    

    [Image: ggplot version]

    It looks a bit different, but everything is editable. Note you get nice labels without any work.
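tidyr::gather() has since been superseded by pivot_longer(). An equivalent sketch of the faceted version, assuming tidyr 1.0.0 or later and using bins = 50 as a rough counterpart to breaks = 50:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

birdsss <- data.frame(x1 = 1:10, x2 = 21:30, x3 = 41:50)

# pivot_longer() replaces gather(): one row per (column name, value) pair
long <- birdsss %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value")

p <- ggplot(long, aes(value)) +
  geom_histogram(bins = 50) +
  # one panel per original column, each with its own x range
  facet_wrap(~variable, scales = "free_x")

print(p)
```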