R programming ggvis histogram versus hist - How to size the buckets and define x-axis spacing (ticks)


I am learning to use ggvis and wanted to understand how to create the equivalent histogram to that produced by hist. Specifically, how do you set the bin widths and upper and lower bounds of x in ggvis histograms? What am I missing?

Question: How do I get the ggvis histogram output to match the hist output?

Let me provide an example:

require(psych)
require(RCurl)
require(ggvis)

if ( !exists("impact") ) {
  url <- "https://dl.dropboxusercontent.com/u/8272421/stat/stat_one.txt"
  myCsv <- getURL(url, ssl.verifypeer = FALSE)
  impact <- read.csv(textConnection(myCsv), sep = "\t")
  impact$subject <- factor(impact$subject)
}

describe(impact)

hist(impact$verbal_memory_baseline, 
     main = "Distribution of verbal memory baseline scores", 
     xlab = "score", ylab = "frequency")

Example Output of Hist

OK, let's try to reproduce this with ggvis. The output does not match:

impact %>%
ggvis( x = ~verbal_memory_baseline, fill := "white") %>%
layer_histograms(width = 5) %>%
add_axis("x", title = "score") %>%
add_axis("y", title = "frequency")

ggvis histogram output

How do I get the ggvis output to match the hist output?


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] psych_1.5.6      knitr_1.11       ggvis_0.4.2.9000 setwidth_1.0-4  colorout_1.1-1   vimcom_1.2-3    

loaded via a namespace (and not attached):
[1] Rcpp_0.12.0          digest_0.6.8         dplyr_0.4.3.9000     assertthat_0.1       mime_0.3            
[6] R6_2.1.1             jsonlite_0.9.16      xtable_1.7-4         DBI_0.3.1            magrittr_1.5        
[11] lazyeval_0.1.10.9000 rstudioapi_0.3.1     rmarkdown_0.7        tools_3.2.2          shiny_0.12.2        
[16] httpuv_1.3.3         yaml_2.1.13          parallel_3.2.2       rsconnect_0.4.1.4    mnormt_1.5-3        
[21] htmltools_0.2.6

Solution

  • Try

    impact %>%
      ggvis( x = ~verbal_memory_baseline, fill := "white") %>%
      layer_histograms(width = 5, boundary = 5) %>% 
      add_axis("y", title = "frequency") %>%
      add_axis("x", title = "score", ticks = 5)
    

    Which gives:

    ggvis histogram output with boundary = 5


    The official documentation is a bit cryptic about how boundary and center work. Have a look at DataCamp's "How to Make a Histogram with ggvis in R", which explains:

    The width argument already set the bin width to 5, but where do bins start and where do they end? You can use the center or boundary argument for this. center should refer to one of the bins’ center value, which automatically determines the other bins location. The boundary argument specifies the boundary value of one of the bins. Here again, specifying a single value fixes the location of all bins. As these two arguments specify the same thing in a different way, you should set at most one of center or boundary.
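The arithmetic behind that description can be sketched in base R. This is only an illustration of how a single boundary (or center) value fixes every bin edge; it is not ggvis's own implementation. The helper bin_edges and the sample scores are made up for the example.

```r
# Hypothetical helper (not part of ggvis): bin edges of the given width,
# aligned so that `boundary` falls on an edge, covering the range of x.
bin_edges <- function(x, width, boundary) {
  lo <- boundary + floor((min(x) - boundary) / width) * width
  hi <- boundary + ceiling((max(x) - boundary) / width) * width
  seq(lo, hi, by = width)
}

# Made-up scores, not the impact data from the question
x <- c(62, 71, 77.5, 83, 98)

# Edges aligned to a boundary at 5
edges_from_boundary <- bin_edges(x, width = 5, boundary = 5)

# A bin centered at 77.5 with width 5 has its lower edge at 77.5 - 5 / 2 = 75
edges_from_center <- bin_edges(x, width = 5, boundary = 77.5 - 5 / 2)

print(edges_from_boundary)
print(identical(edges_from_boundary, edges_from_center))
```

Both calls describe the same grid of edges, which is why only one of center or boundary needs to be given.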


    To get the same result using center instead of boundary, try:

    impact %>%
      ggvis( x = ~verbal_memory_baseline, fill := "white") %>%
      layer_histograms(width = 5, center = 77.5) %>% 
      add_axis("y", title = "frequency") %>%
      add_axis("x", title = "score", ticks = 5)
    

    Here you specify the center of one bin (77.5), and that determines all the others automatically.
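
If you are not sure which width, boundary, or center to pass, one option is to ask hist itself which breaks it chose. The sketch below uses made-up scores in place of impact$verbal_memory_baseline (an assumption, since the original data file may no longer be available) and relies only on base R.

```r
# Made-up scores standing in for impact$verbal_memory_baseline
set.seed(1)
scores <- round(rnorm(40, mean = 88, sd = 7))

# plot = FALSE returns the histogram object without drawing anything
h <- hist(scores, plot = FALSE)

bin_width <- diff(h$breaks)[1]  # spacing between the first two breaks
one_edge  <- h$breaks[1]        # any break can serve as a boundary value
one_mid   <- h$mids[1]          # any bin midpoint can serve as a center value

print(h$breaks)
print(c(width = bin_width, boundary = one_edge, center = one_mid))
```

The printed width and either the boundary or the center value can then be passed to layer_histograms, as in the two calls above.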