Change order of bins in geom_histogram?


I'm using ggplot2 and trying to change the order of bins. I'm using the data for NY's Stop and Frisk program found here: http://www.nyclu.org/content/stop-and-frisk-data

The times are given as integers in HHMM form (e.g., 5 = 12:05 AM, 355 = 3:55 AM, 2100 = 9:00 PM).
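Because each value packs hours and minutes into a single integer, integer division and modulo by 100 separate them. The following is a minimal sketch in R. The helper name hhmm_to_minutes is hypothetical and is not part of the dataset:

```r
# Hypothetical helper: convert HHMM integers (e.g. 355 = 3:55 AM)
# into minutes since midnight.
hhmm_to_minutes <- function(hhmm) {
  hours <- hhmm %/% 100    # drop the last two digits -> hour of day
  minutes <- hhmm %% 100   # keep the last two digits -> minutes past the hour
  hours * 60 + minutes
}

hhmm_to_minutes(c(5, 355, 2100))
# returns 5, 235, 1260
```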

I used the following to create a histogram of stop times:

    myplot <- ggplot(Stop.and.Frisk.2011) + geom_histogram(aes(x = timestop), binwidth = 300)

This gave me a fairly good graph of times, with bins running from midnight-3 AM, 3 AM-6 AM, 6 AM-9 AM, and so on.

However, I'd like to move the first two bins (midnight-3 AM and 3 AM-6 AM) to the end so the plot looks more like a normal work day.

Is there a simple way to change the order of the bins? I've tried using the breaks argument, but I can't find a way to make it loop back around.

Essentially, I want the bins to be in the following order: 600-900, 900-1200, 1200-1500, 1500-1800, 1800-2100, 2100-2400, 0-300, 300-600.
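The wrap-around order above can be expressed with modular arithmetic. Subtract the 6 AM start from the hour and take the result modulo 24, so that 6 AM becomes 0 and 3 AM becomes 21. Integer-dividing by 3 then gives bin numbers 1 to 8 in exactly the requested order. This is a minimal sketch that assumes left-closed bins (for example, 600-859 is bin 1). The function shifted_bin and its parameters are hypothetical:

```r
# Hypothetical helper: map HHMM times to 3-hour bins numbered 1..8,
# where bin 1 is 600-900 and bin 8 is 300-600 (wrapping past midnight).
shifted_bin <- function(hhmm, start_hour = 6, width_hours = 3) {
  hour <- hhmm %/% 100                   # hour of day, 0-23
  shifted <- (hour - start_hour) %% 24   # hours since the start of the "work day"
  shifted %/% width_hours + 1            # 1-based bin index
}

shifted_bin(c(600, 859, 900, 2359, 0, 300, 559))
# returns 1 1 2 6 7 8 8
```

One way to use an index like this is to pass it to factor(..., levels = 1:8, labels = ...) and plot the result with geom_bar(). This avoids writing a separate condition for each interval.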

Thanks in advance!


Solution

  • One approach is to bin the data before calling ggplot. Here is an example that uses the cut function to create 3-hour intervals:

    # Load ggplot2 for plotting
    library(ggplot2)
    
    # Read in the data
    df <- read.csv('SQF 2012.csv', header = TRUE)
    
    # Cut `timestop` (HHMM integers, 0-2359) into 3-hour intervals.
    # right = FALSE gives [0,300), [300,600), ..., [2100,2400), so
    # midnight (0) is kept and 3:00 AM (300) falls in the 3AM-6AM bin.
    df$intervals <- cut(df$timestop,
                        breaks = seq(0, 2400, by = 300),
                        right = FALSE,
                        labels = c('12AM-3AM', '3AM-6AM', '6AM-9AM', '9AM-12PM',
                                   '12PM-3PM', '3PM-6PM', '6PM-9PM', '9PM-12AM'))
    
    # Re-order the factor levels so the day starts at 6 AM;
    # ggplot2 draws a discrete axis in level order
    df$intervals <- factor(df$intervals,
                           levels = c('6AM-9AM', '9AM-12PM', '12PM-3PM', '3PM-6PM',
                                      '6PM-9PM', '9PM-12AM', '12AM-3AM', '3AM-6AM'))
    
    # Create the plot; the data are already binned, so count with geom_bar
    ggplot(df, aes(x = intervals)) +
      geom_bar() +
      xlab('Time') +
      ylab('Number\n') +
      theme(axis.text = element_text(size = rel(1.1)),
            axis.text.x = element_text(angle = 45, hjust = 1),
            axis.title = element_text(size = rel(1.1), face = 'bold'))
    

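To check the ordering without the original CSV, you can run cut() and the level reordering on a few made-up timestop values and tabulate the result. The sample vector below is invented for illustration. Because ggplot2 draws a discrete x axis in factor-level order, the order of the table matches the order of the bars. The sketch uses left-closed intervals (right = FALSE) so that midnight (0) is not dropped:

```r
# Made-up sample times in HHMM format (illustration only)
timestop <- c(0, 130, 300, 615, 945, 1200, 1530, 1745, 2030, 2359)

# 3-hour intervals, left-closed: [0,300), [300,600), ..., [2100,2400)
labels_clock <- c('12AM-3AM', '3AM-6AM', '6AM-9AM', '9AM-12PM',
                  '12PM-3PM', '3PM-6PM', '6PM-9PM', '9PM-12AM')
intervals <- cut(timestop, breaks = seq(0, 2400, by = 300),
                 right = FALSE, labels = labels_clock)

# Rotate the levels so the day starts at 6 AM and wraps past midnight
work_day <- labels_clock[c(3:8, 1:2)]
intervals <- factor(intervals, levels = work_day)

# Counts in work-day order: 1 1 1 2 1 1 2 1
table(intervals)
```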