Tags: r, ggplot2, logarithm

Have ticks at edges of bins (instead of center) with ggplot2 in R?


I have the following R data frame nPhotosClassified:

> glimpse(nPhotosClassified)
Observations: 236
Variables: 2
$ person_id         <int> 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 194, 195, 199...
$ nPhotosClassified <int> 113, 164, 2126, 637, 75, 16, 161, 29, 15, 6338, 596, 18, 14, 63, 36777, 19117, 5625...

With it I tried to make a geom_histogram of the nPhotosClassified variable with ggplot2:

ggplot(data = nPhotosClassified, mapping = aes(x = nPhotosClassified)) + 
    geom_histogram(bins = 10) + 
    scale_x_log10(name = "Number of photos classified",
                  breaks = c(1, 10, 100, 1000, 10000)) + 
    ylab(label = "Number of users") + 
    theme_bw() + 
    geom_vline(xintercept = 100, colour = "red") +
    theme(# This gets rid of the whole border around the plot, but also makes
          # the axes disappear:
          panel.border = element_blank(), 
          # So manually add lines for the axes back:
          axis.line = element_line())

Which gives me this result:

[Figure: histogram of nPhotosClassified on a log10 x-axis, with a red vertical line at x = 100]

For this question, I've added a red vertical line to indicate that the major tick marks fall on the center of these bins.

Question: How do I adjust the bins (or the tick marks???) so that all the tick marks fall on the edge of bins rather than in the middle of them?

For example, how do I end up with two bins between 1 and 10, two bins between 10 and 100, and so on? Please note that I want my x-axis to be on the log10 scale.

Thank you!

EDIT: Here is the full dataset:

> dput(nPhotosClassified)
structure(list(person_id = c(179L, 180L, 181L, 182L, 183L, 184L, 
185L, 186L, 187L, 188L, 189L, 190L, 191L, 192L, 194L, 195L, 199L, 
201L, 204L, 205L, 207L, 208L, 209L, 210L, 211L, 213L, 214L, 215L, 
216L, 217L, 219L, 220L, 221L, 222L, 223L, 224L, 225L, 226L, 227L, 
228L, 229L, 230L, 234L, 235L, 237L, 238L, 241L, 242L, 243L, 246L, 
249L, 250L, 251L, 252L, 253L, 255L, 256L, 259L, 261L, 264L, 265L, 
266L, 267L, 268L, 271L, 272L, 274L, 275L, 276L, 277L, 278L, 281L, 
282L, 283L, 285L, 294L, 296L, 298L, 299L, 302L, 304L, 305L, 307L, 
309L, 310L, 311L, 312L, 317L, 318L, 319L, 320L, 323L, 325L, 326L, 
327L, 330L, 331L, 332L, 335L, 341L, 344L, 347L, 348L, 363L, 367L, 
375L, 376L, 377L, 378L, 386L, 388L, 389L, 390L, 396L, 397L, 398L, 
399L, 401L, 402L, 404L, 406L, 407L, 409L, 412L, 413L, 414L, 415L, 
419L, 421L, 425L, 426L, 428L, 429L, 432L, 433L, 440L, 441L, 445L, 
448L, 452L, 456L, 461L, 462L, 464L, 468L, 471L, 473L, 474L, 475L, 
478L, 483L, 486L, 491L, 492L, 493L, 494L, 495L, 497L, 498L, 501L, 
502L, 505L, 509L, 512L, 518L, 520L, 532L, 533L, 535L, 537L, 539L, 
540L, 543L, 544L, 550L, 551L, 552L, 554L, 562L, 564L, 581L, 582L, 
590L, 592L, 593L, 597L, 599L, 601L, 602L, 612L, 618L, 622L, 632L, 
634L, 635L, 637L, 650L, 651L, 658L, 659L, 660L, 661L, 665L, 666L, 
668L, 671L, 672L, 675L, 684L, 686L, 693L, 697L, 705L, 708L, 719L, 
725L, 726L, 730L, 733L, 734L, 752L, 756L, 777L, 785L, 789L, 791L, 
796L, 797L, 799L, 800L, 802L, 807L, 808L, 810L, 813L, 814L), 
    nPhotosClassified = c(113L, 164L, 2126L, 637L, 75L, 16L, 
    161L, 29L, 15L, 6338L, 596L, 18L, 14L, 63L, 36777L, 19117L, 
    5625L, 584L, 3477L, 541L, 6L, 6L, 112L, 8L, 5L, 290L, 120L, 
    12L, 9L, 2675L, 9L, 4L, 657L, 149L, 151L, 8L, 4104L, 285L, 
    192L, 734L, 5L, 129L, 155L, 11L, 516L, 410L, 55L, 1L, 581L, 
    293L, 28L, 17810L, 2690L, 5L, 587L, 359L, 9L, 493L, 404L, 
    21L, 3L, 2L, 91L, 23L, 3L, 728L, 29L, 1540L, 10556L, 1L, 
    54L, 905L, 25L, 22L, 1L, 14L, 16L, 13L, 10L, 21L, 121L, 7870L, 
    53L, 1777L, 11L, 850L, 35L, 635L, 7L, 5728L, 1972L, 3613L, 
    16L, 51L, 131L, 77L, 267L, 718L, 11L, 18L, 5088L, 113L, 48L, 
    302L, 33L, 44L, 20L, 22L, 7L, 30L, 8L, 69L, 4L, 11L, 2428L, 
    3131L, 2459L, 12L, 150L, 21L, 702L, 10L, 23L, 38L, 1L, 1L, 
    24L, 10L, 6L, 1443L, 221L, 4363L, 27L, 46L, 9L, 8L, 10633L, 
    56L, 38L, 20L, 171L, 36L, 5L, 3L, 108L, 10L, 559L, 83L, 60L, 
    3L, 9L, 697L, 100L, 27L, 114L, 186L, 8127L, 10L, 58L, 76L, 
    472L, 6L, 72L, 3748L, 130L, 9L, 2459L, 80L, 468L, 198L, 4L, 
    108L, 35L, 10L, 310L, 207L, 499L, 20L, 32L, 1178L, 730L, 
    999L, 13L, 1L, 5L, 2L, 1L, 178L, 4L, 31L, 16L, 1592L, 385L, 
    73L, 698L, 4L, 42L, 90L, 772L, 509L, 1L, 17L, 17L, 36L, 987L, 
    395L, 15L, 23194L, 16L, 956L, 15L, 5614L, 3L, 1700L, 74L, 
    65L, 18L, 389L, 35L, 8L, 3L, 9L, 1271L, 12L, 80L, 117L, 356L, 
    3L, 59L, 85L, 382L, 8L, 6L, 33L, 5L, 119L)), class = c("tbl_df", 
"tbl", "data.frame"), .Names = c("person_id", "nPhotosClassified"
), row.names = c(NA, -236L))

Solution

  • In the end, I found the breaks argument of geom_histogram to be the most straightforward way to approach this, mostly because the x scale transformation complicates things.

    The histogram bin breaks need to ultimately be set on the transformed scale. This translates to setting the histogram breaks on the scale of log10(nPhotosClassified).

    The breaks depend on the range of log10(nPhotosClassified).

    with(nPhotosClassified, range(log10(nPhotosClassified)) )
    
    [1] 0.000000 4.565576
    

    The maximum is below 5, so the breaks need to go from 0 to 5. You wanted two bins per power of 10 (e.g., two bins between 1 and 10, two between 10 and 100), so we want a break every 0.5 units on the log10 scale.
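To see why a 0.5 step on the log10 scale lines up with the tick marks, it may help to back-transform the bin edges to photo counts. This is only an illustrative check in base R; the variable names log_breaks and edges are made up for this sketch.

```r
# Bin edges on the log10 scale: one edge every half power of 10
log_breaks <- seq(0, 5, by = 0.5)

# The same edges expressed as photo counts (the original units)
edges <- 10^log_breaks
print(round(edges, 1))

# Every second edge is a whole power of 10, which is where the
# major ticks of the log10 axis are placed
decade_edges <- edges[seq(1, length(edges), by = 2)]
print(decade_edges)
```

The edges in between (about 3.2, 31.6, 316, and so on) split each power of 10 into two bins of equal width on the log scale.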

    ggplot(data = nPhotosClassified, mapping = aes(x = nPhotosClassified)) +
         # breaks are given on the log10 scale: one bin edge every 0.5 units
         geom_histogram(breaks = seq(0, 5, by = 0.5)) +
         scale_x_log10(name = "Number of photos classified",
                       breaks = c(1, 10, 100, 1000, 10000))
    

    [Figure: the resulting histogram, with bin edges every 0.5 units on the log10 x-axis]

    There may be a less manual way to do this, but the other arguments that control the histogram bins, like boundary, didn't seem to work well together with the scale transformation.
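One way to make this slightly less manual, while still using the breaks argument, might be to derive the break sequence from the data instead of hard-coding 0 and 5. The sketch below is base R only; log10_breaks and bins_per_decade are hypothetical names invented here, not part of ggplot2, and the helper assumes all values are strictly positive (log10 of 0 is -Inf).

```r
# Hypothetical helper (not a ggplot2 function): evenly spaced bin edges
# on the log10 scale that cover the whole range of x.
# Assumes every value in x is strictly positive.
log10_breaks <- function(x, bins_per_decade = 2) {
  rng <- range(log10(x))
  # Widen the range outward to whole powers of 10 so edges hit the ticks
  seq(floor(rng[1]), ceiling(rng[2]), by = 1 / bins_per_decade)
}

# Stand-in values spanning the same range as the question's data
# (minimum 1, maximum 36777)
example_counts <- c(1, 6, 113, 2126, 36777)
example_breaks <- log10_breaks(example_counts)
print(example_breaks)
```

With the question's data, the result could be passed as breaks = log10_breaks(nPhotosClassified$nPhotosClassified) to geom_histogram in place of seq(0, 5, by = .5).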