
geom_col is not using stat_identity when values are rounded to whole numbers


I'm trying to use geom_col to chart columns for values in time series (annual and quarterly).

When I use the zoo package's yearqtr class for the x-axis values and round the y-axis values to whole numbers, geom_col appears not to use its default stat = 'identity', which sets each column's height to the y-value of that observation. Instead it appears to switch to stat = 'count', treating the rounded y-values as factors and counting the number of occurrences of each value (e.g., 3 occurrences have a rounded y-value = 11).

If I switch to geom_line, the graph is fine with quarterly x-axis values and rounded y-axis values.

library(zoo)
library(ggplot2)

Annual.Periods <- seq(to = 2020, by = 1, length.out = 8) # 8 years
Quarter.Periods <- as.yearqtr(seq(to = 2020, by = 0.25, length.out = 8)) # 8 Quarters

Values <- seq(to = 11, by = 0.25, length.out = 8)

Data.Annual.Real <- data.frame(X = Annual.Periods, Y = round(Values, 1))
Data.Annual.Whole <- data.frame(X = Annual.Periods, Y = round(Values, 0))
Data.Quarter.Real <- data.frame(X = Quarter.Periods, Y = round(Values, 1))
Data.Quarter.Whole <- data.frame(X = Quarter.Periods, Y = round(Values, 0))

ggplot(data = Data.Annual.Real, aes(X, Y)) + geom_col()
ggplot(data = Data.Annual.Whole, aes(X, Y)) + geom_col()
ggplot(data = Data.Quarter.Real, aes(X, Y)) + geom_col()
ggplot(data = Data.Quarter.Whole, aes(X, Y)) + geom_col() # appears to treat y-values as factors and count occurrences, as if stat = 'count' (e.g., 3 occurrences have a rounded Value = 11)

ggplot(data = Data.Quarter.Whole, aes(X, Y)) + geom_line() 

rstudioapi::versionInfo()
# $mode
# [1] "desktop"
# 
# $version
# [1] ‘1.3.959’
# 
# $release_name
# [1] "Middlemist Red"

sessionInfo()
# R version 4.0.0 (2020-04-24)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Mojave 10.14.6
# 
# Matrix products: default
# BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
# 
# locale:
#   [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] ggplot2_3.3.1 zoo_1.8-8 

Solution

  • Since version 3.3.0, ggplot2 tries to guess the orientation of geom_col(), i.e. which variable serves as the base of the bars and which holds the values they represent. Apparently, when your Y variable contains no decimals, ggplot2 chooses Y as the base (Y stays numeric; it is not converted to a factor), draws horizontal bars, and stacks (sums up) your quarters along each bar.

    In cases like this you can tell geom_col() which variable to use as the base of the bars via the orientation argument:

    ggplot(data = Data.Quarter.Whole, aes(X, Y)) + geom_col(orientation = "x") 
    
    

    EDIT: I have just seen that Roman answered it in the comments.

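To check which orientation ggplot2 actually picked, you can inspect the data it computed for the layer. The sketch below assumes ggplot2 >= 3.3.0 and relies on an internal flipped_aes column that bar-type geoms (geom_bar/geom_col) add while setting up their data. That column is an implementation detail, not a documented API, so treat this as a debugging aid. The names p_guess, p_fixed, guess_flipped and fixed_flipped are hypothetical.

```r
library(zoo)      # provides as.yearqtr() and the yearqtr scale
library(ggplot2)  # the orientation argument needs ggplot2 >= 3.3.0

# Rebuild the quarterly, whole-number data set from the question
Quarter.Periods <- as.yearqtr(seq(to = 2020, by = 0.25, length.out = 8))
Values <- seq(to = 11, by = 0.25, length.out = 8)
Data.Quarter.Whole <- data.frame(X = Quarter.Periods, Y = round(Values, 0))

# Same data, once with the guessed orientation and once forced to vertical bars
p_guess <- ggplot(Data.Quarter.Whole, aes(X, Y)) + geom_col()
p_fixed <- ggplot(Data.Quarter.Whole, aes(X, Y)) + geom_col(orientation = "x")

# layer_data() builds the plot and returns the computed data of the first layer.
# Assumption: bar-type geoms store the chosen orientation in an internal
# flipped_aes column (TRUE = horizontal bars); this is not a documented API.
guess_flipped <- unique(layer_data(p_guess)$flipped_aes)
fixed_flipped <- unique(layer_data(p_fixed)$flipped_aes)

cat("guessed orientation flipped:", guess_flipped, "\n")
cat("orientation = \"x\" flipped:", fixed_flipped, "\n")
```

The second value should be FALSE because an explicit orientation = "x" skips the guessing step entirely. The first value depends on the orientation heuristic of your ggplot2 version; with the ggplot2 3.3.1 setup from the question it should reflect the horizontal bars described above.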