Tags: python, ggplot2, python-ggplot

python ggplot geom_bar y axis incorrect values


df:

duration  status    line
75526     Good      A
75526     Muy buen  B
75546     pas mal   C
75516     loco      D

I am plotting via:

p = ggplot(aes(x='status', weight='duration', fill='line'), data=df) + geom_bar(stat='identity')

Importantly, I am using stat='identity' so that the y-axis shows the column value rather than a count or density. Yet the plot shows incorrect y-axis values.

I can compute the maximum duration value, and it is around the 86,000 mark (i.e. 24 hours in seconds). Why is the plot showing values in excess of 250,000 seconds?



Solution

  • This plot groups the dataframe by status and line and uses the sum of durations (i.e. the weights) in each group as the bar height. Some groups must contain multiple rows, and their durations add up. That is where the extra-tall bars come from.
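
One way to check this is to reproduce the grouping in pandas and compare the per-group sums with the largest single duration. The sketch below is only an illustration: the dataframe is made up (the four rows from the question plus one hypothetical extra row that repeats status "Good" on line "A"), and it assumes the real df has the same three columns.

```python
import pandas as pd

# Hypothetical data: the last row is invented so that the group
# ("Good", "A") contains two entries instead of one.
df = pd.DataFrame({
    "duration": [75526, 75526, 75546, 75516, 80000],
    "status": ["Good", "Muy buen", "pas mal", "loco", "Good"],
    "line": ["A", "B", "C", "D", "A"],
})

# Sum of durations per (status, line) group, as described in the answer above.
group_sums = df.groupby(["status", "line"])["duration"].sum()

# Groups with more than one row are the ones whose bars grow too tall.
group_sizes = df.groupby(["status", "line"]).size()
duplicated_groups = group_sizes[group_sizes > 1]

print("largest single duration:", df["duration"].max())
print("largest group sum:", group_sums.max())
print("groups with multiple rows:")
print(duplicated_groups)
```

If duplicated_groups is not empty for the real data, the bar heights are sums over several rows. Reducing the dataframe to one row per group before plotting (for example with a groupby followed by max or mean, depending on what the bar should represent) would keep the bars within the range of the individual durations.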