Tags: python, data-visualization, histogram, logarithm

Is it necessary to use plt.bar() to center labels in a log histogram?


I have been struggling to center the labels under the histogram bars in this log plot. Specifically, I was hoping for the first bar to be over the "1" as this is the first value.

All of the solutions online seem to suggest using the plt.bar() function, which I haven't gotten to work correctly with the log scale and bin size. Do I need to start over from scratch with a bar graph? Any other tips for centering the labels under the bars? A link to the data is here: https://drive.google.com/file/d/1USxTNcxveKoM1X_-TTn6ZX4GcUdRm2a7/view?usp=sharing. The code and current figure are below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

MJ = pd.read_excel('MJ1a_data.xlsx')
# 25 log-spaced edges from 1 to 10,000 give 24 bins
hist = plt.hist(x = MJ.MJ1a,
                bins=np.logspace(start=np.log10(1), stop=np.log10(10000), num=25), rwidth = .7)
plt.gca().set_xscale('log')
plt.xticks(ticks = [1,10,100,1000,10000], labels = [1,10,100,1000,10000], horizontalalignment = 'center', fontname = "Arial", fontsize = 14, fontweight = 'medium')
plt.yticks(fontname = "Arial", fontsize = 14, fontweight = 'medium')
plt.xlabel("Total Lifetime MJ Use", fontname = 'Arial', fontsize = 14)
plt.ylabel("Frequency", fontname = 'Arial', fontsize = 14)
plt.show()

[Figure: current log-scale histogram of MJ1a]
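For reference, the bin edges produced by the np.logspace call above can be inspected directly, without the spreadsheet. This is a minimal NumPy-only sketch; the sample values [1, 1, 2, 3] are made up purely for illustration and are not taken from MJ1a_data.xlsx.

```python
import numpy as np

# Same edges as in the question: 25 edges -> 24 bins spanning 1 to 10,000
bins = np.logspace(start=np.log10(1), stop=np.log10(10000), num=25)

# 4 decades / 24 bins = 1/6 decade per bin, so consecutive edges
# differ by a constant factor of 10**(1/6), roughly 1.468
ratios = bins[1:] / bins[:-1]

# On a log axis the visual middle of a bin is the geometric mean of its edges
centres = np.sqrt(bins[:-1] * bins[1:])

# Every 6th edge is a power of ten, so the decade ticks sit on bin edges
decade_edges = bins[::6]

# Hypothetical sample values: the value 1 lands in the first bin [1, ~1.468)
counts, _ = np.histogram([1, 1, 2, 3], bins=bins)

print(bins[:3])      # first edges: 1, ~1.468, ~2.154
print(centres[0])    # ~1.21, the middle of the first bar on a log axis
print(decade_edges)  # 1, 10, 100, 1000, 10000
print(counts[:3])
```

In other words, the "1" tick already marks where the first bin starts; the middle of the first bar is at roughly 1.21, not at 1.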


Solution

  • A histogram and a bar plot are different plot types for different situations. A histogram is used to show the approximate distribution of a continuous variable, whereas a bar plot is used to compare the frequencies of several categories.

    Since your x-variable is continuous, a histogram is definitely the right choice over a bar plot. But you now seem to be trying to make the histogram look like a bar plot. I would strongly recommend against doing that, because it would be misleading.

    Specifically, don't shrink the bars with rwidth = .7. The "bars" are supposed to touch each other in a histogram, because each one shows the count (or density) for its bin, and adjacent bins share an edge with no gap between them. The 1 should not be centered under the first "bar", because it is the left edge of the first bin. Centering it would imply that 1 is the middle value of the first bin, when that bin actually covers everything from 1 up to the next edge (about 1.47 with these 24 log-spaced bins).
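A sketch of what that looks like in practice, assuming the same 24 log-spaced bins as in the question. Since the original spreadsheet isn't included here, the data below is a hypothetical stand-in (skewed positive integers clipped to 1..10,000), not MJ1a itself. The only real change from the question's code is dropping rwidth so every bar fills its whole bin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for MJ.MJ1a: 500 skewed positive integers in [1, 10000]
rng = np.random.default_rng(0)
values = np.clip(np.round(rng.lognormal(mean=3, sigma=1.5, size=500)), 1, 10000)

bins = np.logspace(0, 4, num=25)  # same 24 log-spaced bins as in the question

fig, ax = plt.subplots()
# No rwidth: each bar spans its entire bin, so neighbouring bars touch.
# A white edge colour keeps the individual bins visually distinguishable.
counts, edges, patches = ax.hist(values, bins=bins, edgecolor="white")
ax.set_xscale("log")

# Decade ticks coincide with bin edges, so "1" marks the left edge of the first bar
ax.set_xticks([1, 10, 100, 1000, 10000])
ax.set_xticklabels(["1", "10", "100", "1000", "10000"])
ax.set_xlabel("Total Lifetime MJ Use")
ax.set_ylabel("Frequency")
# Use fig.savefig(...) or plt.show() with an interactive backend to view it
```

Each bar's left side then sits exactly on a bin edge, which is why the tick at 1 lines up with the start of the first bar rather than its middle.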