The question is pretty long because of the pictures, but there isn't much content in reality. Question at the bottom.
Hi, I have a series of 30000 samples of ages ranging from 21 to 74. Series head:
0 24
1 26
2 34
3 37
4 57
Name: AGE, dtype: int64
I plot it using the built-in Pandas feature .plot:
age_series = original_df['AGE']
fig = plt.figure()
fig.suptitle('Age distribution')
age_series.value_counts().sort_index().plot(kind='bar')
My problem is that the resulting x-axis is not really user-friendly:
I could increase the horizontal width between bars, but I don't want to do that. Instead, I'd like to make only a subset of the x-axis labels visible.
I tried using MaxNLocator and MultipleLocator, adding this line:
plt.gca().xaxis.set_major_locator(plt.MaxNLocator(10))
However, it doesn't achieve my goal: it now labels the bars incorrectly and removes ticks (which I understand, since using these functions changes the xticks object):
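As far as I can tell, the mislabeling happens because a Pandas bar plot puts its ticks at positions 0..N-1 and attaches the ages as a fixed list of label strings, so a new locator moves the ticks without touching that list. A minimal sketch to check this, assuming synthetic data in place of original_df['AGE'] (the random series and the Agg backend are my stand-ins, not part of the real setup):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for original_df['AGE'] (assumption for illustration)
rng = np.random.default_rng(0)
age_series = pd.Series(rng.integers(21, 75, size=3000), name="AGE")

counts = age_series.value_counts().sort_index()
ax = counts.plot(kind="bar")

# Tick positions are bar indices, not ages
positions = list(ax.get_xticks())
# The ages only appear as label strings attached to those positions
labels = [t.get_text() for t in ax.get_xticklabels()]

print(positions[:5])
print(labels[:5])
```

If the positions printed are bar indices while the labels are ages, a locator that picks new positions has no way of knowing which age belongs where.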
An ugly solution is to loop within the xticks object:
xticks = plt.gca().xaxis.get_major_ticks()
for i in range(len(xticks)):
    if i % 10 != 0:
        xticks[i].set_visible(False)
This gives the following render, which is close to what I want:
I'm not satisfied, however, as the loop is too naive. I'd like to access the value of each xtick (its label) and decide based on it, so that only labels that are multiples of 10 are shown.
This works (based upon this answer):
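# Assumption: labels holds the current tick labels (Text objects) of the axis
labels = plt.gca().get_xticklabels()
for i, l in enumerate(labels):
    val = int(l.get_text())
    if val % 10 != 0:
        labels[i] = ''
plt.gca().set_xticklabels(labels)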
Question: Is there a different solution that feels more Pythonic/efficient? Or do you have suggestions on how to make this data more readable?
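To be more generic, you could do something like this:
import numpy as np
ax = plt.gca()
max_value = original_df['AGE'].max()
min_value = original_df['AGE'].min()
number_of_steps = 5
# Ages to label: min_value, min_value + 5, ...
l = np.arange(min_value, max_value+1, number_of_steps)
# Pandas bar plots place the bars at positions 0..N-1, so shift ages to positions
# (assumes every age between min_value and max_value occurs at least once)
ax.set(xticks=l - min_value, xticklabels=l)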
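Another option, sketched here rather than taken from the question, is to draw the bars with Matplotlib's own bar at numeric age positions, so that a locator such as MultipleLocator works on real age values and the labels stay attached to the right bars. The synthetic series and the Agg backend below are assumptions standing in for original_df['AGE'] and your actual environment:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import MultipleLocator

# Synthetic stand-in for original_df['AGE'] (assumption for illustration)
rng = np.random.default_rng(0)
age_series = pd.Series(rng.integers(21, 75, size=30000), name="AGE")

counts = age_series.value_counts().sort_index()

fig, ax = plt.subplots()
fig.suptitle("Age distribution")

# x positions are the ages themselves, so the axis is truly numeric
ax.bar(counts.index, counts.values)

# Place a major tick (and label) at every multiple of 10
ax.xaxis.set_major_locator(MultipleLocator(10))

fig.savefig("age_distribution.png")
```

Because the x-axis is numeric here, missing ages simply leave a gap instead of shifting the labels, which may or may not be what you want.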