Tags: python, python-3.x, pandas, matplotlib, axis-labels

Hiding every n-th xtick label, or hiding labels by value, on a Pandas plot / making the x-axis readable


The question is pretty long because of the pictures, but there isn't much content in reality. Question at the bottom.

Hi, I have a series of 30000 samples of ages ranging from 21 to 74. Series head:

0    24
1    26
2    34
3    37
4    57
Name: AGE, dtype: int64

I plot it using the built-in Pandas .plot method:

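import matplotlib.pyplot as plt

age_series = original_df['AGE']
fig = plt.figure()
fig.suptitle('Age distribution')
# Count occurrences of each age, order by age, and draw one bar per age
age_series.value_counts().sort_index().plot(kind='bar')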

My problem is that the resulting x-axis is hard to read: (image: Original plotting)

I could increase the horizontal spacing between bars, but I don't want to do that. Instead, I'd like to make only a subset of the x-axis labels visible. I tried using MaxNLocator and MultipleLocator by adding this line:

plt.gca().xaxis.set_major_locator(plt.MaxNLocator(10))

However, this doesn't achieve my goal: it now labels the bars incorrectly and removes ticks (which I understand, since these locators change the xticks object). (image: MaxNLocator(10))
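
For context, here is a minimal sketch of what seems to be going on. It uses synthetic data as a stand-in for original_df['AGE'] (an assumption, since the real data isn't shown). A Pandas bar plot places the bars at integer positions 0..N-1 and attaches the ages as a fixed list of label strings, so a new locator moves the ticks while the labels are still handed out in their original order:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for original_df['AGE'] (assumption: ages 21..74)
rng = np.random.default_rng(0)
age_series = pd.Series(rng.integers(21, 75, size=30000), name="AGE")

counts = age_series.value_counts().sort_index()
ax = counts.plot(kind="bar")

# Tick positions are bar indices, not ages; the ages only live in the label text
positions = ax.get_xticks()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(positions[:3])
print(labels[:3])
```

So any approach that changes the tick locations also has to supply the matching labels itself.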

An ugly solution is to loop within the xticks object:

xticks = plt.gca().xaxis.get_major_ticks()
for i in range(len(xticks)):
    if i % 10 != 0:
        xticks[i].set_visible(False)

This gives the following render, which is close to what I want: (image)

I'm not satisfied, however, as the loop is too naive. I'd like to access the value of each xtick (its label) and decide based on that, so that only labels that are multiples of 10 are shown.

This works (based upon this answer):

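ax = plt.gca()
labels = ax.get_xticklabels()  # list of Text objects, one per bar
for i, l in enumerate(labels):
    val = int(l.get_text())
    if val % 10 != 0:
        labels[i] = ''  # blank out labels that are not multiples of 10
# Apply the filtered labels once, after the loop
ax.set_xticklabels(labels)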

(image: Ugly workaround)

Question: Is there a different solution that feels more Pythonic or efficient? Or do you have suggestions on how to make this data readable?


Solution

  • To be more generic, you could do something like this:

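    import numpy as np
    import matplotlib.pyplot as plt

    ax = plt.gca()

    max_value = original_df['AGE'].max()
    min_value = original_df['AGE'].min()
    number_of_steps = 5  # spacing between labelled ages
    l = np.arange(min_value, max_value+1, number_of_steps)

    # Pandas bar plots put the bars at positions 0..N-1, so shift ages to positions
    # (assumes every age between min_value and max_value occurs in the data)
    ax.set(xticks=l - min_value, xticklabels=l)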
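
If some ages might be missing from the data, the tick positions can instead be derived from the index of the counted series. The following is only a sketch on synthetic data (a stand-in for original_df['AGE']). The names step, keep, tick_positions and tick_labels are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for original_df['AGE'] (assumption: ages 21..74)
rng = np.random.default_rng(0)
age_series = pd.Series(rng.integers(21, 75, size=30000), name="AGE")

counts = age_series.value_counts().sort_index()
ax = counts.plot(kind="bar", title="Age distribution")

step = 10  # keep only labels whose age is a multiple of this value
ages = counts.index.to_numpy()
keep = ages % step == 0

# Bars sit at positions 0..N-1, so the position of an age is its index in `ages`
tick_positions = np.flatnonzero(keep)
tick_labels = [str(a) for a in ages[keep]]

ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels, rotation=0)
```

Because the decision is made on the age values themselves, this shows only the multiples of 10 without assuming that the ages are consecutive.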