python pandas matplotlib xticks

How can I create xticks with varying intervals?


I am drawing a line chart using matplotlib as shown below.

import matplotlib.pyplot as plt
import pandas as pd
import io

temp = u"""
tenor,yield
1M,5.381
3M,5.451
6M,5.505
1Y,5.393
5Y,4.255
10Y,4.109
"""

data = pd.read_csv(io.StringIO(temp), sep=",")
plt.plot(data['tenor'], data['yield'])

Output: The tick intervals on the x-axis are all the same.


What I want: set the tick intervals of the x-axis differently, as in the image below.

Is there any way to set the tick intervals differently?

[image: desired chart with unevenly spaced x-axis ticks]


Solution

  • In the column 'tenor', 'M' represents months and 'Y' represents years. Create a 'Month' column that expresses every tenor in months, multiplying the 'Y' values by 12.

    It's more concise to plot the data directly with pandas.DataFrame.plot, and use .set_xticks to set both the tick positions and the tick labels.

    Tested in python 3.12.0, pandas 2.1.1, matplotlib 3.8.0

    data = pd.read_csv(io.StringIO(temp), sep=",")
    
    # Add a column "Month" derived from the column "tenor"
    data["Month"] = data['tenor'].apply(lambda x: int(x[:-1]) * 12 if 'Y' in x else int(x[:-1]))
    
    # plot yield vs Month
    ax = data.plot(x='Month', y='yield', figsize=(17, 5), legend=False)
    
    # place a tick at each Month value and label it with the original tenor
    _ = ax.set_xticks(data.Month, data.tenor)
    

    The output:

    [image: yield plotted against Month, with x-ticks at each tenor's month position, labeled 1M to 10Y]
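
For reference, the approach above can be sketched as a self-contained script. The helper name `tenor_to_months` and the `UNIT_MONTHS` lookup are illustrative and not part of the original answer. The Agg backend and the output file name are assumptions so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

CSV = """tenor,yield
1M,5.381
3M,5.451
6M,5.505
1Y,5.393
5Y,4.255
10Y,4.109
"""

# Hypothetical lookup: months per unit suffix (only M and Y appear in the data)
UNIT_MONTHS = {"M": 1, "Y": 12}


def tenor_to_months(tenor: str) -> int:
    """Convert a tenor such as '3M' or '5Y' to a number of months."""
    return int(tenor[:-1]) * UNIT_MONTHS[tenor[-1]]


data = pd.read_csv(io.StringIO(CSV))
data["Month"] = data["tenor"].map(tenor_to_months)

ax = data.plot(x="Month", y="yield", figsize=(17, 5), legend=False)
# Tick positions are the month values; tick labels are the original tenor strings
ax.set_xticks(data["Month"], data["tenor"])
plt.savefig("yield_curve.png")  # assumed output file name
```

Because the x-values are now numeric, matplotlib spaces the ticks in proportion to the number of months, so 5Y and 10Y sit far to the right of the short tenors.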


    .apply is fastest in this test; the lambda and the named function are effectively tied

    • Given 6 million rows
    import numpy as np  # needed for the np.where comparisons below
    
    data = pd.DataFrame({'tenor': ['1M', '3M', '6M', '1Y', '5Y', '10Y'] * 1000000})
    

    Compare Implementations with %timeit in JupyterLab

    • .apply & lambda
    %timeit data['tenor'].apply(lambda x: int(x[:-1]) * 12 if 'Y' in x else int(x[:-1]))
    
    2 s ± 50.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    • .apply with a function call
    def scale(x):
        v = int(x[:-1])
        return v * 12 if 'Y' in x else v
    
    %timeit data['tenor'].apply(scale)
    
    2.02 s ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    • Vectorized np.where with assignment expression
    %timeit np.where(data.tenor.str.contains('Y'), (v := data.tenor.str[:-1].astype(int)) * 12, v)
    
    2.44 s ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    • Vectorized np.where without assignment expression
    %timeit np.where(data.tenor.str.contains('Y'), data.tenor.str[:-1].astype(int) * 12, data.tenor.str[:-1].astype(int))
    
    3.36 s ± 5.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
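
The timings are only comparable if the implementations return the same values. A minimal check on the six sample tenors, reusing the expressions benchmarked above (the names `via_lambda`, `via_function`, `via_where` and `expected` are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"tenor": ["1M", "3M", "6M", "1Y", "5Y", "10Y"]})


def scale(x):
    v = int(x[:-1])
    return v * 12 if "Y" in x else v


via_lambda = data["tenor"].apply(lambda x: int(x[:-1]) * 12 if "Y" in x else int(x[:-1]))
via_function = data["tenor"].apply(scale)
# The assignment expression stores the parsed integers so the third argument can reuse them
via_where = np.where(data.tenor.str.contains("Y"), (v := data.tenor.str[:-1].astype(int)) * 12, v)

expected = [1, 3, 6, 12, 60, 120]
assert via_lambda.tolist() == expected
assert via_function.tolist() == expected
assert via_where.tolist() == expected
```

Note that np.where returns a NumPy array rather than a Series, which is fine for assigning to a new column.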