python matplotlib seaborn

Formatting datetime64 dates as xticks


In a DataFrame I have a column "Datum2" with datetime64 dates (parsed from YYYYMMDD values) that I want to use as xtick labels. My current plot looks like this:

[image: current plot]

I want to change the labels to show the year and the abbreviated month names (e.g. 2022, Nov, Dec, 2023, Jan, Feb, etc.).

Currently I have this code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np

df_tagesspiegel_final['Datum2'] = pd.to_datetime(df_tagesspiegel_final['Datum'], format='%Y%m%d')

# Create the scatter plot
sns.scatterplot(x='Datum2', y='compound', data=df_tagesspiegel_final)

# Use the earliest, median and latest dates as xticks
xticks = [df_tagesspiegel_final['Datum2'].min(), df_tagesspiegel_final['Datum2'].median(), df_tagesspiegel_final['Datum2'].max()]

plt.gca().set(xticks=xticks, xlabel='Datum', ylabel='compound', title='Compound-Sentiment im Zeitverlauf')

plt.show()

How would I go about formatting the dates accordingly?
I feel like this might be a start: https://matplotlib.org/stable/api/dates_api.html#matplotlib.dates.ConciseDateConverter
But to be honest, I am very new to Python and in way over my head.


Solution

  • Here is an example that uses matplotlib.dates to format the datetime x-axis of a seaborn scatter plot

    • Every January tick shows the (new) year instead of the month name
    • ConciseDateFormatter abbreviates the month names automatically
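
The two points above can be checked without drawing a plot. The following is a minimal sketch, assuming four made-up tick positions on the first of consecutive months (they are not taken from the question's data) and the default locale. It asks ConciseDateFormatter directly for the labels it would put on those ticks:

```python
import datetime as dt

import matplotlib.dates as mdates

# Hypothetical tick positions: the first day of four consecutive months
tick_dates = [
    dt.datetime(2022, 11, 1),
    dt.datetime(2022, 12, 1),
    dt.datetime(2023, 1, 1),
    dt.datetime(2023, 2, 1),
]

# The formatter only needs a locator instance here; no axes are required
formatter = mdates.ConciseDateFormatter(mdates.MonthLocator())

# format_ticks takes Matplotlib date numbers and returns one label per tick
labels = formatter.format_ticks(mdates.date2num(tick_dates))
print(labels)
```

The January entry should come back as the year and the other entries as abbreviated month names, which is the pattern described in the bullets.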

    For example:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    
    # Sample data
    df_tagesspiegel_final = pd.DataFrame(
        {
            "Datum": [
                "20220101",
                "20220115",
                "20220221",
                "20220610",
                "20220903",
                "20221016",
                "20230201",
                "20230215",
            ],
            "compound": [0.11, 0.21, 0.3, 0.25, 0.47, 0.32, 0.48, 0.5],
        }
    )
    
    df_tagesspiegel_final["Datum2"] = pd.to_datetime(
        df_tagesspiegel_final["Datum"], format="%Y%m%d"
    )
    
    # Scatterplot construction
    fig, ax = plt.subplots(figsize=(6, 4), dpi=150)
    sns.scatterplot(
        x="Datum2", y="compound", data=df_tagesspiegel_final, ax=ax, zorder=2
    )
    
    # Set the locators for the x-axis
    months = mdates.MonthLocator()  # Every month
    years = mdates.YearLocator()  # Every year
    
    # Reuse the `ax` returned by plt.subplots above
    
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_minor_locator(years)
    
    # Set the date format
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(months))
    
    # Display the plot
    plt.grid(
        True, which="both", linestyle="--", linewidth=0.5, zorder=1, alpha=0.5
    )
    plt.xlabel("Datum")
    plt.ylabel("compound")
    plt.title("Compound-Sentiment im Zeitverlauf")
    plt.show()
    
    # Show data structure of sample data
    print(df_tagesspiegel_final)
    print(df_tagesspiegel_final.info())
    
    

    This gives:

    [Figure: seaborn scatter plot with custom datetime-formatted x-axis tick labels from matplotlib.dates]

          Datum  compound     Datum2
    0  20220101      0.11 2022-01-01
    1  20220115      0.21 2022-01-15
    2  20220221      0.30 2022-02-21
    3  20220610      0.25 2022-06-10
    4  20220903      0.47 2022-09-03
    5  20221016      0.32 2022-10-16
    6  20230201      0.48 2023-02-01
    7  20230215      0.50 2023-02-15
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 8 entries, 0 to 7
    Data columns (total 3 columns):
     #   Column    Non-Null Count  Dtype         
    ---  ------    --------------  -----         
     0   Datum     8 non-null      object        
     1   compound  8 non-null      float64       
     2   Datum2    8 non-null      datetime64[ns]
    dtypes: datetime64[ns](1), float64(1), object(1)
    memory usage: 320.0+ bytes
    None
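
If the January tick should keep its month name and show the year as well, one possible variation is a custom tick formatter. This is only a sketch: the helper name `month_with_year` is made up for illustration, and putting the year on a second line is an assumption about the desired layout, not something the question specifies.

```python
import datetime as dt

import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter


def month_with_year(x, pos=None):
    """Hypothetical helper: abbreviated month, with the year under January."""
    date = mdates.num2date(x)  # convert Matplotlib date number to datetime
    if date.month == 1:
        return date.strftime("%b\n%Y")
    return date.strftime("%b")


formatter = FuncFormatter(month_with_year)

# Quick check on two made-up dates, without creating a figure
jan = mdates.date2num(dt.datetime(2023, 1, 1))
nov = mdates.date2num(dt.datetime(2022, 11, 1))
print(repr(formatter(jan)), repr(formatter(nov)))
```

To try it in the example above, pass `formatter` to `ax.xaxis.set_major_formatter(...)` in place of the ConciseDateFormatter instance and keep the MonthLocator as the major locator.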