In a DataFrame I have a column "Datum2" with datetime64 dates (parsed from the YYYYMMDD strings in "Datum") that I want to use as xtick labels. My current plot looks like this:
I want to change the labels to show the year and the abbreviated month names, e.g. 2022, Nov, Dec, 2023, Jan, Feb, etc.
Currently I have this code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np
df_tagesspiegel_final['Datum2'] = pd.to_datetime(df_tagesspiegel_final['Datum'], format='%Y%m%d')
# Create the scatterplot
sns.scatterplot(x='Datum2', y='compound', data=df_tagesspiegel_final)
# Use the earliest, median and latest dates as xticks
xticks = [df_tagesspiegel_final['Datum2'].min(), df_tagesspiegel_final['Datum2'].median(), df_tagesspiegel_final['Datum2'].max()]
plt.gca().set(xticks=xticks, xlabel='Datum', ylabel='compound', title='Compound-Sentiment im Zeitverlauf')
plt.show()
How would I go about formatting the dates accordingly?
I feel like this might be a start: https://matplotlib.org/stable/api/dates_api.html#matplotlib.dates.ConciseDateConverter
But to be honest, I am very new to Python and I'm in way over my head.
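As a side note on that link: ConciseDateConverter is a units converter, and since Matplotlib 3.4 it can be switched on for date axes through the "date.converter" rcParam instead of configuring the axis by hand. The sketch below assumes Matplotlib 3.4 or newer and uses plain matplotlib with hypothetical sample dates; seaborn draws into an ordinary matplotlib Axes, so the same setting should apply to sns.scatterplot as well.

```python
import datetime

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical sample data spanning the 2022/2023 year boundary
dates = [
    datetime.datetime(2022, 11, 5),
    datetime.datetime(2022, 12, 12),
    datetime.datetime(2023, 1, 20),
    datetime.datetime(2023, 2, 15),
]
compound = [0.32, 0.41, 0.48, 0.50]

# Assumption: Matplotlib >= 3.4, which has the "date.converter" rcParam.
# "concise" makes new date axes use ConciseDateConverter, i.e. an
# AutoDateLocator paired with a ConciseDateFormatter.
with plt.rc_context({"date.converter": "concise"}):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(dates, compound)
    ax.set(xlabel="Datum", ylabel="compound", title="Compound-Sentiment im Zeitverlauf")
    fig.canvas.draw()  # tick positions and labels are computed at draw time

print(type(ax.xaxis.get_major_formatter()).__name__)
print([label.get_text() for label in ax.get_xticklabels()])
```

Using rc_context keeps the change local to this figure; setting plt.rcParams["date.converter"] = "concise" once at the top of a script would apply it to every date axis instead.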
Use matplotlib.dates to format a custom datetime x-axis in a seaborn plot. ConciseDateFormatter abbreviates the month names automatically (and shows the year at the start of each year). E.g.:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Sample data
df_tagesspiegel_final = pd.DataFrame(
    {
        "Datum": [
            "20220101",
            "20220115",
            "20220221",
            "20220610",
            "20220903",
            "20221016",
            "20230201",
            "20230215",
        ],
        "compound": [0.11, 0.21, 0.3, 0.25, 0.47, 0.32, 0.48, 0.5],
    }
)
df_tagesspiegel_final["Datum2"] = pd.to_datetime(
    df_tagesspiegel_final["Datum"], format="%Y%m%d"
)
# Scatterplot construction
fig, ax = plt.subplots(figsize=(6, 4), dpi=150)
sns.scatterplot(
    x="Datum2", y="compound", data=df_tagesspiegel_final, ax=ax, zorder=2
)
# Set the locators for the x-axis
months = mdates.MonthLocator()  # Major tick at the start of every month
years = mdates.YearLocator()  # Minor tick at the start of every year
ax.xaxis.set_major_locator(months)
ax.xaxis.set_minor_locator(years)
# Abbreviate month names; ticks on January 1 show the year instead
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(months))
# Add a light dashed grid at both major (month) and minor (year) ticks
plt.grid(
    True, which="both", linestyle="--", linewidth=0.5, zorder=1, alpha=0.5
)
plt.xlabel("Datum")
plt.ylabel("compound")
plt.title("Compound-Sentiment im Zeitverlauf")
# Display the plot
plt.show()
# Show data structure of sample data
print(df_tagesspiegel_final)
print(df_tagesspiegel_final.info())
gives:
      Datum  compound     Datum2
0  20220101      0.11 2022-01-01
1  20220115      0.21 2022-01-15
2  20220221      0.30 2022-02-21
3  20220610      0.25 2022-06-10
4  20220903      0.47 2022-09-03
5  20221016      0.32 2022-10-16
6  20230201      0.48 2023-02-01
7  20230215      0.50 2023-02-15
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Datum     8 non-null      object
 1   compound  8 non-null      float64
 2   Datum2    8 non-null      datetime64[ns]
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 320.0+ bytes
None
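To see what ConciseDateFormatter does with month ticks without drawing a plot, its format_ticks method can be called directly on a list of tick positions. A small sketch using the same MonthLocator/ConciseDateFormatter pair as the answer, with hypothetical month-start ticks around a year boundary:

```python
import datetime

import matplotlib.dates as mdates

# Same locator/formatter pair as in the answer's x-axis setup
months = mdates.MonthLocator()
formatter = mdates.ConciseDateFormatter(months)

# Hypothetical month-start tick positions around a year boundary,
# converted to Matplotlib's float date numbers
ticks = mdates.date2num([
    datetime.datetime(2022, 11, 1),
    datetime.datetime(2022, 12, 1),
    datetime.datetime(2023, 1, 1),
    datetime.datetime(2023, 2, 1),
])

# format_ticks looks at all ticks together to pick the label level
# (months here) and uses the year instead of "Jan" at each January tick
labels = formatter.format_ticks(ticks)
print(labels)  # ['Nov', 'Dec', '2023', 'Feb']
```

With month-level ticks, every January tick is labelled with its year rather than "Jan", which is how the year appears on the x-axis of the example above. The month abbreviations come from strftime's %b, so they follow the process locale (English under the default C locale).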