python, matplotlib, time-series, visualization

How to Control Dates on x-axis in Matplotlib?


Okay, so I've been working on this temperature time-series data, and I want to know if there is a way to control the dates shown on the x-axis. Matplotlib picks the tick dates on its own. I want to show the exact date beneath the anomaly, later label it "EQ", and label the other ticks with the number of days before and after the EQ.

For example, the x-axis may become [-25, -20, -15, -10, -5, "EQ", 5, 10].

Here's the data that I am currently working with:

data

Here's what I want to achieve.

enter image description here

and here's the code that I've written so far!

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns

# at_day: DataFrame with a DatetimeIndex and the columns "AT_Day_Time_Values" and "Legend"
fig, ax = plt.subplots(figsize=(8, 4))

sns.scatterplot(data=at_day, x=at_day.index, y="AT_Day_Time_Values", hue=at_day["Legend"], palette="bright", s=80, ax=ax)
sns.lineplot(data=at_day["AT_Day_Time_Values"], linewidth=1.5, linestyle="--", color="black", ax=ax, label="AT (Day)")
sns.rugplot(data=at_day, x=at_day.index, y="AT_Day_Time_Values", hue=at_day["Legend"], ax=ax)

ax.set_xticks(at_day.index.date)

# note: setting a locator afterwards replaces the explicit ticks set above
ax.xaxis.set_major_locator(mdates.DayLocator(interval=5))
ax.set(ylim=[295, 307], ylabel="K")
ax.grid(axis="y")

Please ignore other things like hue for now; I am only interested in learning how to control the dates on the x-axis.


Solution

  • You can generate the ticks automatically with this approach. I assumed you always want a spacing of 5 days between ticks; you can adjust that later if you want:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    # in a Jupyter notebook, you can run the IPython magic `%matplotlib notebook` before this
    df = pd.read_csv("air_temp.csv")
    df["Dates"] = pd.to_datetime(df["Dates"])
    # get limits for prompt
    minDate = df["Dates"].min()
    maxDate = df["Dates"].max()
    EQ = pd.NaT # init
    while not df["Dates"].eq(EQ).any(): # check value, repeat until the date exists in the data
        # prompt user to input a date between the two limits, like this: yyyy-mm-dd
        EQ = input(f"Choose a date between {minDate.date()} and {maxDate.date()}: ")
        EQ = pd.to_datetime(EQ, errors="coerce") # convert to date, invalid input becomes NaT
    df["DatesDiff"] = df["Dates"] - EQ # get time difference
    plt.figure() # generate a figure
    plt.plot(df["DatesDiff"].dt.days, df["AT_Day_Time_Values"]) # plot
    minVal = np.floor(df["DatesDiff"].dt.days.min() / 5) * 5 # get suitable limit in negative
    maxVal = np.ceil(df["DatesDiff"].dt.days.max() / 5) * 5 # get suitable limit in positive
    ticks = np.arange(minVal, maxVal + 1, 5) # define the ticks you want to set, they will include 0 and have 5 spacing
    labels = [str(int(tick)) if tick != 0 else "EQ" for tick in ticks] # generate labels as string, with "EQ" at 0
    plt.xticks(ticks, labels) # set the ticks
    plt.title("EQ = " + str(EQ.date())) # set title for OP
    plt.show() # display the figure when running as a script
    

    This is with EQ: 2022-08-05

    example 1

    I guess you would like to automatically identify the peak, so here is a way to do it:

    df = pd.read_csv("air_temp.csv")
    df["Dates"] = pd.to_datetime(df["Dates"])
    # no prompt this time: take the date of the maximum temperature as EQ
    EQ = df["Dates"].loc[df["AT_Day_Time_Values"].idxmax()] # get the date of the maximum temp
    df["DatesDiff"] = df["Dates"] - EQ # get time difference
    plt.figure() # generate a figure
    plt.plot(df["DatesDiff"].dt.days, df["AT_Day_Time_Values"]) # plot
    minVal = np.floor(min(df["DatesDiff"].dt.days) / 5) * 5 # get suitable limit in negative
    maxVal = np.ceil(max(df["DatesDiff"].dt.days) / 5) * 5 # get suitable limit in positive
    ticks = np.arange(minVal, maxVal + 1, 5) # define the ticks you want to set, they will include 0
    labels = [str(int(tick)) if tick != 0 else 'EQ' for tick in ticks] # generate labels as string, with "EQ" at 0
    plt.xticks(ticks, labels) # set the ticks
    plt.title("EQ = "+str(EQ.date())) # set title for OP
    

    Since the event is simply the maximum here, this was easy. Detecting an anomaly gets more complicated as the data becomes noisier. Here's the result:

    peak
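
For noisier data, one possible direction (not part of the answer above, only a rough sketch) is to pick the date whose value deviates most from a centred rolling median instead of taking the plain maximum. The helper names `find_anomaly_date` and `relative_day_ticks`, the 7-day window, and the synthetic data are all assumptions made for illustration; the column names follow the ones used above.

```python
import numpy as np
import pandas as pd


def find_anomaly_date(df, value_col="AT_Day_Time_Values", date_col="Dates", window=7):
    """Return the date whose value deviates most from a centred rolling median.

    Hypothetical helper: the rolling median and the window size are
    assumptions, not something taken from the original answer.
    """
    values = df[value_col]
    baseline = values.rolling(window, center=True, min_periods=1).median()
    residual = (values - baseline).abs()
    return df.loc[residual.idxmax(), date_col]


def relative_day_ticks(day_offsets, step=5, label="EQ"):
    """Build ticks every `step` days that include 0, and label 0 as `label`."""
    lo = int(np.floor(min(day_offsets) / step) * step)
    hi = int(np.ceil(max(day_offsets) / step) * step)
    ticks = list(range(lo, hi + 1, step))
    labels = [label if t == 0 else str(t) for t in ticks]
    return ticks, labels


# Synthetic data standing in for air_temp.csv: a slow warming trend plus one spike.
dates = pd.date_range("2022-07-10", periods=40, freq="D")
temps = np.linspace(298.0, 303.0, 40)
temps[26] += 4.0  # artificial anomaly placed on 2022-08-05
demo = pd.DataFrame({"Dates": dates, "AT_Day_Time_Values": temps})

EQ = find_anomaly_date(demo)
offsets = (demo["Dates"] - EQ).dt.days  # days before (negative) and after EQ
ticks, labels = relative_day_ticks(offsets)
print(EQ.date(), ticks, labels)
```

The resulting `ticks` and `labels` can be passed to `plt.xticks(ticks, labels)` exactly as in the snippets above. Note that the plain maximum would pick the end of a rising trend rather than a short spike, which is why the sketch measures the deviation from a local baseline; the window size would need tuning for real data.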