Search code examples
pandasdataframeplottime-seriestimeserieschart

Plotting timeseries data with multiple categories


I have a dataset from a production line, which is formatted as time series data. There is a batch column, which indicates the name of the batch (str), and there is a phase column which indicates the phase of the production (str). I am working with the datetime as the index of the pandas DataFrame.

I want to plot this data on timeseries graph, overlaying the data from each phase and distinguishing each batch (i.e. different colour), with each process variable (i.e. temp1, temp2, press1, press2) on a different axis (as per the diagram) How can this be done?

EDIT: for clarity, I need the trends to be plotted against a datetime baseline, otherwise they will not overlay.

enter image description here

Example of the dataset: | datetime | temp1 | temp 2|press1|press2|batch | phase | |:---- |:--: | :--: | :--: | :--: |:--: |:--: | | 2023-02-03 15:45:34| 34.45 | 23.34 | 13.23| 45.5 | 'D' | '10-Wait' | | ... | ... | ... | ... | ... | 'D' | ... | | 2023-02-03 15:55:34| 36.55 | 22.14 | 18.23| 78.5 | 'D' | '20-Initialise'|

To create a similar dataset -to mine- you can use the following code:

import numpy as np
import pandas as pd
import datetime  

date = pd.date_range(start='1/1/2023', end='10/06/2023', freq=datetime.timedelta(seconds=30))
tags = ['temp1','temp2','press1','press2']
data=np.random.rand(len(date),len(tags))
df=pd.DataFrame(data,columns=tags).set_index(date)

batches = ['A','B','C','D','E','F','G']
n=len(batches)
period_start = pd.to_datetime('1/1/2023')
period_end = pd.to_datetime('10/06/2023')
batch_start = (pd.to_timedelta(np.random.rand(n) * ((period_end - period_start).days + 1), unit='D') + period_start)
batch_end = (batch_start + pd.to_timedelta(8,unit='H'))

df_batches = pd.DataFrame(data=[batch_start,batch_end],columns=[batches],index=['start','end']).T

for item in batches:
    start_time = df_batches['start'][item]
    end_time = df_batches['end'][item]
    df.loc[((df.index>=start_time)&(df.index<=end_time)), 'batch'] = item
df.dropna(subset=['batch'],inplace=True)

df['phase']=''
phases = ['10-Wait','20-Initialise','30-Warm','40-Running']

for batch in batches:
    wait_len = int(len(df[df['batch']==batch].index)*0.2)
    init_len = int(len(df[df['batch']==batch].index)*0.4)
    warm_len = int(len(df[df['batch']==batch].index)*0.6)
    run_len = int(len(df[df['batch']==batch].index))
   
    wait_start = df[df['batch']==batch].index[0]
    wait_end = df[df['batch']==batch].index[wait_len]
    init_end = df[df['batch']==batch].index[init_len]
    warm_end = df[df['batch']==batch].index[warm_len]
    run_end = df[df['batch']==batch].index[-1]  
 
    df['phase'].loc[wait_start:wait_end] = phases[0]
    df['phase'].loc[wait_end:init_end] = phases[1]
    df['phase'].loc[init_end:warm_end] = phases[2]
    df['phase'].loc[warm_end:run_end] = phases[3]

df.to_csv('stackoverflowqn.csv')

Solution

  • Credit to @AvishWagde who definitely broke the back of the problem. The 1 missing ingredient was having the x-axis of each plot baselined against zero.

    The solution to baselining these plots was to create a new Timedelta column which starts from 00:00:00 and goes upwards, in increments of 00:00:30.

    In Avish's code he uses:

    for batch in batches:
        batch_data = df[(df['batch'] == batch) & (df['phase'] == phase)]
        axes[row, col].plot(batch_data.index, batch_data[variable], label=batch, marker='o')
    

    However, since the Dataframe index is a Datetime, plotting this on the x-axis will not result in a comparison of the data. As stated they need to be plotted against a baseline. Using Timedelta on the x-axis allows comparison of the process data in each phase. In this case the 00:00:00 is taken to be the start of each phase. This dataset was recorded at 30s intervals, and it is necessary to convert the Timedelta from Index to Series, as per this line pd.to_timedelta(np.arange(0,len(batch_data)*30,30),unit='s').to_series() which results in this slight change:

    for batch in batches:
        batch_data = df[(df['batch'] == batch) & (df['phase'] == phase)] 
        baseline_time = pd.to_timedelta(np.arange(0,len(batch_data)*30,30),unit='s').to_series()
        batch_data = batch_data.set_index(baseline_time)
        axes[row, col].plot(batch_data.index, batch_data[variable], label=batch, marker='')
    

    For the full working code:

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    df = pd.read_csv('stackoverflowqn.csv',index_col=[0])
    df.index = pd.to_datetime(df.index)
    
    sns.set_style("whitegrid")
    
    phases = df['phase'].unique()
    batches = df['batch'].unique()
    variables = ['temp1', 'temp2', 'press1', 'press2']  # List of process variables
    
    num_cols = 2  # Number of columns for the subplot grid
    num_rows = (len(variables) + num_cols - 1) // num_cols
    
    for phase in phases:
        fig, axes = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(15, 6 * num_rows))
        plt.subplots_adjust(hspace=0.5)
        plt.suptitle(phase, y=1.02)
    
        for idx, variable in enumerate(variables):
            row = idx // num_cols
            col = idx % num_cols
    
            for batch in batches:
                batch_data = df[(df['batch'] == batch) & (df['phase'] == phase)]
                baseline_time = pd.to_timedelta(np.arange(0,len(batch_data)*30,30),unit='s').to_series()
                batch_data = batch_data.set_index(baseline_time)
                axes[row, col].plot(batch_data.index, batch_data[variable], label=batch, marker='')
    
            axes[row, col].set_title(variable)
            axes[row, col].set_xlabel('time')
            axes[row, col].set_ylabel(variable)
            axes[row, col].legend()
    
        plt.tight_layout()
        plt.show()
    

    Wait: 2 x variables Output from the working code - showing Wait phase 10 Initialise: 2 x variables Output from the working code - showing Initialise phase 20
    Warm: 2 x variables Output from the working code - showing Warm phase 30