python plotly data-visualization plotly-dash

Plotly: How to create a line plot of a time series variable that has a multiple-color label?

I have a dataframe df with a column 'parameter' that I pass as y and plot against the column 'time'. The column 'labels' holds 2 distinct labels and is passed as color.

import plotly.express as px
fig = px.line(data_frame=df, x='time', y='parameter', color='labels')

Please see the attached images of the graph. Both images show the same variable; the 2nd image is a zoomed-in version of the first to give a better idea.

As you can see, I am plotting one variable against time and expect a separate color for each of the 2 labels. Instead, plotly draws 2 separate lines, in blue and red, which looks messy and wrong. What should I change to get one continuous line in 2 colors?

More explanation: I do not want the blue line running through the red part of the graph (please refer to the attached images) and vice versa, as I am plotting only one graph. I want a graph like the one shown in the 3rd image. Thank you in advance.

Solution

Second suggestion

(Please read my first suggestion further down for a few assumptions and conditions)

I've managed to build an approach that should cover pretty much everything you're asking for here. The only detail that poses a real challenge is how gaps between traces are visualized, since my second suggestion adds a separate trace for each contiguous run of a single label. You may suspect that this would fill the legend with a bunch of duplicate names, but that is taken care of by grouping trace names by the associated label. I've also set up a dictionary where you can specify the color for each label. This is the result:

Plot 2.1 - Color defined by label

Notice the grey line? That's the result of the "connectivity" problem I described earlier. You can choose to hide or show that line by setting the opacity parameter (the last number) in the line color, for example color='rgba(200,200,200,0.2)'; the snippet below uses 0.7. You'll find a complete code snippet to reproduce this figure below. There's a lot going on there to make the whole thing work together, so don't hesitate to ask about the details if anything is unclear.
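The grouping step that this approach relies on can be looked at in isolation. The following is a minimal sketch, assuming a toy frame with the columns time, value and label (the data is made up for illustration): a new group number is started every time the label changes from one row to the next, and each group then becomes its own sub-frame.

```python
import pandas as pd

# Toy data (assumed for illustration): one series with a label per row
toy = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=8, freq="D"),
    "value": [1.0, 1.2, 0.9, 2.1, 2.4, 1.1, 1.0, 2.2],
    "label": ["low", "low", "low", "high", "high", "low", "low", "high"],
})

# A new group starts wherever the label differs from the previous row;
# the cumulative sum of those change flags numbers the runs 1, 2, 3, ...
toy["group"] = toy["label"].ne(toy["label"].shift()).cumsum()

# One sub-frame per contiguous run, in time order
segments = [segment for _, segment in toy.groupby("group")]

for segment in segments:
    print(segment["label"].iloc[0], len(segment))
```

Each element of segments can then be drawn as its own trace, which is why a label that occurs in several separate stretches ends up as several traces sharing one legend group.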

Complete code:

# imports
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import random

# settings
observations = 100
np.random.seed(5)
value = np.random.uniform(low=-1, high=1, size=observations).tolist()
time = pd.date_range('2020', freq='D', periods=observations)

df=pd.DataFrame({'time': time, 
                 'value':value})
df['value']=df['value'].cumsum()
df1 = df.copy()
df1=df1.set_index('time')

# custom function to build labels as conditions of parameter values
def classify(e):
    if e > 0.75: return 'high'
    if e > 0.25: return 'medium'
    if e >= 0: return 'low'
    
# custom function to set mode = line or marker, given data length
def modes(df):
    if len(df) > 1: return 'lines'
    else: return  'markers'
    
# dictionary to specify marker or line color
# this will depend on your real world labels !!!
cols = {'high': 'green',
         'medium': 'blue',
         'low': 'red'}

df['label1'] = [(elem-df['value'].min())/(df['value'].max()-df['value'].min()) for elem in df['value']]
df['label'] = [classify(elem) for elem in df['label1']]
df = df.drop(columns='label1')  # positional axis argument is not accepted by pandas >= 2.0

# number each contiguous run of identical labels, then split into one frame per run
df['group'] = df['label'].ne(df['label'].shift()).cumsum()
dfs = [data for _, data in df.groupby('group')]

# one line to connect them all
fig = go.Figure(go.Scatter(x=df1.index, y=df1['value'],
                           name='all data',
                           line=dict(color='rgba(200,200,200,0.7)')))

# add one trace per segment; only the first trace of each label gets a legend entry
showed = []
for frame in dfs:
    label = frame['label'].iloc[0]
    fig.add_trace(go.Scatter(x=frame['time'], y=frame['value'],
                             mode=modes(frame),
                             marker_color=cols[label],
                             legendgroup=label,
                             name=label,
                             showlegend=label not in showed))
    if label not in showed:
        showed.append(label)
fig.update_layout(template='plotly_dark')
fig.update_xaxes(showgrid=False)
fig.update_layout(uirevision='constant')
fig.show()
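
If the grey connecting line is not wanted, one possible alternative (not part of the approach above, just a sketch) is to extend each segment with the first point of the segment that follows it, so that neighbouring colored traces share an endpoint. The helper name bridge_segments and the toy data are assumptions made for this illustration.

```python
import pandas as pd

def bridge_segments(segments):
    """Append the first row of the next segment to each segment.

    Hypothetical helper: adjacent segments then share one point, so
    their line traces meet instead of leaving a gap.
    """
    if not segments:
        return []
    bridged = []
    for current, following in zip(segments, segments[1:]):
        bridged.append(pd.concat([current, following.iloc[:1]]))
    bridged.append(segments[-1])  # the last segment has nothing to bridge to
    return bridged

# Toy data (assumed for illustration)
toy = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=6, freq="D"),
    "value": [1.0, 1.2, 2.1, 2.4, 1.1, 1.0],
    "label": ["low", "low", "high", "high", "low", "low"],
})
toy["group"] = toy["label"].ne(toy["label"].shift()).cumsum()
segments = [segment for _, segment in toy.groupby("group")]
bridged = bridge_segments(segments)
```

The bridged frames could be passed to the trace loop in place of dfs. Because the borrowed row is appended at the end, frame['label'].iloc[0] still returns the segment's own label, and the connecting piece takes the color of the earlier segment.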

First suggestion

How you should do this would depend highly on the structure of your dataset. By the sound of your question, I can only guess that it looks something like this:

         time     param   label
0  2020-01-01 -0.556014  medium
1  2020-01-02  0.185451    high
2  2020-01-03 -0.401111  medium
3  2020-01-04  0.436111    high
4  2020-01-05  0.412933    high
5  2020-01-06  0.636421    peak
6  2020-01-07  1.168237    peak
7  2020-01-08  1.205073    peak
8  2020-01-09  0.798674    peak
9  2020-01-10  0.174116    high

If so, then you can quickly run into a problem of weird connectivity between your data points if you'd like to display param as a line trace with different colors. The first thing that comes to mind is to combine a line of one color with markers of multiple colors, like this:

This will give you nice interactivity where you can switch all elements on and off, perhaps to study only the parts of your data where label=='peak':

Let me know how this works out for you and we can talk some more details. You'll find a data sample and all details here:

Complete code:

# imports
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import random

# settings
observations = 100
np.random.seed(5)
param = np.random.uniform(low=-1, high=1, size=observations).tolist()
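# date strings; strftime avoids DatetimeIndex.format(), which is deprecated in recent pandas
time = pd.date_range('2020', freq='D', periods=observations).strftime('%Y-%m-%d').tolist()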

df=pd.DataFrame({'time': time, 
                 'param':param})
df['param']=df['param'].cumsum()

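def classify(e):
    if e > 0.9: return 'peak'
    if e > 0.75: return 'high'
    if e > 0.25: return 'medium'
    # NOTE: this repeats the 'peak' threshold, so 'low' is never returned;
    # a value between 0 and 0.25 was probably intended
    if e > 0.9: return 'low'
    if e >= 0: return 'bottom'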

df['label1'] = [(elem-df['param'].min())/(df['param'].max()-df['param'].min()) for elem in df['param']]
df['label'] = [classify(elem) for elem in df['label1']]
df = df.drop(columns='label1')  # positional axis argument is not accepted by pandas >= 2.0

fig=go.Figure((go.Scatter(x=df['time'], y=df['param'],
                         mode='lines',
                         line=dict(color='rgba(0,0,200,0.7)'))))
fig.add_traces(px.scatter(df, x='time', y='param', color='label').data)
fig.update_layout(template='plotly_dark')
fig.update_xaxes(showgrid=False)
fig.show()
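
As a side note, the labelling step can also be written without a row-by-row function. The following is a minimal sketch using pd.cut, assuming the same seed and the thresholds that classify() can actually reach (peak, high, medium, bottom); the variable names scaled and labels are made up for this illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(5)
param = pd.Series(np.random.uniform(low=-1, high=1, size=100)).cumsum()

# Min-max scale to [0, 1], as the snippet above does row by row
scaled = (param - param.min()) / (param.max() - param.min())

# Bin edges mirror the thresholds reachable in classify();
# intervals are closed on the right, and include_lowest keeps the value 0
labels = pd.cut(scaled,
                bins=[0, 0.25, 0.75, 0.9, 1],
                labels=["bottom", "medium", "high", "peak"],
                include_lowest=True)

print(labels.value_counts())
```

The result is a categorical Series that could be assigned to df['label'] directly; whether that reads better than the explicit function is a matter of taste.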