Plotting a time series graph with Plotly


I have transaction data containing the product sold, a datetime, and a value. I am plotting this data on a daily graph of cumulative value in 5-minute bins using Plotly. I can create a graph that works fine like this:

import pandas as pd
import plotly.express as px

# Sum gmv into 5-minute bins
df = teste.groupby([pd.Grouper(key='data', freq='5min', origin='start_day', convention='start', dropna=True, sort=True, closed='left')]).aggregate({'gmv': 'sum'}).reset_index()
df.sort_values(by='data', inplace=True)
# Fill missing 5-minute bins with 0, starting at midnight of the first day
dti = pd.date_range(df['data'].min().normalize(), df['data'].max(), freq='5min', name='data')
df = df.set_index('data').reindex(dti, fill_value=0).reset_index()
# Running total that restarts every day
df["cum_sale"] = df.groupby([df['data'].dt.date])['gmv'].cumsum()

df['time'] = df['data'].dt.time
df['date'] = df['data'].dt.date

# One line per day
fig = px.line(df, x="time", y="cum_sale", color="date")
fig.show()

This code produces the graph I want, with one line per day.

Recently there were some days with a discount on certain products, so I want to show those days in a different color from all the others.

I added this code to flag which days are promotional and which are not:

from datetime import date

# Flag the promotional days
df['promo'] = False
df.loc[df['date'].isin([date(2023,6,16), date(2023,6,17), date(2023,6,18)]), 'promo'] = True
fig = px.line(df, x="time", y="cum_sale", color="promo")
fig.show()

After that, I tried plotting with the "promo" column for color instead of "date", but Plotly draws stray straight lines across the graph and I can't get rid of them.

Any tips?

Update

After updating my code as suggested in the answer, the stray lines disappeared.


Solution

  • Without a minimal reproducible example it's a bit hard to be sure, but I think the following is happening:

    In the first graph, because you map color to the date, Plotly Express (px) splits the data into one trace per date and assigns a different color to each trace. If you are familiar with plotly.graph_objects (go.Figure()), this should sound familiar.

    In the second version, you no longer implicitly separate the days. The variables that go into the graph are "time", "cum_sale" and "promo"; nothing about the date. So px splits the traces over "promo", because you told it to, and creates just two traces: one for True and one for False. Each trace contains several days back to back, and the straight lines are the line graph connecting the end of one day to the start of the next.

    The most controlled solution is to switch to plotly.graph_objects and build the figure yourself with go.Figure().

    A quick fix would be to map the marker symbol to the date in the px call, so that px splits the traces over that variable as well.

    Change this line:

    fig = px.line(df, x="time", y="cum_sale", color="date")
    

    into

    fig = px.line(df, x="time", y="cum_sale", color="promo", symbol="date")
    

    See if that works.