
How to add a line showing the number of sales to a histogram of the monthly sum of sales per category?


With Plotly, I plotted a histogram showing the month-by-month evolution of the sum of sales, grouped by category.

Still with Plotly, I would like to add a line on top of it tracing the evolution of the number of sales, with a marker for each month showing that month's number of sales.

Here is the code I used for the histogram:

import plotly.express as px
import plotly.graph_objects as go

fig = px.histogram(
    dataset,
    x="Years and month",
    y="Price",
    color="Category",
    text_auto=".2f", 
    height=600,  
    width=980)  

fig.update_layout(
    bargap=0.2, 
    title_x=0.5)  
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
fig.show()

I tried adding this line of code but only got a straight line along my x axis at the bottom of my bars:

fig.add_trace(go.Scatter(x=dataset["Years and month"], y=dataset["Price"],
                    mode='lines',
                    name="Sales"))
# I don't know what argument to put to have the count of dataset["Price"]

dataset's info:

# dataset.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 679111 entries, 0 to 679331
Data columns (total 3 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Price             679111 non-null  float64
 1   Category          679111 non-null  int64  
 2   Years and month   679111 non-null  object 
dtypes: float64(1), int64(1), object(1)
memory usage: 20.7+ MB
None

Here is a sample of my dataset:

Price    Category   Years and month
16.07       1          2021-12
9.28        0          2021-07
3.99        0          2021-03
27.46       1          2021-11
15.81       1          2022-03
17.99       0          2022-09
16.99       1          2022-01
9.41        0          2021-12
9.99        0          2022-05
8.99        0          2021-04

One more problem: my dataset has 679532 entries, which strains my Jupyter notebook when a request is too heavy (for example, go.Scatter(mode="lines+markers") crashes it).

Here is a picture of my histogram with the desired result (the black line was drawn in Paint): (image: Histogram with line + marker)


Solution

  • I finally found the solution myself.

    Edit: I renamed the column "Years and month" to "Year and month".

    To add a trace built with plotly.express to an existing figure, use:

    fig.add_traces(list(px.<chart function, e.g. line, histogram, scatter>(<arguments for that chart>).select_traces()))
    

    To obtain the desired aggregation, group the data with groupby() and then select the column to aggregate.

    The line itself is drawn from the monthly sum of Price. To also show the number of products sold, pass the per-month count through hover_data=[], for example:

    hover_data=[dataset.groupby(
        "Year and month")["Price"].count()]
    

    To get a line with markers, chain .update_traces(mode='lines+markers') just before .select_traces().

    Here is the full code for the solution:

    import plotly.express as px
    
    fig = px.histogram(dataset,
                       x="Year and month",
                       y="Price",
                       color="Category",
                       text_auto=".2f",
                       height=600,
                       width=980)
    
    fig.update_layout(bargap=0.2)
    fig.update_xaxes(dtick="M1", tickformat="%b\n%Y")
    
    fig.add_traces(
        list(
            px.line(dataset.groupby("Year and month")["Price"].sum(),
                    hover_data=[
                        dataset.groupby("Year and month")["Price"].count()
                    ]).update_traces(mode='lines+markers').select_traces()))
    fig.show()
    

    Here is a picture of the result (the text is in French): (image)