Search code examples
pythonplotlyline

Grouped dates are 30 days behind for Plotly Express line graphs


I have a list of daily transactions that I am trying to plot on a line graph. I decided to group by month and year and sum those groupings. The data plots on the Plotly line graph as expected except the end dates are 30 days behind. This makes it difficult if I want to add/subtract the dates to obtain a certain date range.

To get a certain date range, I am not using the grouped dates but the original dates and applying relativedelta. How can I resolve this?

import pandas as pd
from datetime import datetime, timedelta
import plotly.express as px
import sqlite3
import numpy as np
from dateutil.relativedelta import relativedelta


data = {
'Transaction_type':[ 'Debit', 'Debit', 'Credit','Debit','Debit','Debit', 'Debit', 'Credit','Debit','Debit'],
    'Amount': [40,150,1000,60,80,120, 80, 1000,500,80]
}

df = pd.DataFrame(data)
df['Date'] = pd.date_range(start='6/1/2022',end='7/30/2022', periods = len(df))

df['Date'] = pd.to_datetime(df['Date'])
df['year_month'] = df['Date'].dt.strftime('%Y-%m')


#Income Expense Visual
Income_Expense = df.copy()
Income_Expense.Transaction_type.replace(['credit'], 'Income', inplace= True) #Change to Income for line legend
Income_Expense.Transaction_type.replace(['debit'], 'Expense', inplace= True) #Change to Expense for line legend

Income_Expense = pd.pivot_table(Income_Expense, values = ['Amount'], index = ['Transaction_type', 'year_month'],aggfunc=sum).reset_index()

scatter_plot = px.line(Income_Expense, x = 'year_month', y = 'Amount', color = 'Transaction_type', title = 'Income and Expense', color_discrete_sequence= ['red','green'],
                       category_orders= {'Cash Flow': ['Expense', 'Income']})
scatter_plot.update_layout(legend_traceorder = 'reversed')
scatter_plot.update_layout(yaxis_tickformat = ',')




scatter_plot.show()

Solution

  • The reason for the error is the strftime(). This will convert your date to a string. From that point onwards, plotly thinks of each date as a string. So, the names are not as you may want it. You can do a Income_Expense.info() to check

    So, you need to leave the dates in the datetime format. pandas Grouper can be used to group the dates by monthly frequency. You can then plot it and specify the date format so that plotly understands that these are dates. Below is the updated code.

    Note that Date needs to be in index for grouper to work. So, first I do this by the set_index(), then use the grouper with frequency as month along Transaction type, then do a sum and reset_index. This will create a dataframe that looks like the one you had, except that these are now datetime, not strings.

    import pandas as pd
    from datetime import datetime, timedelta
    import plotly.express as px
    import sqlite3
    import numpy as np
    from dateutil.relativedelta import relativedelta
    
    data = {'Transaction_type':[ 'Debit', 'Debit', 'Credit','Debit','Debit','Debit', 'Debit', 'Credit','Debit','Debit'], 'Amount': [40,150,1000,60,80,120, 80, 1000,500,80]}
    
    df = pd.DataFrame(data)
    df['Date'] = pd.date_range(start='6/1/2022',end='7/30/2022', periods = len(df))
    df['Date'] = pd.to_datetime(df['Date'])
    df['year_month'] = df['Date'].dt.strftime('%Y-%m')
    #Income Expense Visual
    Income_Expense = df.copy()
    Income_Expense.Transaction_type.replace(['credit'], 'Income', inplace= True) #Change to Income for line legend
    Income_Expense.Transaction_type.replace(['debit'], 'Expense', inplace= True) #Change to Expense for line legend
    Income_Expense = df.set_index('Date').groupby([pd.Grouper(freq="M"), 'Transaction_type']).sum().reset_index()
    scatter_plot = px.line(Income_Expense, x = 'Date', y = 'Amount', color = 'Transaction_type', title = 'Income and Expense', color_discrete_sequence= ['red','green'],
                           category_orders= {'Cash Flow': ['Expense', 'Income']})
    scatter_plot.update_layout(legend_traceorder = 'reversed')
    scatter_plot.update_layout(yaxis_tickformat = ',')
    scatter_plot.update_xaxes(tickformat="%d-%b-%Y")
    scatter_plot.show()
    

    enter image description here