
Coloring by a third variable messes up the order of the date axis with plotly.express


I have a dataframe like the one below.

import plotly.express as px
import pandas as pd
dfm = pd.DataFrame({'Year':['2017','2017','2017','2017','2018','2018','2018','2018'],
                    'Month':['01', '04', '10', '12', '01', '04', '10', '12'],
                    'Counts':[12, 33, 9, 45, 11, 54, 22, 13],
                    'Region': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B']})
dfm['Year_Month'] = dfm['Year']+'_'+dfm['Month']

I plotted Counts against Year_Month, and everything looks normal.

fig = px.line(dfm, x="Year_Month", y="Counts")
fig.update_traces(mode='markers+lines')

However, when I tried to color the line by a third variable, Region in this case, the Year_Month axis got totally messed up: the labels are no longer in chronological order.

fig = px.line(dfm, x="Year_Month", y="Counts", color='Region')
fig.update_traces(mode='markers+lines')

Does anyone know why? How can I fix this?


Solution

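  • I believe this is because you are using a non-standard date format, yyyy_mm, which plotly does not recognize as a date and instead treats as a categorical value. As far as I know, a categorical axis lists its categories in the order they first appear in the traces. With a single trace that happens to be chronological, but once the data is split by Region, every Year_Month value of the first trace (A) is placed before the values that only occur in the second trace (B), so the axis is no longer sorted by time.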

    You can fix this by changing your date format to yyyy-mm, replacing the underscore with a hyphen via dfm['Year_Month'] = dfm['Year']+'-'+dfm['Month']. Plotly does recognize this format as valid dates, so the axis is plotted in chronological order as expected.