I have a dataframe like below.
import plotly.express as px
import pandas as pd
dfm = pd.DataFrame({
    'Year': ['2017', '2017', '2017', '2017', '2018', '2018', '2018', '2018'],
    'Month': ['01', '04', '10', '12', '01', '04', '10', '12'],
    'Counts': [12, 33, 9, 45, 11, 54, 22, 13],
    'Region': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B'],
})
dfm['Year_Month'] = dfm['Year'] + '_' + dfm['Month']
I plotted the variable Counts against Year_Month. Everything looks normal.
fig = px.line(dfm, x="Year_Month", y="Counts")
fig.update_traces(mode='markers+lines')
However, when I tried to color the lines by a third variable (Region in this case), the Year_Month axis got completely messed up.
fig = px.line(dfm, x="Year_Month", y="Counts", color='Region')
fig.update_traces(mode='markers+lines')
Does anyone know why? How can I fix this?
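I believe this is because you are using a non-standard date format, yyyy_mm, which plotly does not recognize as a date and instead treats as categorical. A categorical axis places labels in the order they are first encountered, trace by trace, so once the data is split by Region the labels from the first region come first and the remaining ones are appended after them, rather than being sorted chronologically as you would expect for time-series data with gaps.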
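To see where the scrambled order comes from, here is a minimal sketch in plain pandas that reproduces the order in which the Year_Month labels would be encountered when the data is split into one trace per Region. It assumes plotly's default category ordering (labels appear in the order they are first seen, trace by trace); the variable name `seen` is just for illustration.

```python
import pandas as pd

dfm = pd.DataFrame({
    'Year': ['2017', '2017', '2017', '2017', '2018', '2018', '2018', '2018'],
    'Month': ['01', '04', '10', '12', '01', '04', '10', '12'],
    'Counts': [12, 33, 9, 45, 11, 54, 22, 13],
    'Region': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B'],
})
dfm['Year_Month'] = dfm['Year'] + '_' + dfm['Month']

# Walk the data one Region at a time (A first, then B), collecting each
# label the first time it is seen. This mimics how a categorical axis
# would be populated under the assumption stated above.
seen = []
for region in dfm['Region'].unique():
    for label in dfm.loc[dfm['Region'] == region, 'Year_Month']:
        if label not in seen:
            seen.append(label)

print(seen)
```

All of region A's months end up ahead of region B's, which would explain why the axis no longer reads left to right in time.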
You can fix this by changing your date format to yyyy-mm (underscore changed to a hyphen via dfm['Year_Month'] = dfm['Year']+'-'+dfm['Month']), which plotly does recognize as valid dates and plots as expected.