Tags: python, pandas, plotly, sdmx

Plotting SDMX data with pandas or Plotly


What's the easiest way to plot SDMX data with pandas or Plotly?

I have the following code:

import pandasdmx as sdmx 
import plotly.express as px

df = sdmx.Request('OECD').data(
  resource_id='MEI_FIN',
  key='IR3TIB.GBR+USA.M',
  params={'startTime': '1900-06', 'dimensionAtObservation': 'TimeDimension'},
).write().reset_index()
df

I end up getting an error when I try to plot it:

fig = px.line(df, x="TIME_PERIOD", y='', title='Life expectancy in Country: Denmark')
fig.show()

The error is the following:

ValueError: Value of 'y' is not the name of a column in 'data_frame'. Expected one of `[('TIME_PERIOD', '', ''), ('IR3TIB', 'GBR', 'M'), ('IR3TIB', 'USA', 'M')] but received:` 

I am pretty new to Python, so I would appreciate any comment that could help me with this.


Solution

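  • I think your main problem is that the columns of your df are a MultiIndex: each column label is a tuple such as ('IR3TIB', 'GBR', 'M'), so a plain string like '' does not match any column. I'm not sure if this is what you want to achieve, but you can try the following code: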

    import pandasdmx as sdmx 
    import plotly.express as px
    
    df = sdmx.Request('OECD').data(
      resource_id='MEI_FIN',
      key='IR3TIB.GBR+USA.M',
      params={'startTime': '1900-06', 'dimensionAtObservation': 'TimeDimension'},
    ).write().reset_index()
    
    # Flatten the MultiIndex columns into plain strings by joining the
    # non-empty levels with "_". I used a list comprehension here;
    # you could use a loop if you prefer.
    df.columns = ["_".join(c for c in col if c != "")
                  for col in df.columns]
    
    fig = px.line(df,
                  x="TIME_PERIOD",
                  y=['IR3TIB_GBR_M', 'IR3TIB_USA_M'],
                  title='IR3TIB in GBR and USA')
    fig.update_layout(title_x=0.5)  # center the title
    fig.show()
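
To see what the column flattening does without calling the OECD service, here is a minimal sketch on a synthetic frame. The column tuples are copied from the error message in the question; the numeric values are made up for illustration only.

```python
import pandas as pd

# Synthetic stand-in for the frame you get after reset_index().
# The column tuples match the error message; the values are invented.
columns = pd.MultiIndex.from_tuples([
    ("TIME_PERIOD", "", ""),
    ("IR3TIB", "GBR", "M"),
    ("IR3TIB", "USA", "M"),
])
df = pd.DataFrame(
    [["2020-01", 0.7, 1.8], ["2020-02", 0.6, 1.6]],
    columns=columns,
)

# Join the non-empty levels of each column tuple with underscores.
df.columns = ["_".join(level for level in col if level != "")
              for col in df.columns]

print(list(df.columns))
# ['TIME_PERIOD', 'IR3TIB_GBR_M', 'IR3TIB_USA_M']
```

After this step every column label is an ordinary string, which is what `px.line` expects for `x` and `y`.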
    

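
If you would rather have one row per observation and let Plotly split the lines by a `color` column, a long-format frame is another option. This is only a sketch, assuming the columns have already been flattened to strings; the `series` and `value` column names and the sample numbers are placeholders I chose, not something pandasdmx produces.

```python
import pandas as pd

# Synthetic, already-flattened frame; the values are invented for illustration.
df = pd.DataFrame({
    "TIME_PERIOD": ["2020-01", "2020-02"],
    "IR3TIB_GBR_M": [0.7, 0.6],
    "IR3TIB_USA_M": [1.8, 1.6],
})

# Reshape from wide (one column per series) to long (one row per observation).
long_df = df.melt(id_vars="TIME_PERIOD", var_name="series", value_name="value")

print(long_df)
```

With Plotly installed, `px.line(long_df, x="TIME_PERIOD", y="value", color="series")` should then draw GBR and USA as separate lines.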