python pandas dataframe visualization altair

Pandas/Altair - plot multiple series


I have the following Pandas DataFrame (from read_csv):

In [6]: rev_df.head()
Out[6]:
          Entity Code  Year  National Gov Revenues (Wallis (2000))  Local Gov Revenues (Wallis (2000))  State Gov Revenues (Wallis (2000))
0  United States  USA  1902                                    3.0                                 4.0                                 0.8
1  United States  USA  1913                                    2.4                                 4.2                                 0.9
2  United States  USA  1922                                    5.8                                 5.2                                 1.7
3  United States  USA  1927                                    4.7                                 6.0                                 2.1
4  United States  USA  1934                                    6.0                                 7.6                                 3.8

Year is a column, and there are 3 additional columns, 1 each for Local/State/National revenues. I'd like to create a stacked area chart, like this: https://ourworldindata.org/grapher/government-revenues-national-income?country=~USA

My altair code should hopefully look like this:

alt.Chart(rev_df).mark_area().encode(
  x = 'Year',
  y = 'Revenue',
  color = 'Level' ## where level is {Local|State|National}
)

I'm having trouble understanding the best transforms to apply on my DataFrame to do this cleanly and comprehensibly. What's the best approach? I think I need to create a single column, Level, and essentially triple the number of rows in the DataFrame, but I'm not sure how to articulate or implement this transform.

Related - I've been transforming the Year column, which is int64 as returned by read_csv, thus: rev_df['Year'] = pd.to_datetime(rev_df['Year'], format='%Y')

Is there a best practice here? Or is this fine?


Solution

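  • You can use transform_fold, as mentioned in the comment from debbes. It folds the three revenue columns into Level/Revenue pairs inside the chart, so the DataFrame itself can stay wide: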

    from io import StringIO
    
    import pandas as pd
    import altair as alt
    
    
    rev_df = pd.read_csv(
        StringIO(
    '''
    Entity        Code Year  National  Local  State
    UnitedStates  USA  1902  3.0       4.0    0.8
    UnitedStates  USA  1913  2.4       4.2    0.9
    UnitedStates  USA  1922  5.8       5.2    1.7
    UnitedStates  USA  1927  4.7       6.0    2.1
    UnitedStates  USA  1934  6.0       7.6    3.8
    '''
        ),
        sep=r'\s+',
        parse_dates=['Year']
    )
    
    alt.Chart(rev_df).mark_area().encode(
      x = 'Year',
      y = alt.Y('Revenue:Q', stack=True),
      color = 'Level:N'
    ).transform_fold(
        ['National', 'Local', 'State'],
        as_=['Level', 'Revenue']
    )
    


    Your date approach is fine, but you can also pass parse_dates to read_csv, as in the example above.
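
If you would rather do the reshape in pandas, which is the "single Level column, three times the rows" transform described in the question, DataFrame.melt is one way to do it. The following is a minimal sketch, not the only approach: it assumes the three revenue columns have been renamed to National, Local and State as in the example above, and it rebuilds the sample rows by hand instead of reading the real CSV.

```python
import pandas as pd

# Sample rows from the question, using the shortened column names
# (National/Local/State) assumed for this sketch.
rev_df = pd.DataFrame({
    "Entity": ["United States"] * 5,
    "Code": ["USA"] * 5,
    "Year": [1902, 1913, 1922, 1927, 1934],
    "National": [3.0, 2.4, 5.8, 4.7, 6.0],
    "Local": [4.0, 4.2, 5.2, 6.0, 7.6],
    "State": [0.8, 0.9, 1.7, 2.1, 3.8],
})

# Integer years -> datetimes (January 1 of each year), as in the question.
# Casting to str first makes it explicit that '%Y' is matched against text.
rev_df["Year"] = pd.to_datetime(rev_df["Year"].astype(str), format="%Y")

# Wide -> long: one row per (Year, Level) pair, so 5 rows become 15.
long_df = rev_df.melt(
    id_vars=["Entity", "Code", "Year"],
    value_vars=["National", "Local", "State"],
    var_name="Level",
    value_name="Revenue",
)

print(long_df.head())
```

With long_df in this shape, the chart spec from the question should work as written, with long_df passed to alt.Chart in place of rev_df. The trade-off is that melt changes the DataFrame you carry around, while transform_fold leaves it wide and reshapes only for the chart.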