I have the following Pandas DataFrame (from read_csv):
In [6]: rev_df.head()
Out[6]:
Entity Code Year National Gov Revenues (Wallis (2000)) Local Gov Revenues (Wallis (2000)) State Gov Revenues (Wallis (2000))
0 United States USA 1902 3.0 4.0 0.8
1 United States USA 1913 2.4 4.2 0.9
2 United States USA 1922 5.8 5.2 1.7
3 United States USA 1927 4.7 6.0 2.1
4 United States USA 1934 6.0 7.6 3.8
Year is a column, and there are 3 additional columns, one each for Local/State/National revenues. I'd like to create a stacked area chart, like this:
https://ourworldindata.org/grapher/government-revenues-national-income?country=~USA
My altair code should hopefully look like this:
alt.Chart(rev_df).mark_area().encode(
x = 'Year',
y = 'Revenue',
color = 'Level' ## where level is {Local|State|National}
)
I'm having trouble understanding which transforms to apply to my DataFrame to do this cleanly and comprehensibly. What's the best approach? I think I need to create a single column, Level, and essentially triple the number of rows in the DataFrame (going from wide to long format), but I'm not sure how to articulate or implement this transform.
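The transform described here is a wide-to-long reshape, which pandas provides as DataFrame.melt. Below is a minimal sketch, assuming the revenue columns have been shortened to National, Local and State (the long "Wallis (2000)" headers would work the same way if listed in value_vars). The three-row frame is a hypothetical excerpt of rev_df, not the full data:

```python
import pandas as pd

# Hypothetical excerpt of rev_df, with the revenue columns renamed to
# National/Local/State for brevity (assumption).
wide = pd.DataFrame({
    "Entity": ["United States"] * 3,
    "Code": ["USA"] * 3,
    "Year": [1902, 1913, 1922],
    "National": [3.0, 2.4, 5.8],
    "Local": [4.0, 4.2, 5.2],
    "State": [0.8, 0.9, 1.7],
})

# Wide -> long: each of the three value columns becomes its own row per Year,
# so the row count triples. "Level" holds the old column name,
# "Revenue" holds the value that was in it.
long_df = wide.melt(
    id_vars=["Entity", "Code", "Year"],
    value_vars=["National", "Local", "State"],
    var_name="Level",
    value_name="Revenue",
)

print(long_df)
```

The resulting long_df already has Level and Revenue columns, so it can be passed directly to the chart code above without any Altair-side transform.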
Related: I've been transforming the Year column, which is int64 as returned by read_csv, like this:
rev_df['Year'] = pd.to_datetime(rev_df['Year'], format='%Y')
Is there a best practice here? Or is this fine?
You can use transform_fold, as mentioned in the comment from debbes:
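from io import StringIO

import pandas as pd
import altair as alt

rev_df = pd.read_csv(
    StringIO(
'''
Entity Code Year National Local State
UnitedStates USA 1902 3.0 4.0 0.8
UnitedStates USA 1913 2.4 4.2 0.9
UnitedStates USA 1922 5.8 5.2 1.7
UnitedStates USA 1927 4.7 6.0 2.1
UnitedStates USA 1934 6.0 7.6 3.8
'''
    ),
    sep=r'\s+',           # whitespace-delimited sample; raw string avoids an invalid '\s' escape warning
    parse_dates=['Year']  # read Year directly as datetime64
)

alt.Chart(rev_df).mark_area().encode(
    x='Year',
    y=alt.Y('Revenue:Q', stack=True),  # types must be explicit: Revenue/Level only exist after the fold
    color='Level:N'
).transform_fold(
    # Fold the three wide columns into long-form rows:
    # Level holds the column name, Revenue holds its value
    ['National', 'Local', 'State'],
    as_=['Level', 'Revenue']
)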
Your date approach is fine, but you can also use parse_dates as in my example above.
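For comparison, here is a small pandas-only sketch of the two date approaches on a hypothetical two-row CSV. Both should give a datetime64 Year column set to 1 January of each year, which Altair infers as a temporal field when the column is used in an encoding:

```python
from io import StringIO

import pandas as pd

# Hypothetical minimal CSV with an integer Year column.
csv_text = "Year,National\n1902,3.0\n1913,2.4\n"

# Option 1 (the question's approach): read Year as int64, then convert
# explicitly with a year-only format string.
a = pd.read_csv(StringIO(csv_text))
a["Year"] = pd.to_datetime(a["Year"], format="%Y")

# Option 2 (the answer's approach): let read_csv parse the column as dates.
b = pd.read_csv(StringIO(csv_text), parse_dates=["Year"])

print(a["Year"].dtype, b["Year"].dtype)
print(a["Year"].tolist())
```

The explicit format="%Y" has the advantage of stating exactly how the integers should be interpreted, while parse_dates keeps the conversion in a single read_csv call.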