Tags: python, vega-lite, altair

Strange year values on X axis


If I load the "disasters" dataset from vega_datasets and make a straightforward bar chart, I get some weird values for the year on the x-axis.

In Altair the code is:

import altair as alt
from vega_datasets import data

dis=data.disasters()

alt.Chart(dis).mark_bar().encode(
    x=alt.X('Year:T'),
    y=alt.Y('Deaths'),
    color='Entity'
)


(vega editor link)


Solution

  • Adding to @kanitw's answer: when you convert an integer to a datetime, the integer is treated as the number of nanoseconds since the Unix epoch (1970-01-01 00:00:00), so the year 1900 becomes 1900 nanoseconds past that instant. You can see this in pandas by executing the following:

    >>> import pandas as pd
    >>> pd.to_datetime(dis.Year)
    0   1970-01-01 00:00:00.000001900
    1   1970-01-01 00:00:00.000001901
    2   1970-01-01 00:00:00.000001902
    3   1970-01-01 00:00:00.000001903
    4   1970-01-01 00:00:00.000001905
    Name: Year, dtype: datetime64[ns]
    

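    Altair/Vega-Lite uses a similar convention: a bare number in a temporal field is read as a timestamp (in Vega-Lite's case, milliseconds since the epoch, as in JavaScript), not as a calendar year.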
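
If the data is already in a DataFrame, one way to get real dates is to convert the column with an explicit format. The following is a minimal sketch on a small hand-made Series (the values and variable names are illustrative, not the actual dataset); it contrasts the default integer conversion with parsing the values as four-digit years:

```python
import pandas as pd

# Illustrative stand-in for the integer "Year" column of the dataset.
years = pd.Series([1900, 1901, 1902], name="Year")

# Default conversion: each integer is taken as nanoseconds since 1970-01-01.
as_nanoseconds = pd.to_datetime(years)

# Explicit conversion: cast to string and parse each value as a 4-digit year.
as_years = pd.to_datetime(years.astype(str), format="%Y")

print(as_nanoseconds.dt.year.tolist())  # every value lands in 1970
print(as_years.dt.year.tolist())        # the intended calendar years
```

With a column converted this way, the `year(Year):T` encoding shown in the charts in this answer should behave the same as with `parse_dates`.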

    If you would like to parse the year as a date when loading the data, and then plot the year with Altair, you can do the following:

    import altair as alt
    from vega_datasets import data
    
    dis=data.disasters(parse_dates=['Year'])
    
    alt.Chart(dis).mark_bar().encode(
        x=alt.X('year(Year):T'),
        y=alt.Y('Deaths'),
        color='Entity'
    )
    

    example chart

    First we parse the year column as a date by passing the appropriate pandas.read_csv argument to the loading function, and then use the year timeUnit to extract just the year from the full datetime.

    If you are plotting data from a CSV URL rather than a pandas dataframe, Vega-Lite is smart enough to parse the CSV file based on the encoding you specify in the Chart, which means the following will give the same result:

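    import altair as alt
    from vega_datasets import data
    
    # Pass the CSV URL instead of a DataFrame; Vega-Lite loads and parses it.
    dis=data.disasters.url
    
    alt.Chart(dis).mark_bar().encode(
        x=alt.X('year(Year):T'),
        y=alt.Y('Deaths:Q'),
        color='Entity:N'
    )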
    

    example chart
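
For reference, the shorthand `'year(Year):T'` in the charts above stands for a field, a time unit, and a type. The sketch below writes out roughly what the resulting Vega-Lite encoding looks like as a plain Python dict, using only the standard library. The relative URL is a placeholder assumption (not the actual value of `data.disasters.url`), and the real spec generated by Altair contains additional keys such as the schema version.

```python
import json

# Hand-written approximation of the Vega-Lite spec behind the URL-based chart.
# "data/disasters.csv" is a placeholder path, not the real dataset URL.
spec = {
    "data": {"url": "data/disasters.csv"},
    "mark": "bar",
    "encoding": {
        # 'year(Year):T' -> field "Year", timeUnit "year", type "temporal"
        "x": {"field": "Year", "type": "temporal", "timeUnit": "year"},
        # 'Deaths:Q' -> quantitative field
        "y": {"field": "Deaths", "type": "quantitative"},
        # 'Entity:N' -> nominal field used for color
        "color": {"field": "Entity", "type": "nominal"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because the x channel is declared temporal, Vega-Lite knows to parse the `Year` column of the CSV as a date when it loads the file, which is why no explicit parsing step is needed in the URL case.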