
Sort normalized stacked bar chart by dataframe order with Altair


How can I keep the segments of my stacked bar chart in the same order as in my DataFrame? The head of my DataFrame looks like this:

[Image: head of the DataFrame]

The countries are ordered as I want them to be, which I can control by setting sort=None. But I want to order the lineages within each stacked bar by sequences_number, keeping only the 'Others' value at the end, as it is in my DataFrame. With sort=None in the X encoding channel, the lineages are still sorted alphabetically. So I added a 'rank' column to my DataFrame in order to use sort=alt.SortField("rank:Q", order="ascending"), but the lineages are still ordered alphabetically instead of by sequences_number (with 'Others' at the end).
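For reference, a 'rank' column of this kind could be computed along the following lines. This is only a minimal sketch on made-up toy data (the real DataFrame is shown only as an image), and the helper column is_others is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame.
df = pd.DataFrame({
    "country": ["Rwanda", "Rwanda", "Rwanda", "Comoros", "Comoros", "Comoros"],
    "lineage": ["A", "Others", "B", "Others", "C", "A"],
    "sequences_number": [10, 50, 30, 5, 20, 40],
})

# Flag 'Others' so that it always sorts after the named lineages.
df["is_others"] = df["lineage"].eq("Others")

# Sort a copy: named lineages first (largest count first), 'Others' last.
ordered = df.sort_values(
    ["is_others", "sequences_number"], ascending=[True, False]
)

# Position of each row within its country in that sorted copy (1 = first).
# Assignment aligns on the index, so df keeps its original row order.
df["rank"] = ordered.groupby("country").cumcount() + 1
df = df.drop(columns="is_others")
```

With this toy data, 'Others' gets the highest rank in each country even when its count is the largest, and the original row order (and therefore the country order) is left untouched.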

Here is my code:

histo_base_proportion = (
    alt.Chart(
        top10_selection_africa
    ).encode(
        alt.Y(
            "country:N",
            scale=alt.Scale(padding=0.3), 
            sort=None #sorting works here
        ),
        alt.X(
            "sequences_number:Q",
            sort=alt.SortField("rank:Q", order="ascending"), #but does not work here
            title = "Lineages",
            stack="normalize"
        ),
        alt.Color(
            "lineage"
        ).scale(
            domain=fixed_domain_lineages, range=fixed_range_lineages
        ).legend(None),
        tooltip=[
          {"type": "nominal", 'title': 'Country',"field": "country"},
          {"type": "nominal", 'title': 'Lineage',"field": "lineage"},
          {"type": "quantitative", 'title': 'Nombres', "field": "sequences_number"}
        ]
    ).mark_bar(
        size=12
    ).properties(
        width=800,
        title="Proportion of sequenced genomes lineage by country in " + year_title
    )
)

Here is what I get:

[Image: proportion of lineages by country]

This chart shows the proportion of each lineage by country. The top 5 lineages are shown per country; lineages outside the top 5 are grouped in the 'Others' category, shown in black. In my case, I would have wanted, for example, the first green lineage of Rwanda or the dark blue lineage of Comoros to be placed just before 'Others' (black).

How can I do that? Thanks in advance.


Solution

  • The sort property of the X channel controls the axis, not the order in which segments are stacked. To order the stacked segments, use the order encoding instead of sort, as in this example:

    import altair as alt
    from vega_datasets import data
    
    source = data.barley()
    
    alt.Chart(source).mark_bar().encode(
        x='sum(yield)',
        y='variety',
        color='site',
        order=alt.Order(
          # Sort the segments of the bars by this field
          'site',
          sort='ascending'
        )
    )
    

    You can read more about how the order encoding works in the Altair documentation.