I am trying to sort an Altair bar chart by multiple columns of a dataframe: the first column is a string and should be sorted alphabetically, and the second column is numeric and should be sorted from lowest to highest.
import altair as alt
import pandas as pd
df = pd.DataFrame(
    {
        "unique_ID": ["a", "b", "c", "d", "e", "f"],
        "group": ["Group 1", "Group 1", "Group 3", "Group 3", "Group 3", "Group 2"],
        "value": [20, 10, 30, 50, 40, 60]
    }
)
df
alt.Chart(df).mark_bar().encode(
    y="unique_ID",
    x="value",
    color="group",
)
What I would like to achieve is to sort this bar plot by two variables: first alphabetically by group (i.e. Group 1, Group 2, Group 3), and then, within each group, by value from lowest to highest.
The order I would like to achieve is b - a - f - c - e - d (I hope I did not make any mistake here).
Note: Of course, this is just a simple example in order to understand how to sort by multiple variables. I understand that I can sort via EncodingSortField, e.g.
y=alt.Y(
    "unique_ID",
    sort=alt.EncodingSortField(field="group", op="min")
)
but as far as I know, this only works for one column.
Thank you very much!
You would have to compute a combined sort key with transform_calculate and sort on that, as per https://github.com/vega/vega-lite/issues/7017
import altair as alt
import pandas as pd
df = pd.DataFrame(
    {
        "unique_ID": ["f", "b", "c", "d", "e", "a"],
        "group": ["Group1", "Group1", "Group3", "Group3", "Group3", "Group2"],
        "value": [20, 10, 30, 50, 40, 60]
    }
)
df
alt.Chart(df).mark_bar().transform_calculate(
    sort_field='datum.group + datum.unique_ID'
).encode(
    y=alt.Y("unique_ID:N").sort(field='sort_field', op='min'),
    x="value",
    color="group",
)
I'm unsure why op='min' is needed here and why the default 'average' does not work.
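Note that the calculated sort_field above concatenates group and unique_ID, so within a group the bars follow the ID rather than the value. As a possible alternative (a minimal sketch, not taken from the linked issue), the order could be computed in pandas first and handed to Altair as an explicit list. The variable name `order` is just an illustrative choice, and the data is the one from the question:

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame(
    {
        "unique_ID": ["a", "b", "c", "d", "e", "f"],
        "group": ["Group 1", "Group 1", "Group 3", "Group 3", "Group 3", "Group 2"],
        "value": [20, 10, 30, 50, 40, 60],
    }
)

# Sort by group (alphabetically), then by value (ascending) within each group,
# and keep the resulting sequence of bar labels.
order = df.sort_values(["group", "value"])["unique_ID"].tolist()
print(order)  # ['b', 'a', 'f', 'c', 'e', 'd']
```

The list can then be passed as the sort order of the encoding, e.g. `y=alt.Y("unique_ID:N", sort=order)`, since Altair accepts an explicit list of values for `sort`. This assumes each unique_ID appears only once, so no aggregation op is involved.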