
How to use quantileNormal in Altair with labelExpr


I have a dataset df containing a computed frequency curve that looks like this:

    aep       flow  variance    n-day
0   0.001  64480.8  0.01190750  01-day
1   0.002  56995.7  0.00925476  01-day
2   0.005  47984.8  0.00633636  01-day
3   0.01   41772.4  0.00456081  01-day
4   0.02   36024.0  0.00314372  01-day
5   0.05   29043.5  0.00179256  01-day
6   0.1    24145.0  0.00113570  01-day
7   0.2    19466.1  0.00075381  01-day
8   0.5    13215.4  0.00055517  01-day
9   0.8     9261.1  0.00054307  01-day
10  0.9     7785.4  0.00066750  01-day
11  0.95    6787.7  0.00094589  01-day

Here df.aep is the annual exceedance probability and df.flow is the observed streamflow. I want to make a flood frequency plot, which commonly has a log-scaled y-axis and a reversed normal-probability x-axis.
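For reference, a normal-probability axis places each probability at its standard normal quantile (z-score). Below is a minimal sketch of that mapping using only Python's standard library. It assumes a standard normal distribution (mean 0, standard deviation 1), and the variable names are illustrative only:

```python
from statistics import NormalDist

# AEP values copied from the table above
aeps = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95]

# Standard normal (mean 0, stdev 1); inv_cdf is its quantile function
std_normal = NormalDist()
z_scores = [round(std_normal.inv_cdf(p), 3) for p in aeps]

# Print each probability next to the position it would take on the axis
for p, z in zip(aeps, z_scores):
    print(f"{p:>6}  {z:>7}")
```

Rare events (small AEP) map to large negative z-scores, which is why the axis is usually reversed so that they appear on the right.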

For this chart spec:

alt.Chart(df).transform_calculate(
    normal = 'quantileNormal(datum.aep)').mark_line().encode(
    x = alt.X('aep:Q', axis = alt.Axis(format = '%'), scale = alt.Scale( reverse=True)),
    y = alt.Y('flow:Q', scale = alt.Scale(type='log')),
    color = 'n-day')

This gives me a plot whose x-axis is linear in AEP (reversed), not a normal-probability axis.

Ultimately, I am more interested in a plot that uses the quantileNormal transform for the x-axis:

alt.Chart(df).transform_calculate(
    normal = 'quantileNormal(datum.aep)').mark_line().encode(
    x = alt.X('normal:Q', scale = alt.Scale( reverse=True)),
    y = alt.Y('flow:Q', scale = alt.Scale(type='log')),
    color = 'n-day')


But now I need the x-axis positions given by quantileNormal(datum.aep), which are z-scores, while labelling the ticks with the corresponding values from df.aep.

  1. Is there a way to specify this type of scale in Altair while still using df.aep for the plotted x-values?

So I am thinking I can set up a labelExpr to map in the labels using:

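import pandas as pd
from altair_transform import extract_data

# `a` is the quantileNormal chart defined above
aa = extract_data(a)
aa.normal = aa.normal.round(3)

# z-scores, keyed by letter
bb = aa[['normal']]
bb.index = list('abcdefghijkl')
bb = pd.Series(index = bb.index, data = bb.values.flatten())

# matching AEP values, keyed by the same letters
cc = aa[['aep']]
cc.index = list('abcdefghijkl')
cc = pd.Series(index = cc.index, data = cc.values.flatten())

# build a chained conditional: one z-score test per label
s = ''
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for norm, aep in zip(bb.items(), cc.items()):
    s += f"datum.normal == {norm[1]} ? '{aep[1]:.3f}' :  "
s += 'null'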

where s evaluates as:

"datum.normal == -3.09 ? '0.001' :  datum.normal == -2.878 ? '0.002' :  datum.normal == -2.576 ? '0.005' :  datum.normal == -2.326 ? '0.010' :  datum.normal == -2.054 ? '0.020' :  datum.normal == -1.645 ? '0.050' :  datum.normal == -1.282 ? '0.100' :  datum.normal == -0.842 ? '0.200' :  datum.normal == 0.0 ? '0.500' :  datum.normal == 0.842 ? '0.800' :  datum.normal == 1.282 ? '0.900' :  datum.normal == 1.645 ? '0.950' :  null"

Then, if I change my chart spec to:

a = alt.Chart(aa).mark_line().encode(
    x = alt.X('normal:Q',
        axis = alt.Axis(
            values = [bb.a, bb.b, bb.c, bb.d, bb.e, bb.f, bb.g, bb.h, bb.i, bb.j, bb.k, bb.l],
            labelExpr=s
        ),
        scale = alt.Scale( reverse=True)),
    y = alt.Y('flow:Q', scale = alt.Scale(type='log')),
    color = 'n-day')

I get a chart with null labels. Can labelExpr be used like this?



Solution

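  • I believe you need to compare against datum.label or datum.value rather than datum.normal. Inside labelExpr, datum refers to the axis tick, not to a row of your data, so datum.normal is undefined and none of the comparisons match: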

    import altair as alt
    import pandas as pd
    
    alt.Chart(pd.DataFrame({'x': [3.21, 1.23, 4.56], 'y': [1, 2, 3]})).mark_point().encode(
        x=alt.X(
            'x',
            axis=alt.Axis(
                values=[1, 2, 3],
                labelExpr='datum.label == 1 ? "label" : datum.value == 2 ? "value" : datum.x == 3 ? "x": null')),
        y='y'
    )
    

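Applied to the question, the same loop could be rewritten to test datum.value instead. The following is a rough sketch, not a tested drop-in. It rebuilds the z-scores with Python's standard library rather than extract_data, and the names build_label_expr, zs and inverse_label_expr are made up for illustration. The second expression is an alternative that assumes the Vega version in use provides cumulativeNormal (the inverse of quantileNormal), so the label can be computed from the tick value directly:

```python
from statistics import NormalDist

# AEP values from the question's table
aeps = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95]

# z-scores rounded to 3 decimals, as in the question
zs = [round(NormalDist().inv_cdf(p), 3) for p in aeps]


def build_label_expr(z_values, labels):
    """Chain one `datum.value == z ? 'label' :` test per tick, ending in null."""
    parts = [f"datum.value == {z} ? '{p:.3f}' : " for z, p in zip(z_values, labels)]
    return "".join(parts) + "null"


# Option 1: explicit lookup keyed on the tick value
label_expr = build_label_expr(zs, aeps)

# Option 2 (assumes cumulativeNormal is available): invert the z-score
inverse_label_expr = "format(cumulativeNormal(datum.value), '.3f')"

# Python-side check that inverting the rounded z-scores recovers the labels
round_trip = [f"{NormalDist().cdf(z):.3f}" for z in zs]

print(label_expr)
print(round_trip)
```

Passing values=zs together with labelExpr=label_expr (or labelExpr=inverse_label_expr) to alt.Axis on the 'normal:Q' channel should then label each tick with its AEP. The lookup version relies on exact floating-point equality between the tick values and the literals in the expression, so both must come from the same rounded numbers.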