Overlay NdOverlays while keeping color / changing marker


I want a scatter plot of different types of "asset", where each asset has its own color and is labeled in the legend. I can do this using an NdOverlay of Scatter elements. Then I want to overlay two such plots, e.g. one coming from a model and another from an experiment, so that the two differ only in marker but keep the same color for each asset.

I would expect this to work

df1 = pd.DataFrame({"asset": ["A", "B", "B"], "x": [1,2,3], "y": [1,2,3]})
df2 = pd.DataFrame({"asset": ["A", "B", "B"], "x": [1.5,2.5,3.5], "y": [1,2,3]})
df1.hvplot.scatter(x="x", y="y", by="asset") * df2.hvplot.scatter(x="x", y="y", by="asset").opts({"Scatter": {"style": {"marker": "d"}}})

but the colors per asset in the df1 plot differ from those in the df2 plot. I would like the most concise way to do this, starting from df1 and df2.

Edit: Is there a simple solution where I do not have to think about the sorting of df1 and df2, or about whether they have the exact same set of "assets"? E.g., I need something that would also work with

df1 = pd.DataFrame({"asset": ["A", "B", "B"], "x": [1,2,3], "y": [1,2,3]})
df2 = pd.DataFrame({"asset": ["C", "B", "A"], "x": [1.5,2.5,3.5], "y": [1,2,3]})
l1=df1.hvplot.scatter(x="x", y="y", by="asset")
l2=df2.hvplot.scatter(x="x", y="y", by="asset").opts(hv.opts.Scatter(marker='d'))
ll=l1*l2

or

df1 = pd.DataFrame({"asset": ["A", "B", "B"], "x": [1,2,3], "y": [1,2,3]})
df2 = pd.DataFrame({"asset": ["A", "B", "B", "C"], "x": [1.5,2.5,3.5, 4], "y": [1,2,3, 1]})
l1=df1.hvplot.scatter(x="x", y="y", by="asset")
l2=df2.hvplot.scatter(x="x", y="y", by="asset").opts(hv.opts.Scatter(marker='d'))
ll=l1*l2

Solution

  • Edit: If you need more flexibility, there are two options:

    1. Styling a dimensioned container, and
    2. Styling by using additional value dimensions.

    For more information, see the linked Jupyter notebook and GitHub repo; the code is as follows.

    Option 1 (more verbose, but often easier if you are working in a HoloMap-like container anyway):

    import holoviews as hv
    from holoviews import opts, dim
    hv.extension('bokeh')
    import pandas as pd
    import numpy as np
    
    def cycle_kdim_opts(layout, kdim_opts):
        """
        For each given kdim of an Nd holoviews container, create an options dict
        that can be passed into a holoviews `opts` object.
    
        Parameters
        ----------
        layout : A holoviews Nd container (HoloMap, ...)
        kdim_opts : dict of the form {kdim: {style_option: [alternatives]}}
            For an example, see below.
    
    
        """
        # Output shown for:
        # kdim_opts = {
        #     'h': {'color': ['orange', 'cyan']},
        #     'g': {'size': [30, 10]},
        # }
        values = {kd.name: list(d) for kd, d in zip(layout.kdims, zip(*layout.data.keys()))}
        # print(values)
        # {'g': ['a', 'b', 'b'], 'h': ['d', 'c', 'd']}
    
        mapping = {}
        for kd, o in kdim_opts.items():
            # Sort the unique values so the value -> style assignment is
            # reproducible (plain set order can change between runs).
            unique_values = sorted(set(values[kd]))
            styles = list(o.values())[0]
            mapping[kd] = dict(zip(unique_values, styles))
        # print(mapping)
        # {'h': {'c': 'orange', 'd': 'cyan'}, 'g': {'a': 30, 'b': 10}}
    
        kdim2style = {k: list(v.keys())[0] for k, v in kdim_opts.items()}
        # print(kdim2style)
        # {'h': 'color', 'g': 'size'}
    
        mapped_styles = {kdim2style[kd]: hv.Cycle([mapping[kd][v] for v in vals])
                         for kd, vals in values.items()}
        # print(mapped_styles)
        # {'size': Cycle([30, 10, 10]), 'color': Cycle(['cyan', 'orange', 'cyan'])}
    
        return mapped_styles
    
    df1 = pd.DataFrame({'asset': ['A', 'B', 'B'], 'x': [1.,2.,3.], 'y': [1.,2.,3.]})
    df2 = pd.DataFrame({'asset': ['A', 'B', 'B', 'C'], 'x': [1.5,2.5,3.5,4], 'y': [1.,2.,3.,1.]})
    df = df1.assign(source='exp').merge(df2.assign(source='mod'), how='outer')
    
    labels = hv.Labels(df.assign(l=df.asset+',\n'+df.source), ['x', 'y'], 'l')
    
    l = hv.Dataset(df, ['x', 'y', 'asset', 'source',], []).to(hv.Points).overlay()
    
    od = {
        'source': {'size': [30, 10]},
        'asset': {'color': ['orange', 'cyan', 'yellow']},
    }
    
    options = (
        opts.NdOverlay(legend_position='right', show_legend=True, width=500),
        opts.Points(padding=.5, show_title=False, title_format='',
                    toolbar=None, **cycle_kdim_opts(l, od)),
    )
    
    l.opts(*options) * labels
    

    Option 2 (much less verbose, but customizing e.g. the legend later on takes more effort):

    df1 = pd.DataFrame({'asset': ['A', 'B', 'B'], 'x': [1.,2.,3.], 'y': [1.,2.,3.]})
    df2 = pd.DataFrame({'asset': ['A', 'B', 'B', 'C'], 'x': [1.5,2.5,3.5,4], 'y': [1.,2.,3.,1.]})
    df = df1.assign(source='exp').merge(df2.assign(source='mod'), how='outer')
    
    labels = hv.Labels(df.assign(l=df.asset+',\n'+df.source), ['x', 'y'], 'l')
    
    l = hv.Points(df, ['x', 'y'], ['asset', 'source',])
    
    options = (
        opts.NdOverlay(legend_position='right', show_legend=True, width=500),
        opts.Points(padding=.5, show_title=False, show_legend=True,
                    marker=dim('source').categorize({'exp':'circle', 'mod':'diamond'}),
                    color=dim('asset').categorize({'A':'orange', 'B':'cyan', 'C':'yellow'}),
                    size=10, toolbar=None)
    )
    
    l.opts(*options) * labels
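
    The dictionaries passed to categorize in Option 2 are written out by hand. If the set of assets is not known in advance, they could be derived from the data instead. The following is only a sketch in plain Python: build_style_map is a hypothetical helper (not part of HoloViews), and the asset lists stand in for the df1 and df2 columns.

```python
from itertools import cycle

def build_style_map(values, styles):
    """Map each distinct value to a style (hypothetical helper).

    Values are sorted so the assignment is reproducible, and styles
    are recycled if there are more distinct values than styles.
    """
    distinct = sorted(set(values))
    return dict(zip(distinct, cycle(styles)))

# Stand-ins for df1["asset"] and df2["asset"]; order and membership differ.
assets_exp = ["A", "B", "B"]
assets_mod = ["C", "B", "A"]

# One mapping built from the union of both frames.
color_map = build_style_map(assets_exp + assets_mod, ["orange", "cyan", "yellow"])
marker_map = build_style_map(["exp", "mod"], ["circle", "diamond"])

print(color_map)   # {'A': 'orange', 'B': 'cyan', 'C': 'yellow'}
print(marker_map)  # {'exp': 'circle', 'mod': 'diamond'}
```

    The resulting dictionaries can then replace the literal ones, e.g. color=dim('asset').categorize(color_map). Because the mapping is keyed by value rather than by position, it does not depend on how either frame is sorted.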
    

    Original suggestion (closest to your example): You could e.g. explicitly set the colours using a hv.Cycle object:

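    import pandas as pd
    import holoviews as hv
    import hvplot.pandas  # registers the .hvplot accessor on DataFrames

    df1 = pd.DataFrame({"asset": ["A", "B", "B"], "x": [1,2,3], "y": [1,2,3]})
    df2 = pd.DataFrame({"asset": ["A", "B", "B"], "x": [1.5,2.5,3.5], "y": [1,2,3]})
    l1=df1.hvplot.scatter(x="x", y="y", by="asset")
    l2=df2.hvplot.scatter(x="x", y="y", by="asset").opts(hv.opts.Scatter(marker='d'))
    ll=l1*l2
    # Both overlays draw from the same explicit cycle, so colors match by position
    ll.opts(hv.opts.Scatter(padding=.1, color=hv.Cycle(['blue', 'orange'])))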
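
    A Cycle assigns colors by position, so a single shared cycle only lines up when both frames contain the same assets. One possible workaround is to give each overlay its own color list, derived from one shared mapping. The sketch below is plain Python; cycle_colors and color_map are hypothetical names, and it assumes each overlay draws its groups in sorted key order, which is worth verifying against your hvplot version.

```python
def cycle_colors(frame_assets, color_map):
    """Colors for one frame: one per distinct asset, in sorted asset order.

    Assumption: the overlay created by `by="asset"` orders its groups
    by sorted key, so the list can be handed to hv.Cycle as-is.
    """
    return [color_map[a] for a in sorted(set(frame_assets))]

# Shared mapping covering every asset that appears in either frame.
color_map = {"A": "blue", "B": "orange", "C": "green"}

colors1 = cycle_colors(["A", "B", "B"], color_map)  # stand-in for df1["asset"]
colors2 = cycle_colors(["C", "B", "A"], color_map)  # stand-in for df2["asset"]

print(colors1)  # ['blue', 'orange']
print(colors2)  # ['blue', 'orange', 'green']
```

    Each list would then be applied to its own overlay before combining them, e.g. l1.opts(hv.opts.Scatter(color=hv.Cycle(colors1))) and likewise for l2 with colors2. If the ordering assumption does not hold in your setup, the categorize-based Option 2 avoids the issue because it maps by value.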