Altair: Creating a layered violin + stripplot


I'm trying to create a plot that contains both a violin plot and a stripplot with jitter. How do I go about doing this? I provided my attempt below. The problem that I have been encountering is that the violin plot seems to be invisible in the plots.

# 1. Create violin plot
violin = alt.Chart(df).transform_density(
    "n_genes_by_counts",
    as_=["n_genes_by_counts", "density"],
).mark_area(orient="horizontal").encode(
    y="n_genes_by_counts:Q",
    x=alt.X("Density:Q", stack="center", title=None),
)

# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
    y="n_gene_by_counts",
    x=alt.X("jitter:Q", title=None),
).transform_calculate(
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
)

# 3. Combine both
combined = stripplot + violin

I have a feeling that it could be a problem with the scaling of the X axis. That is, density is much, much smaller than jitter. If that's the case, how do I make jitter the same order of magnitude as density? Would it be possible for someone to show me how to create a violin+stripplot given a column name n_gene_by_counts that belongs to some pandas dataframe df? Here's an example image of the kind of plot I'm looking for: [image: n_genes_by_counts plot]


Solution

  • As you suspected, the different scales will make the violin very small next to the stripplot unless you adjust for them. You have also accidentally capitalized Density:Q in the channel encoding. Because transform_density names its output field density (lowercase), no field called Density exists, so your violin plot is actually empty. This example works:

    import altair as alt
    from vega_datasets import data
    
    df = data.cars()
    
    # 1. Create violin plot
    violin = alt.Chart(df).transform_density(
        "Horsepower",
        as_=["Horsepower", "density"],
    ).mark_area().encode(
        x="Horsepower:Q",
        y=alt.Y("density:Q", stack="center", title=None),
    )
    
    # 2. Create stripplot
    stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
        x="Horsepower",
        y=alt.Y("jitter:Q", title=None),
    ).transform_calculate(
        jitter="(random() / 400) + 0.0052"  # Narrowing and centering the points
    )
    
    # 3. Combine both
    violin + stripplot
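
The constants 400 and 0.0052 in the jitter expression above were tuned by hand for the Horsepower column. Below is a minimal sketch of how comparable values might be derived for another column. It rests on two assumptions: that a center-stacked area is centered at half of the peak density, and that a Gaussian KDE with a Scott's-rule bandwidth is close enough to what Vega-Lite computes. The helper names `estimate_peak_density` and `jitter_expr` and the `width_fraction` parameter are hypothetical, not part of Altair.

```python
import math
import statistics


def estimate_peak_density(values, grid_points=200):
    """Approximate the peak of a Gaussian KDE over `values`.

    Assumption: the bandwidth follows Scott's rule (stdev * n^(-1/5)).
    Vega-Lite's own estimate may differ slightly.
    """
    n = len(values)
    bandwidth = statistics.stdev(values) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_points - 1)
    peak = 0.0
    for i in range(grid_points):
        x = lo + i * step
        density = norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )
        peak = max(peak, density)
    return peak


def jitter_expr(peak_density, width_fraction=0.2):
    """Build a Vega expression for uniform jitter centered at peak_density / 2.

    `width_fraction` is the share of the violin's maximum width that the
    points may occupy.
    """
    width = peak_density * width_fraction
    low = peak_density / 2 - width / 2
    return f"{low:.6g} + random() * {width:.6g}"


# Hypothetical sample standing in for df["Horsepower"].dropna().tolist()
sample = [130, 165, 150, 150, 140, 198, 220, 215, 225, 190, 95, 88, 97, 90]
expr = jitter_expr(estimate_peak_density(sample))
```

The resulting string could then be passed as `transform_calculate(jitter=expr)` in place of the hard-coded expression. If the points look off-center in the rendered chart, the centering assumption is the first thing to check.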
    


    By using scipy, you could also lay out the points themselves in the shape of the violin, which I am personally quite fond of (discussion in this issue):

    import altair as alt
    import numpy as np
    import pandas as pd
    from scipy import stats
    from vega_datasets import data
    
    
    # NAs are not supported in SciPy's density calculation
    df = data.cars().dropna()
    y = 'Horsepower'
    
    # Compute the density function of the data
    dens = stats.gaussian_kde(df[y])
    # Compute the density value for each data point
    pdf = dens(df[y].sort_values())
    
    # Randomly jitter each point between 0 and the upper bound given by its density value
    density_cloud = np.random.uniform(0, pdf)
    # To create a symmetric density/violin, we make every second point negative
    # Distributing every other point like this is also more likely to preserve the shape of the violin
    violin_cloud = density_cloud.copy()
    violin_cloud[::2] = violin_cloud[::2] * -1
    
    # Append the density cloud to the original data in the correctly sorted order
    df_with_density = pd.concat([
        df,
        pd.DataFrame({
            'density_cloud': density_cloud,
            'violin_cloud': violin_cloud
            },
            index=df['Horsepower'].sort_values().index)],
        axis=1
    )
    
    # Plot each point at its jittered position inside the violin shape
    alt.Chart(df_with_density).mark_circle().encode(
        x='Horsepower',
        y='violin_cloud'
    )
    


    Both of these approaches will work with multiple categorical variables without faceting in the next version of Altair, when support for x/y offset channels is added.
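
To make the multi-category case concrete, here is a rough sketch of how the violin-shaped offsets could be computed separately within each category, so that every group gets its own cloud. It uses only the standard library to keep the logic visible; scipy's `gaussian_kde` would do the same job. The function names `gaussian_kde_at` and `violin_cloud_by_group` are hypothetical, and the Scott's-rule bandwidth is an assumption.

```python
import math
import random
import statistics
from collections import defaultdict


def gaussian_kde_at(points, sample):
    """Evaluate a Gaussian KDE of `sample` at each of `points`.

    Assumption: Scott's-rule bandwidth (stdev * n^(-1/5)).
    """
    n = len(sample)
    bandwidth = statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - v) / bandwidth) ** 2) for v in sample)
        for p in points
    ]


def violin_cloud_by_group(rows, group_key, value_key, seed=0):
    """Return copies of `rows` with a 'violin_cloud' offset computed per group.

    Each group needs at least two distinct values so that its standard
    deviation is non-zero.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for members in groups.values():
        # Sort within the group so alternating signs follow the value order
        members = sorted(members, key=lambda r: r[value_key])
        values = [r[value_key] for r in members]
        densities = gaussian_kde_at(values, values)
        for i, (row, density) in enumerate(zip(members, densities)):
            offset = rng.uniform(0, density)
            if i % 2 == 0:
                offset = -offset  # every second point goes to the other side
            result.append({**row, "violin_cloud": offset})
    return result
```

In an Altair version that exposes offset channels, the resulting `violin_cloud` field could plausibly be mapped to `yOffset` with the category on `y`, but treat that mapping as something to verify against the release you have installed.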