I'm trying to create a plot that contains both a violin plot and a stripplot with jitter. How do I go about doing this? I provided my attempt below. The problem that I have been encountering is that the violin plot seems to be invisible in the plots.
# 1. Create violin plot
violin = alt.Chart(df).transform_density(
"n_genes_by_counts",
as_=["n_genes_by_counts", "density"],
).mark_area(orient="horizontal").encode(
y="n_genes_by_counts:Q",
x=alt.X("Density:Q", stack="center", title=None),
)
# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
y="n_gene_by_counts",
x=alt.X("jitter:Q", title=None),
).transform_calculate(
jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
)
# 3. Combine both
combined = stripplot + violin
I have a feeling that it could be a problem with the scaling of the x axis. That is, density is much, much smaller than jitter. If that's the case, how do I scale jitter so that it's on the same order of magnitude as density? Would it be possible for someone to show me how to create a violin plot with a stripplot overlay, given a column name n_gene_by_counts that belongs to some pandas DataFrame df?
Here's an example image of the kind of plot I'm looking for:
As you suspected, the different scales make the violin very small next to the stripplot unless you adjust for them. You have also accidentally capitalized Density:Q in the channel encoding, which means that your violin plot is actually empty: transform_density names its output field density (lowercase), so no field called Density exists. This example works:
import altair as alt
from vega_datasets import data
df = data.cars()
# 1. Create violin plot
violin = alt.Chart(df).transform_density(
"Horsepower",
as_=["Horsepower", "density"],
).mark_area().encode(
x="Horsepower:Q",
y=alt.Y("density:Q", stack="center", title=None),
)
# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
x="Horsepower",
y=alt.Y("jitter:Q", title=None),
).transform_calculate(
jitter="(random() / 400) + 0.0052" # Narrowing and centering the points
)
# 3. Combine both
violin + stripplot
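The constants 400 and 0.0052 above are tuned by hand for the Horsepower column. As a rough sketch of how they could be derived for another column, the helper below estimates the peak of a Gaussian kernel density and builds a jitter expression centered at half that peak, which is where a single series stacked with stack="center" should sit. The function names peak_density and jitter_expression, the fraction parameter, and the Scott's-rule bandwidth are assumptions made for illustration; Vega's own density transform may choose a different bandwidth, so treat the numbers as approximate.

```python
import numpy as np


def peak_density(values, grid_size=200):
    """Estimate the maximum of a Gaussian KDE (Scott's rule bandwidth, an assumption)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing values
    bandwidth = x.std(ddof=1) * x.size ** (-1 / 5)
    grid = np.linspace(x.min(), x.max(), grid_size)
    # Evaluate the kernel density on the grid: one row per grid point
    z = (grid[:, None] - x[None, :]) / bandwidth
    dens = np.exp(-0.5 * z**2).sum(axis=1) / (x.size * bandwidth * np.sqrt(2 * np.pi))
    return float(dens.max())


def jitter_expression(peak, fraction=0.25):
    """Build a Vega expression that spreads points over `fraction` of the violin width.

    The points are centered at peak / 2, the assumed midline of a centered stack.
    """
    center = peak / 2
    width = peak * fraction
    return f"(random() - 0.5) * {width:.6g} + {center:.6g}"
```

The resulting string can then be passed as the jitter argument of transform_calculate in place of the hard-coded expression, for example jitter=jitter_expression(peak_density(df["Horsepower"])).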
By using SciPy, you could also lay out the points themselves in the shape of the violin, which I am personally quite fond of (discussion in this issue):
import altair as alt
import numpy as np
import pandas as pd
from scipy import stats
from vega_datasets import data
# NAs are not supported in SciPy's density calculation
df = data.cars().dropna()
y = 'Horsepower'
# Compute the density function of the data
dens = stats.gaussian_kde(df[y])
# Compute the density value for each data point
pdf = dens(df[y].sort_values())
# Randomly jitter each point between 0 and the upper bound given by the probability density function
density_cloud = np.random.uniform(0, pdf)
# To create a symmetric density/violin, we make every second point negative
# Distributing every other point like this is also more likely to preserve the shape of the violin
violin_cloud = density_cloud.copy()
violin_cloud[::2] = violin_cloud[::2] * -1
# Append the density cloud to the original data in the correctly sorted order
df_with_density = pd.concat([
df,
pd.DataFrame({
'density_cloud': density_cloud,
'violin_cloud': violin_cloud
},
index=df['Horsepower'].sort_values().index)],
axis=1
)
# Visualize the jittered points laid out in the shape of the violin
alt.Chart(df_with_density).mark_circle().encode(
x='Horsepower',
y='violin_cloud'
)
Both of these approaches will work with multiple categorical groups without faceting in the next version of Altair, when support for x/y offset channels is added.
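To apply the point-cloud layout to an arbitrary column such as n_genes_by_counts from the question, the steps above could be wrapped in a small helper. This is only a sketch: the function name add_violin_cloud and its seed parameter are made up for illustration, and it assumes the column is numeric with at least a few distinct values so that SciPy can estimate a density.

```python
import numpy as np
import pandas as pd
from scipy import stats


def add_violin_cloud(df, column, seed=None):
    """Return the rows of df sorted by `column`, with a 'violin_cloud' column added."""
    rng = np.random.default_rng(seed)
    # SciPy's density estimate does not accept missing values
    values = df[column].dropna().sort_values()
    # Density at each data point, in sorted order
    pdf = stats.gaussian_kde(values)(values)
    # Jitter each point between 0 and its density, then flip every second point
    cloud = rng.uniform(0, pdf)
    cloud[::2] *= -1
    out = df.loc[values.index].copy()
    out["violin_cloud"] = pd.Series(cloud, index=values.index)
    return out
```

The returned frame can be plotted in the same way as df_with_density above, encoding the chosen column on one axis and violin_cloud on the other.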