
Python: Why does my marginal y histogram change when I change the x variable?


I am trying to use Plotly to create a scatter plot whose x variable can be selected from a drop-down menu. The scatter plot should also have marginal histograms.

# Generate data and import libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# set the random seed
np.random.seed(123)

# generate arrays of random numbers
a = np.random.rand(100)
b = np.random.rand(100)
c = np.random.rand(100)
y = np.random.rand(100)

# create a DataFrame with columns A, B, C and y (df_scatterplot_nocc keeps only A, B and C)
df_scatterplot = pd.DataFrame({'A': a, 'B': b, 'C': c, 'y':y})
df_scatterplot_nocc = df_scatterplot.loc[:, df_scatterplot.columns != 'y']

# print the DataFrame
print(df_scatterplot)

I have created a working script where the y-variable can be changed:

# Plot scatter plot and marginal histograms
fig = px.scatter(df_scatterplot, x='A', y='y', marginal_x="histogram", marginal_y="histogram")

buttonlist = []

for col in df_scatterplot.columns:
    buttonlist.append(
      dict(
          args = [
                    {'y': [df_scatterplot[str(col)]]},  # update y variable
                    {'yaxis.title.text': str(col)}      # update y-axis title
                  ],
          label=str(col),
          method='update'
      )
    )

# Add dropdown menu
fig.update_layout(updatemenus=[
          go.layout.Updatemenu(
              buttons=buttonlist,
              x=0.75, xanchor="left", y=1.0, yanchor="top",
          ),
      ],
  )

fig.update_layout(autosize=False, width=1000, height=700,)
fig.show()

Output for a varying y variable: (screenshot: output when varying the y variable)

As expected, only the marginal_y histogram changes. The problem appears when I modify the script so that the x variable can be changed instead: the marginal_y histogram changes as well (it ends up drawing a histogram of the new x variable, so both marginal plots show an x histogram). It should stay unchanged.

So I changed the button arguments to:

args = [
    {'x': [df_scatterplot[str(col)]]},  # update x variable
    {'xaxis.title.text': str(col)}      # update x-axis title
],

The output for a varying x variable then becomes: (screenshot: output when varying the x variable)

I have also tried changing the method argument from 'update' to 'restyle', with no luck. Any help is appreciated; thank you.
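Both attempts fail for the same reason: when a button supplies no trace indices, Plotly applies its data update to every trace in the figure, so the new x data also reaches the marginal_y histogram. The sketch below is a simplified pure-Python model of that behaviour, not Plotly's own code. `apply_data_update` and the dict stand-ins for the traces are hypothetical; the stand-ins assume the marginal_y histogram starts with y data only, and the cyclic reuse of a shorter value list is an assumption consistent with the question's one-element `[df_scatterplot[str(col)]]` changing both the scatter and the marginal_y histogram.

```python
import copy


def apply_data_update(traces, data_update, trace_indices=None):
    """Simplified model (an assumption, not Plotly's code) of how the data part
    of an 'update' or 'restyle' call is spread over traces.

    traces        -- list of dicts standing in for fig.data
    data_update   -- {attribute: [value, ...]}, e.g. {'x': ['B']}
    trace_indices -- traces to modify; None means every trace
    """
    traces = copy.deepcopy(traces)  # leave the caller's list untouched
    if trace_indices is None:
        trace_indices = range(len(traces))
    for position, index in enumerate(trace_indices):
        for attr, values in data_update.items():
            # assumed: a value list shorter than the trace list is reused cyclically
            traces[index][attr] = values[position % len(values)]
    return traces


# hypothetical stand-ins for the three traces (column names instead of data)
figure_traces = [
    {'trace': 'scatter', 'x': 'A', 'y': 'y'},
    {'trace': 'marginal_x histogram', 'x': 'A'},
    {'trace': 'marginal_y histogram', 'y': 'y'},
]

# no trace indices: every trace receives the new x, including the marginal_y histogram
everywhere = apply_data_update(figure_traces, {'x': ['B']})

# trace indices [0, 1]: the marginal_y histogram is left alone
targeted = apply_data_update(figure_traces, {'x': ['B']}, [0, 1])

for before, after_all, after_targeted in zip(figure_traces, everywhere, targeted):
    print(before['trace'], '| all traces:', after_all.get('x'),
          '| [0, 1]:', after_targeted.get('x'))
```

With `method='restyle'`, the button args are `[data_update, trace_indices]` (there is no layout dict), so the same trace-index fix applies there too, although restyle on its own does not update the x-axis title.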


Solution

  • You can fix this by specifying which traces to update: pass a list of trace indices as the third element of args for the update method:

    args = [
        {'x': [df_scatterplot[str(col)]]},  # update x variable
        {'xaxis.title.text': str(col)},     # update x-axis title
        [0, 1]                              # update only traces 0 and 1 (leave marginal_y untouched)
    ],
    

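    Without this third element, the data update is applied to every trace, including the marginal_y histogram, which then receives the new x data and is redrawn as an x histogram. Restricting the update to traces 0 and 1 leaves the marginal_y histogram untouched.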

    NB: the figure contains 3 traces, indexed as follows:

    • 0: the main scatter trace, on subplot xy
    • 1: the marginal_x histogram above the scatter, on subplot x3y3
    • 2: the marginal_y histogram on the right, on subplot x2y2