python plotly hover histogram

How to show all occurrences in the hover data of a plotly.express histogram


I'm trying to construct a histogram in plotly that can show data from other columns in the histogram's bins using the hover_data argument. As an example, take the following small dataset:

import pandas as pd

word_data = {'author':['Martin Luther King Jr.',
                       'Martin Luther King Jr.',
                       'Martin Luther King Jr.',
                       'Malcolm X',
                       'Malcolm X',
                       'Fred Hampton',
                       'Fred Hampton',
                       'James Baldwin',
                       'James Baldwin'], 
             'words': ['dream', 'color', 'nonviolence',
                       'color', 'rights',
                       'panthers', 'rights',
                       'color', 'rights']}

words_df = pd.DataFrame(word_data)
print(words_df)

For reference, this prints:

                   author        words
0  Martin Luther King Jr.        dream
1  Martin Luther King Jr.        color
2  Martin Luther King Jr.  nonviolence
3               Malcolm X        color
4               Malcolm X       rights
5            Fred Hampton     panthers
6            Fred Hampton       rights
7           James Baldwin        color
8           James Baldwin       rights

I've built the following plotly histogram:

import plotly.express as px

fig = px.histogram(
    words_df,
    x='words',
    hover_data=['author'],
    labels={'words': 'Most Common Words'},
    title='Most Common Words that Speakers Use',
)
fig.update_xaxes(categoryorder='total descending')
fig.update_layout(yaxis_title='Number of Speakers')
fig.show()

[Image: the resulting plotly histogram]

As you can see, the hover data shows only the values for words and count. I'd also like each bin's hover data to list the speakers who used the word for that bin. I tried passing ['author'] to the hover_data argument, but that doesn't seem to work. Does anyone know of a way to achieve this?


Solution

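  • px.histogram aggregates the rows into bins itself, so there is no single author value per bar for hover_data to show. If you aggregate the data frame yourself first (one row per word, with its count and the list of speakers), you can draw the result as a bar figure and put the speakers in the hover data.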

    import pandas as pd
    import plotly.express as px
    
    word_data = {
        "author": [
            "Martin Luther King Jr.",
            "Martin Luther King Jr.",
            "Martin Luther King Jr.",
            "Malcolm X",
            "Malcolm X",
            "Fred Hampton",
            "Fred Hampton",
            "James Baldwin",
            "James Baldwin",
        ],
        "words": [
            "dream",
            "color",
            "nonviolence",
            "color",
            "rights",
            "panthers",
            "rights",
            "color",
            "rights",
        ],
    }
    
    words_df = pd.DataFrame(word_data)
    
    # Aggregate to one row per word: its count and the list of speakers who used it.
    summary_df = (
        words_df.groupby("words", as_index=False)
        .agg(count=("words", "size"), speakers=("author", list))
        .sort_values(["count", "words"], ascending=[False, True])
    )
    
    fig = px.bar(
        summary_df,
        x="words",
        y="count",
        hover_data=["speakers"],
        title="Most Common Words that Speakers Use",
    ).update_layout(xaxis_title="Most Common Words", yaxis_title="Number of Speakers")
    fig.show()
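
To see what the aggregation step produces before it reaches the figure, here is a minimal pandas-only sketch. It makes two assumptions that are not in the answer above: the speakers are joined into a comma-separated string (which may read more cleanly in a hover box than a Python list), and the count is taken from the author column rather than from the grouping column. The name summary is purely illustrative.

```python
import pandas as pd

words_df = pd.DataFrame(
    {
        "author": [
            "Martin Luther King Jr.", "Martin Luther King Jr.", "Martin Luther King Jr.",
            "Malcolm X", "Malcolm X",
            "Fred Hampton", "Fred Hampton",
            "James Baldwin", "James Baldwin",
        ],
        "words": [
            "dream", "color", "nonviolence",
            "color", "rights",
            "panthers", "rights",
            "color", "rights",
        ],
    }
)

# One row per word: how many rows used it and who the speakers were.
# Joining with ", " is an assumption made for readability; the answer
# above keeps the speakers as a list instead.
summary = (
    words_df.groupby("words", as_index=False)
    .agg(count=("author", "size"), speakers=("author", ", ".join))
    .sort_values(["count", "words"], ascending=[False, True])
    .reset_index(drop=True)
)

print(summary)
```

Passing a frame like summary to px.bar with x="words", y="count" and hover_data=["speakers"], as in the code above, should then show the joined string for each bar.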
    
