I'm trying to construct a histogram in plotly that can show data from other columns in the histogram's bins using the hover_data argument. As an example, take the following small dataset:
import pandas as pd

word_data = {'author': ['Martin Luther King Jr.',
                        'Martin Luther King Jr.',
                        'Martin Luther King Jr.',
                        'Malcolm X',
                        'Malcolm X',
                        'Fred Hampton',
                        'Fred Hampton',
                        'James Baldwin',
                        'James Baldwin'],
             'words': ['dream', 'color', 'nonviolence',
                       'color', 'rights',
                       'panthers', 'rights',
                       'color', 'rights']}
words_df = pd.DataFrame(word_data)
print(words_df)
Which (for reference) results in:
                   author        words
0  Martin Luther King Jr.        dream
1  Martin Luther King Jr.        color
2  Martin Luther King Jr.  nonviolence
3               Malcolm X        color
4               Malcolm X       rights
5            Fred Hampton     panthers
6            Fred Hampton       rights
7           James Baldwin        color
8           James Baldwin       rights
I've built the following plotly histogram:
import plotly.express as px

fig = px.histogram(words_df, x='words', hover_data=['author'],
                   labels={
                       'words': 'Most Common Words'
                   },
                   title='Most Common Words that Speakers Use'
                   ).update_xaxes(categoryorder='total descending').update_layout(yaxis_title='Number of Speakers')
fig.show()
As you can see, the hover data only shows values from words and count. I am trying to find a way to also incorporate a list of the speakers who used the word associated with a given bin into its hover data. I tried passing ['author'] into the hover_data argument, but that doesn't seem to work. Does anyone know of a way to achieve this?
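A histogram aggregates rows into bins, so its hover cannot list the individual authors that fall into each bin. If you aggregate the data frame yourself first (one row per word, with a count and a list of speakers), you can draw the same chart as a bar figure: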
import pandas as pd
import plotly.express as px

word_data = {
    "author": [
        "Martin Luther King Jr.",
        "Martin Luther King Jr.",
        "Martin Luther King Jr.",
        "Malcolm X",
        "Malcolm X",
        "Fred Hampton",
        "Fred Hampton",
        "James Baldwin",
        "James Baldwin",
    ],
    "words": [
        "dream",
        "color",
        "nonviolence",
        "color",
        "rights",
        "panthers",
        "rights",
        "color",
        "rights",
    ],
}
words_df = pd.DataFrame(word_data)

px.bar(
    # One row per word: number of rows using it and the list of its speakers,
    # ordered by count (descending), then alphabetically by word.
    words_df.groupby("words", as_index=False)
    .agg(count=("words", "size"), speakers=("author", list))
    .sort_values(["count", "words"], ascending=[False, True]),
    x="words",
    y="count",
    hover_data=["speakers"],
    title="Most Common Words that Speakers Use",
).update_layout(xaxis_title="Most Common Words", yaxis_title="Number of Speakers")