Tags: python-3.x, pandas, bokeh, holoviews, hvplot

Change Default Hover Data for hvplot.hist


I have the following dataframe called df that contains 2 columns:

In [4]: df.head(20)                                                                               
Out[4]: 
     age age_band
0    NaN      NaN
1   61.0    55-64
2    NaN      NaN
3   55.0    55-64
4    NaN      NaN
5   67.0      65+
6    NaN      NaN
7   20.0    18-24
8   53.0    45-54
9    NaN      NaN
10   NaN      NaN
11  23.0    18-24
12  60.0    55-64
13   NaN      NaN
14  54.0    45-54
15   NaN      NaN
16  67.0      65+
17   NaN      NaN
18  50.0    45-54
19  70.0      65+
In [5]: df.info()                                                                                 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107632 entries, 0 to 107631
Data columns (total 2 columns):
age         73289 non-null float64
age_band    73289 non-null object
dtypes: float64(1), object(1)
memory usage: 1.6+ MB
In [7]: df["age_band"].value_counts()                                                             
Out[7]: 
45-54    22461
55-64    17048
35-44    14582
65+      12990
25-34     4078
18-24     2130
Name: age_band, dtype: int64
In [8]: df["age"].min()                                                                           
Out[8]: 19.0

In [9]: df["age"].max()                                                                           
Out[9]: 74.0

AIM: I want to plot a histogram of df["age"] using hvplot, with the ages placed into bins that correspond to the values in my df["age_band"] column. The following plot does this:

In [10]: df.hvplot.hist("age", bins=[18,25,35,45,55,65,74], xticks=[18,25,35,45,55,65,74],
    ...:                hover_cols=["age_band"], line_width=4, line_color="w")

[screenshot of the resulting plot]

When you hover over a bin, the count for that age band is displayed correctly as Count. However, instead of the age_band value, the tooltip seems to display the mean or median age for the bin.

Upon further investigation, it appears that setting hover_cols=["age_band"] had no effect on the plot: you get an identical plot if it is omitted.

I then tried to use HoverTool:

In [11]: from bokeh.models import HoverTool 
    ...:      
    ...: hover = HoverTool(tooltips=df["age_band"].dropna()) 
    ...:  
    ...: df.hvplot.hist("age", bins=[18,25,35,45,55,65,74], xticks=[18,25,35,45,55,65,74],
    ...:                line_width=4, line_color="w").opts(tools=[hover])

However I got the following error:

ValueError: expected an element of either String or List(Tuple(String, String)), got 1         55-64

So then I tried:

In [12]: from bokeh.models import HoverTool 
    ...:      
    ...: hover = HoverTool(tooltips="age_band") 
    ...:  
    ...: df.hvplot.hist("age", bins=[18,25,35,45,55,65,74], xticks=[18,25,35,45,55,65,74],
    ...:                line_width=4, line_color="w").opts(tools=[hover])

Which resulted in:

[screenshot of the resulting plot]

So then I also tried:

In [13]: hover = HoverTool(tooltips=[("18-24","2130"),("25-34","4078"),("35-44","14582"),
    ...:                             ("45-54","22461"),("55-64","17048"),("65+","12990")])
    ...:
    ...: df.hvplot.hist("age", bins=[18,25,35,45,55,65,74], xticks=[18,25,35,45,55,65,74],
    ...:                line_width=4, line_color="w").opts(tools=[hover])

Which resulted in the following:

[screenshot of the resulting plot]
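
A note on the three HoverTool attempts above: as the ValueError says, Bokeh's tooltips argument expects either a single string or a list of (label, value) string pairs. A pandas Series is neither, which is why the first attempt raised. In the pairs form, a value such as "@count" is looked up in the plot's data source for the hovered glyph, while a value without "@" is shown literally. That would explain why a plain "age_band" string or hard-coded pairs like ("18-24", "2130") cannot vary from bin to bin. Below is a minimal sketch in plain Bokeh rather than hvplot. The sample ages are the non-null values from df.head(20); the column names left, right, count and the helper make_figure are hypothetical names chosen for illustration.

```python
import numpy as np

# Non-null ages from df.head(20) in the question, used as a small sample
ages = np.array([61, 55, 67, 20, 53, 23, 60, 54, 67, 50, 70], dtype=float)
edges = [18, 25, 35, 45, 55, 65, 74]
labels = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]

# Count the ages per bin using the same edges as the hvplot call
counts, _ = np.histogram(ages, bins=edges)

# One row per bin; every key becomes a column that a tooltip can reference
bin_data = {
    "left": edges[:-1],
    "right": edges[1:],
    "count": counts.tolist(),
    "age_band": labels,
}

# (label, value) string pairs; "@name" reads column `name` for the hovered bar
TOOLTIPS = [("age_band", "@age_band"), ("Count", "@count")]


def make_figure():
    """Build the histogram as a Bokeh figure (hypothetical helper)."""
    # Imported here so the data preparation above runs without Bokeh installed
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure

    source = ColumnDataSource(bin_data)
    fig = figure(x_axis_label="age", y_axis_label="Count")
    fig.quad(left="left", right="right", bottom=0, top="count",
             source=source, line_color="white")
    fig.add_tools(HoverTool(tooltips=TOOLTIPS))
    return fig
```

Calling make_figure() and passing the result to bokeh.plotting.show should display one bar per bin, with the band label and count in the tooltip.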

Is there a way to produce a histogram of df["age"] using hvplot.hist where hovering over a bin shows the corresponding age_band and the Count for that age_band?

Thanks


Solution

  • Setting by=['age_band'] should work and should show you that column when you hover:

    df.hvplot.hist(
        y='age',
        by=['age_band'],
        legend=False,
        color='lightblue',
        bins=[18,25,35,45,55,65,74],
        xticks=[18,25,35,45,55,65,74],
    )
    


    In the case you describe, though, you could also create a bar plot directly from the value_counts of the age_band column:

    age_band_counts = df['age_band'].value_counts().sort_index()
    
    age_band_counts.hvplot.bar(bar_width=1.0)
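
The bar-plot route relies on age_band already matching the histogram bins. As a rough check, or for cases where the band column does not exist yet, the bands can be derived from the ages with pandas.cut. This is a minimal sketch, not part of the original answer: the sample is the age column from df.head(20) in the question, and the upper edge of 75 is an assumption chosen so that the maximum age of 74 falls inside the half-open last interval.

```python
import pandas as pd

# The age column from df.head(20) in the question (None becomes NaN)
ages = pd.Series(
    [None, 61, None, 55, None, 67, None, 20, 53, None,
     None, 23, 60, None, 54, None, 67, None, 50, 70],
    dtype="float64",
)

# Assumed edges: 75 instead of 74 so that age 74 lands in the "65+" band
edges = [18, 25, 35, 45, 55, 65, 75]
labels = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]

# right=False gives left-closed intervals: [18, 25), [25, 35), ...
bands = pd.cut(ages, bins=edges, labels=labels, right=False)

# NaN ages are ignored; sort_index keeps the bands in category order
band_counts = bands.value_counts().sort_index()
print(band_counts)
```

With hvplot.pandas imported, band_counts.hvplot.bar() should then plot these counts in the same way as age_band_counts above, with the band on the x-axis.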