I have a dataframe as follows. Each cell in the "Values" column holds a list of elements. I want to visualize the distribution of the values from the "Values" column using histograms, either stacked in rows or separated by colour (Area_code).
How can I get the values and construct the histograms in plotly? Any other ideas are also welcome. Thank you.
Area_code Values
0 New_York [999, 54, 231, 43, 177, 313, 212, 279, 199, 267]
1 Dallas [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316]
2 XXX [560]
3 YYY [884, 13]
4 ZZZ [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]
If you reshape your data, this would be a perfect case for px.histogram. From there you can choose between several outputs, such as sum, average, or count, through the histfunc argument:
fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
fig.show()
You haven't specified what kind of output you're aiming for, but I'll leave it up to you to change the argument for histfunc and see which option suits your needs best.
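To get a feel for what each histfunc option would put on the bars, the same aggregations can be previewed with pandas. This is only a rough sketch: it assumes the data is already in one-value-per-row form, `long_df` is an illustrative name holding a shortened sample of the question's data, and pandas' `mean` stands in for plotly's `avg`.

```python
import pandas as pd

# Shortened, illustrative sample in long format (one value per row),
# taken from the first two values of three areas in the question.
long_df = pd.DataFrame({
    "Area_code": ["New_York", "New_York", "Dallas", "Dallas", "YYY", "YYY"],
    "Values": [999, 54, 915, 183, 884, 13],
})

# pandas counterparts of histfunc='count', 'sum', 'avg', 'min', 'max'
summary = long_df.groupby("Area_code")["Values"].agg(
    ["count", "sum", "mean", "min", "max"]
)
print(summary)
```

Each column of `summary` corresponds to the bar heights you would expect from the matching histfunc when x is 'Area_code' and y is 'Values'.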
I'm often inclined to urge users to rethink their entire data process, but I'm just going to assume that there are good reasons why you're stuck with what seems like a pretty weird setup in your dataframe. The snippet below contains a complete data munging process to reshape your data from your setup to a so-called long format:
Area_code Values
0 New_York 999
1 New_York 54
2 New_York 231
3 New_York 43
4 New_York 177
5 New_York 313
6 New_York 212
7 New_York 279
8 New_York 199
9 New_York 267
10 Dallas 915
11 Dallas 183
12 Dallas 2326
13 Dallas 316
14 Dallas 206
15 Dallas 31
16 Dallas 317
17 Dallas 26
18 Dallas 31
19 Dallas 56
20 Dallas 316
21 XXX 560
22 YYY 884
23 YYY 13
24 ZZZ 203
And this is a perfect format for many of the great functionalities of plotly.express.
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data input
df = pd.DataFrame({'Area_code': {0: 'New_York', 1: 'Dallas', 2: 'XXX', 3: 'YYY', 4: 'ZZZ'},
'Values': {0: [999, 54, 231, 43, 177, 313, 212, 279, 199, 267],
1: [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316],
2: [560],
3: [884, 13],
4: [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]}})
# data munging
areas = []
value = []
for i, row in df.iterrows():
    # print(row['Values'])
    for j, val in enumerate(row['Values']):
        areas.append(row['Area_code'])
        value.append(val)
df = pd.DataFrame({'Area_code': areas,
'Values': value})
# plotly
fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
fig.show()