Tags: python, pandas, dataframe, plotly, histogram

How to build a histogram from a pandas dataframe where each observation is a list?


I have a dataframe as follows. Each cell in the "Values" column holds a list of elements. I want to visualize the distribution of the values in the "Values" column using histograms, either stacked in rows or separated by colour (Area_code).

How can I extract the values and build these histograms in plotly? Any other ideas are also welcome. Thank you.

    Area_code  Values
0   New_York   [999, 54, 231, 43, 177, 313, 212, 279, 199, 267]
1   Dallas     [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316]
2   XXX        [560]
3   YYY        [884, 13]
4   ZZZ        [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]

Solution

  • If you reshape your data, this would be a perfect case for px.histogram. From there you can choose between several aggregations, such as sum, avg, and count, through the histfunc argument:

    fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
    fig.show()
    

    You haven't specified what kind of output you're aiming for, so I'll leave it up to you to change the argument for histfunc and see which option suits your needs best.


    I'm often inclined to urge users to rethink their entire data process, but I'm just going to assume that there are good reasons why you're stuck with what seems like a pretty unusual setup in your dataframe. The snippet below contains a complete data munging process that reshapes your data from your setup to a so-called long format (first rows shown):

       Area_code  Values
    0   New_York     999
    1   New_York      54
    2   New_York     231
    3   New_York      43
    4   New_York     177
    5   New_York     313
    6   New_York     212
    7   New_York     279
    8   New_York     199
    9   New_York     267
    10    Dallas     915
    11    Dallas     183
    12    Dallas    2326
    13    Dallas     316
    14    Dallas     206
    15    Dallas      31
    16    Dallas     317
    17    Dallas      26
    18    Dallas      31
    19    Dallas      56
    20    Dallas     316
    21       XXX     560
    22       YYY     884
    23       YYY      13
    24       ZZZ     203
    

    This long format works well with many of the features of plotly.express.

    Complete code:

    import plotly.express as px
    import pandas as pd
    
    # data input
    df = pd.DataFrame({'Area_code': {0: 'New_York', 1: 'Dallas', 2: 'XXX', 3: 'YYY', 4: 'ZZZ'},
                     'Values': {0: [999, 54, 231, 43, 177, 313, 212, 279, 199, 267],
                      1: [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316],
                      2: [560],
                      3: [884, 13],
                      4: [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]}})
    
    # data munging: build one row per list element (long format)
    areas = []
    values = []
    for _, row in df.iterrows():
        for val in row['Values']:
            areas.append(row['Area_code'])
            values.append(val)
    df = pd.DataFrame({'Area_code': areas,
                       'Values': values})
    
    # plotly
    fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
    fig.show()