Tags: python, pandas, plotly, pandas-groupby, plotly-python

Plot a bar chart using groupby with Plotly and Streamlit


I am trying to plot a bar chart based on a groupby, but when I try it, the app crashes and displays the error below.

The error appears when the user selects 3 items in the multiselect widget:

ValueError: All arguments should have the same length. The length of argument color is 3, whereas the length of previously-processed arguments ['gender', 'count'] is 95

Code:

import streamlit as st
import plotly.express as px

# df is the existing DataFrame loaded earlier in the app
some_columns_df = df.loc[:, ['gender', 'country', 'city', 'hoby', 'company', 'status']]
some_columns = some_columns_df.columns.tolist()

select_box_var = st.selectbox("Choose X Column", some_columns)
multiselect_var = st.multiselect("Select Columns To GroupBy", some_columns)  # returns a list

test_g3 = df.groupby([select_box_var] + multiselect_var).size().reset_index(name='count')
# This line raises the ValueError: color receives the whole list multiselect_var
fig = px.histogram(test_g3, x=select_box_var, y='count', color=multiselect_var, barmode='group', text_auto=True)

            

I know the error is caused by the color parameter of px.histogram.


Solution

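  • The reason is that color accepts only one category: a single column name, or one Series/array with a value per row. Passing a list of column names, e.g.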

    color=['column_a','column_b']
    

    would cause:

    ValueError: All arguments should have the same length. The length of argument color is 2, whereas the length of previously-processed arguments ['total_bill'] is 244

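    2 is the length of the list ['column_a','column_b'], while 244 is the number of rows in the DataFrame: Plotly treats the list as one colour value per row, so the lengths must match. In the question, 3 is the number of columns selected in the multiselect and 95 is the number of rows in test_g3.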


    According to the documentation:

    color (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.

    Therefore, we must pass either a single column name or a Series. Here's my approach:

    import plotly.express as px
    df = px.data.tips() # a data set from plotly
    df.head()
    

    Output

      total_bill   tip     sex smoker  day    time  size
    0       16.99  1.01  Female     No  Sun  Dinner     2
    1       10.34  1.66    Male     No  Sun  Dinner     3
    2       21.01  3.50    Male     No  Sun  Dinner     3
    3       23.68  3.31    Male     No  Sun  Dinner     2
    4       24.59  3.61  Female     No  Sun  Dinner     4
    

    Columns:
    sex, with unique values Female and Male
    time, with unique values Dinner and Lunch

    I chose these two columns because it's easy to see that there are only 4 combinations.
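To check how many colour groups a pair of columns will produce, the question's own groupby(...).size() pattern works. A minimal sketch on a few hypothetical rows that use the same sex/time values as the tips data:

```python
import pandas as pd

# Hypothetical rows using the same sex/time values as the tips data set
df = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch", "Dinner", "Dinner"],
})

# One entry per distinct (sex, time) pair, with how often it occurs
counts = df.groupby(["sex", "time"]).size()
print(counts)
print(len(counts))  # 4 combinations
```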



    Create a Series that concatenates the sex and time columns:

    categories = df[['sex','time']].agg(', '.join, axis=1)
    print(categories)
    

    Output

    0      Female, Dinner
    1        Male, Dinner
    2        Male, Dinner
    3        Male, Dinner
    4      Female, Dinner
                ...      
    239      Male, Dinner
    240    Female, Dinner
    241      Male, Dinner
    242      Male, Dinner
    243    Female, Dinner
    Length: 244, dtype: object
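An equivalent, vectorized way to build the same labels is pandas' Series.str.cat with a separator. This is a swapped-in alternative to the row-wise agg, shown on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows standing in for the tips data
df = pd.DataFrame({"sex": ["Female", "Male", "Male"],
                   "time": ["Dinner", "Dinner", "Lunch"]})

# str.cat joins each sex value with the matching time value, putting sep between them
categories = df["sex"].str.cat(df["time"], sep=", ")
print(categories.tolist())  # ['Female, Dinner', 'Male, Dinner', 'Male, Lunch']

# Same labels as the row-wise approach above
row_wise = df[["sex", "time"]].agg(", ".join, axis=1)
print(row_wise.tolist() == categories.tolist())  # True
```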
    

    Use this Series as the color argument:

    fig = px.histogram(df, x="total_bill", color=categories)
    fig.show()
    




    If ', '.join doesn't work for your data, i.e. this line raises an error:

    categories = df[['sex','time']].agg(', '.join, axis=1)


    then try plain string concatenation instead:

    categories = df['sex'] + ', ' + df['time']
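One common reason ', '.join fails is that a selected column holds non-string values (numbers, for example), because str.join accepts only strings. A minimal sketch, assuming a hypothetical frame with a numeric size column, casts every column to str before joining:

```python
import pandas as pd

# Hypothetical frame mixing a text column and a numeric column
df = pd.DataFrame({"sex": ["Female", "Male"], "size": [2, 3]})

# Cast to str first so ", ".join only ever sees strings
categories = df[["sex", "size"]].astype(str).agg(", ".join, axis=1)
print(categories.tolist())  # ['Female, 2', 'Male, 3']
```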
    
