Tags: python, pandas, plotly, scatter-plot, median

Python/Plotly: How to make each data point on Scatter plot represent median value?


Here is my dataset:

import numpy as np
import pandas as pd

ob1 = np.linspace(1, 10, 13).round(2).tolist()
ob2 = np.linspace(10, 1, 12).round(2).tolist()
ob = ob1 + ob2

ex_dic = {'Vendor': ['A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
          'Month': [1,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
          'Observation': ob
          }
ex_df = pd.DataFrame.from_dict(ex_dic)
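A quick check (not part of the original question) shows where the duplicate points come from: count the rows for each (Vendor, Month) pair. This sketch rebuilds `ex_df` compactly with the same values as above so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild ex_df with the same values as the dataset above
ob = np.linspace(1, 10, 13).round(2).tolist() + np.linspace(10, 1, 12).round(2).tolist()
ex_df = pd.DataFrame({
    'Vendor': ['A'] * 13 + ['B'] * 12,
    'Month': [1, 1] + list(range(2, 13)) + list(range(1, 13)),
    'Observation': ob,
})

# Number of observations for each vendor/month pair
counts = ex_df.groupby(['Vendor', 'Month']).size()

# Pairs with more than one observation are the ones that need a median
print(counts[counts > 1])
```

Only vendor A in month 1 has two rows, which is why that month gets two markers on the plot.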

It looks like this:

[Screenshot of ex_df]

Here is code for my Plotly visualization:

import plotly.graph_objects as go

ex_month_list = ex_df.Month.unique().tolist()
ex_vendor_list = ex_df.Vendor.unique().tolist()

fig = go.Figure()

# One lines+markers trace per vendor
for i in ex_vendor_list:
    by_vendor_df = ex_df.loc[ex_df['Vendor'] == i]
    fig.add_trace(go.Scatter(x=by_vendor_df.Month, y=by_vendor_df.Observation, name=str(i),
                             mode='lines+markers', marker_line_width=2, marker_size=8))

fig.show()

It shows something like this: [Scatter plot: observations (1-10) on the y-axis, months (1-12) on the x-axis]

Here is where the problem is: vendor A has two observations in month 1, so the plot shows two points for that month instead of a single value.

I have tried applying median() in various places, but I cannot get the plot to show the median observation for each month. For example, here is what I have come up with so far (in terms of logic):

import statistics as stat

for i in ex_vendor_list:
    vendor_df = ex_df.loc[ex_df['Vendor'] == i]
    for m in ex_month_list:
        # All observations for vendor i in month m
        month_df = vendor_df.loc[vendor_df['Month'] == m]
        by_month_observations = month_df['Observation'].to_list()
        median_val = stat.median(by_month_observations)
        print(i, m, median_val)

The code above does return the median values and works fine, BUT because some months went from two observations to one, I cannot append the results back to the DataFrame: the lengths no longer match. So I am not sure this is the best approach.
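One way around the length mismatch, sketched here as an alternative to the loop rather than as the asker's eventual solution, is `groupby(...).transform('median')`. It returns one value per original row, so the result has the same length as the DataFrame and can be assigned back as a new column. The `Median` column name is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same data as ex_df in the question
ob = np.linspace(1, 10, 13).round(2).tolist() + np.linspace(10, 1, 12).round(2).tolist()
ex_df = pd.DataFrame({
    'Vendor': ['A'] * 13 + ['B'] * 12,
    'Month': [1, 1] + list(range(2, 13)) + list(range(1, 13)),
    'Observation': ob,
})

# Median per (Vendor, Month), broadcast back to every row of that group,
# so the new column has exactly len(ex_df) values
ex_df['Median'] = ex_df.groupby(['Vendor', 'Month'])['Observation'].transform('median')

# Keep a single row per vendor/month for plotting
plot_df = ex_df.drop_duplicates(subset=['Vendor', 'Month'])
print(plot_df[['Vendor', 'Month', 'Median']])
```

Feeding `plot_df.Month` and `plot_df.Median`, filtered per vendor, into the existing `go.Scatter` loop would then give one point per month.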

Looking at the code above, what is the smart way to make each plotted data point the median value for that month, per vendor? Any help is really appreciated!


Solution

  • Well, I figured out a way to do it myself: a simple .groupby() did the job!

    Here is the DataFrame I used while trying to solve my problem:

    import pandas as pd

    some_dic = {'Vendor': ['A','A','A','A','B','B','B','B','B'],
                'Month': [6,7,8,8,6,7,8,8,8],
                'Observation': [1,2,3,4,10,8,6,3,1]
                }
    some_df = pd.DataFrame.from_dict(some_dic)
    

    Here is the code that successfully generated a plot with median values:

    ...
    grouped_df=vendor_df.groupby(vendor_df.Month)[['Observation']].median()
    grouped_df.reset_index(inplace=True)
    ...