Search code examples
pandasboxplotmarkerscatter

Pandas Boxplot , also include most recent value with a marker


I have a boxplot I would like to display from a timeseries with 5 different categorical variables (different types of oil).

How do I display the most recent value with a marker, displayed over the relevant box plots. In my example, I have the variable maxDate to show the most recent value of each oil type.

import pandas as pd
import seaborn as sns

# read Data Files, create data frame for all products
dfBr = pd.read_excel (r'\filepath.xlsx',
                   skiprows=1,
                   usecols=(0,1,13,14))
dfCb = pd.read_excel (r'\filepath.xlsx',
                   skiprows=1,
                   usecols=(0,1,13,14))
sns.set_style('whitegrid')
total = [dfBr,dfCb]
df = pd.concat(total)

df.columns =['Commodity', 'Date', 'mmLong', 'mmShort']
df.tail() 

df['Net_OI']=df['mmLong']-df['mmShort']
df['LS_Ratio']=df['mmLong']/df['mmShort']

df=df[df['Date'] > 180600]

df['Commodity'] = df['Commodity'].replace(['CRUDE OIL, LIGHT SWEET - NEW YORK MERCANTILE EXCHANGE',
                                           'ICE Brent Crude Futures - ICE Futures Europe',
                                           'CRUDE OIL, LIGHT SWEET-WTI - ICE FUTURES EUROPE',
                                           'GASOLINE BLENDSTOCK (RBOB)  - NEW YORK MERCANTILE EXCHANGE',
                                           '#2 HEATING OIL- NY HARBOR-ULSD - NEW YORK MERCANTILE EXCHANGE']
                                          ,['WTI',
                                            'BRN',
                                            'ICE',
                                            'RBOB',
                                            'HO'])
maxDate = df.Date.iloc[-1]

currentWTI = df.loc[ (df['Commodity'] == 'WTI') & (df['Date'] == maxDate)]
currentBrn = df.loc[ (df['Commodity'] == 'BRN') & (df['Date'] == maxDate)]
currentIce = df.loc[ (df['Commodity'] == 'ICE') & (df['Date'] == maxDate)]
currentRb = df.loc[ (df['Commodity'] == 'RBOB') & (df['Date'] == maxDate)]
currentHo = df.loc[ (df['Commodity'] == 'HO') & (df['Date'] == maxDate)]


fig, ax = plt.subplots(figsize=(8,8))
#sns.boxplot(x='Net_OI', y='Market_and_Exchange_Names', data=three_yr_df);
sns.boxplot(x=df.LS_Ratio, y=df.Commodity);

plt.scatter(currentBrn.LS_Ratio, 0,marker='*', s=350, color='orange');

Currently the end result looks like this, but I would like to show a boxplot of all 5 items with a Star marker on each of the 5 items.

Any help would be much appreciated.

Boxplot with 1 marker


Solution

  • Since no data was presented, I used the boxplot in the official reference as an example to code your assignment. The key is to use the maximum value in each category variable as an array of where to place the stars.

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    sns.set_theme(style="whitegrid")
    tips = sns.load_dataset("tips")
    
    fig, ax = plt.subplots()
    ax = sns.boxplot(x="day", y="total_bill", data=tips)
    star = tips[['day','total_bill']].groupby('day').max()
    ax.scatter(star.index, star.total_bill, marker='*', s=350, color='orange')
    plt.show()
    

    enter image description here