python pandas matplotlib time-series scatter-plot

How to add outliers as separate colored markers to a line plot


val             time
5.6     2021-11-18 03:00:00
2.034   2021-11-18 05:00:00
1.171   2021-11-18 07:00:00
3.023   2021-11-18 09:00:00
4.202   2021-11-18 16:00:00
1.202   2021-11-18 17:00:00
5.202   2021-11-18 18:00:00
7.202   2021-11-18 19:00:00
2.202   2021-11-18 20:00:00
12.202  2021-11-18 21:00:00
1.202   2021-11-18 21:00:00

Above is my dataframe. I want to plot it (x=time, y=val) and draw the points where val > 5 in red.

plt.plot(ab['time'], ab['val'], '-gD', markevery=marks, label='line with select markers')

Here marks = [7.202, 12.202] is a list I created manually, but this does not work. It fails with the error: markevery is iterable but not a valid numpy fancy index

I found one approach in "Line plot with different markers if condition is true python 3", but it is time-consuming when there are a lot of points.


Solution

    • The easiest solution is to use Boolean indexing to create a separate dataframe for values greater than 5, and then plot them as a scatter plot with pandas.DataFrame.plot
    • The x-axis is formatted as %m-%d %H automatically. The format will change when there's more data, and there are other answers discussing how to format a pandas datetime axis.
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # sample data
    data = {'val': [5.6, 2.034, 1.171, 3.023, 4.202, 1.202, 5.202, 7.202, 2.202, 12.202, 1.202], 'time': ['2021-11-18 03:00:00', '2021-11-18 05:00:00', '2021-11-18 07:00:00', '2021-11-18 09:00:00', '2021-11-18 16:00:00', '2021-11-18 17:00:00', '2021-11-18 18:00:00', '2021-11-18 19:00:00', '2021-11-18 20:00:00', '2021-11-18 21:00:00', '2021-11-18 21:00:00']}
    df = pd.DataFrame(data)
    
    # convert the time column to a datetime dtype
    df.time = pd.to_datetime(df.time)
    
    # get the values greater than 5
    masked = df[df.val.gt(5)]
    
    # plot the line plot
    ax = df.plot(x='time', marker='o', figsize=(15, 5), zorder=0)
    
    # plot those greater than 5
    masked.plot(kind='scatter', x='time', y='val', color='red', ax=ax, s=30, label='outliers')
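
If the automatic tick format is not what you want, one option is to draw the same figure with matplotlib directly and set the format yourself. The following is only a sketch under that assumption: the name `outliers` and the format string '%m-%d %H:%M' are chosen for illustration and are not part of the answer above.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# same sample data as above
data = {'val': [5.6, 2.034, 1.171, 3.023, 4.202, 1.202, 5.202, 7.202, 2.202, 12.202, 1.202],
        'time': ['2021-11-18 03:00:00', '2021-11-18 05:00:00', '2021-11-18 07:00:00',
                 '2021-11-18 09:00:00', '2021-11-18 16:00:00', '2021-11-18 17:00:00',
                 '2021-11-18 18:00:00', '2021-11-18 19:00:00', '2021-11-18 20:00:00',
                 '2021-11-18 21:00:00', '2021-11-18 21:00:00']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# rows with val > 5 (hypothetical variable name)
outliers = df[df['val'] > 5]

fig, ax = plt.subplots(figsize=(15, 5))

# line for all points, drawn underneath the red markers
ax.plot(df['time'], df['val'], marker='o', zorder=0, label='val')

# red markers for the outliers only
ax.scatter(outliers['time'], outliers['val'], color='red', s=30, label='outliers')

# explicit tick label format (assumed format, adjust as needed)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d %H:%M'))
fig.autofmt_xdate()  # rotate the tick labels so they do not overlap
ax.legend()

# plt.show()  # uncomment when running as a script
```

Plotting through matplotlib rather than pandas.DataFrame.plot here keeps the date ticks under matplotlib's own date handling, so the formatter applies as written.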
    

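As an alternative that stays closer to the attempt in the question: markevery takes integer positions into the data (or a boolean mask), not y-values, which is probably why a list such as [7.202, 12.202] is rejected. A minimal sketch, assuming the same sample data, with the positions computed from the condition instead of typed by hand (the name `marks` follows the question; the red marker colors are an illustrative choice):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = {'val': [5.6, 2.034, 1.171, 3.023, 4.202, 1.202, 5.202, 7.202, 2.202, 12.202, 1.202],
        'time': ['2021-11-18 03:00:00', '2021-11-18 05:00:00', '2021-11-18 07:00:00',
                 '2021-11-18 09:00:00', '2021-11-18 16:00:00', '2021-11-18 17:00:00',
                 '2021-11-18 18:00:00', '2021-11-18 19:00:00', '2021-11-18 20:00:00',
                 '2021-11-18 21:00:00', '2021-11-18 21:00:00']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# integer positions of the rows where val > 5
marks = np.flatnonzero(df['val'].to_numpy() > 5).tolist()

fig, ax = plt.subplots(figsize=(15, 5))

# green line; diamond markers are drawn only at the positions in marks
ax.plot(df['time'], df['val'], '-gD', markevery=marks,
        markerfacecolor='red', markeredgecolor='red',
        label='line with select markers')
ax.legend()

# plt.show()  # uncomment when running as a script
```

With this approach the other points get no marker at all, whereas the scatter overlay in the solution keeps a marker on every point and adds red ones on top.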