
Using 'where' when plotting using matplotlib. Why does it skip graphing data points?


I am plotting the minimums and maximums of a set of data around the average, which is shown as the solid green line. The red line is a threshold value, and I want to draw particular attention to where the data crosses that line.

axis[1].fill_between(x, data[minvalues], data[maxvalues], alpha=0.3, interpolate=False, where=data[maxvalues]< threshold, color='green', edgecolor='none', step="post")
axis[1].fill_between(x, data[minvalues], data[maxvalues], alpha=0.3, interpolate=False, where=data[maxvalues]>=threshold, color='red', edgecolor='none', step="post")
for line in data[maxvalues] <= threshold:
    print(line)

This code outputs the following boolean values and graph:

False
False 
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False
False
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False
False
True
True

[image: plot produced by the code above]

For some reason, matplotlib doesn't draw the bars at the transitions between the two sets of data. You can see from the tick marks below the bars that each red section shows only one bar where it should show two.

If I set interpolate to True, matplotlib draws an angled shaded area, but I want to keep this chart square. If I omit the step argument, it doesn't drop data this way, as shown here with interpolate set to True.

[image: the same plot without the step argument and with interpolate=True]

How do I get matplotlib to include the data it misses?


Solution

  • The fill_between method won't plot the bar at the transitions between True and False in the where argument.

    From the documentation of the where parameter (https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.fill_between.html):

    "Define where to exclude some horizontal regions from being filled. The filled regions are defined by the coordinates x[where]. More precisely, fill between x[i] and x[i+1] if where[i] and where[i+1]. Note that this definition implies that an isolated True value between two False values in where will not result in filling. Both sides of the True position remain unfilled due to the adjacent False values."

    You could add a helper function that produces the additional True values for your where argument. In the example below I had to make some assumptions about what your data looks like, but the gapfill function works for the data I assumed.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    
    
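    def gapfill(bool_array):
        # Extend every run of True values one position to the right, so that
        # fill_between (which needs where[i] and where[i+1]) also fills the
        # last interval of each run.
        normal = list(bool_array)
        shifted = [False] + normal[:-1]  # the mask moved one position right
        return np.array([x or shifted[n] for n, x in enumerate(normal)])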
    
    
    threshold = 3
    data = pd.DataFrame({
        'maxvalues': [14, 15,  1, 2,  1, 2,  1, 2,  1, 2, 14, 15,  1, 2,  1, 2,  1, 2, 14, 15,  1, 2],
        'minvalues': [ 0,  0, -1, 0, -1, 0, -1, 0, -1, 0,  0,  0, -1, 0, -1, 0, -1, 0,  0,  0, -1, 0],
    })
    maxvalues = 'maxvalues'
    minvalues = 'minvalues'
    x = np.arange(len(data[maxvalues]))
    fig, axis = plt.subplots(2)
    
    axis[1].fill_between(x, data[minvalues], data[maxvalues], alpha=0.3, interpolate=False, where=gapfill(data[maxvalues] < threshold), color='green', edgecolor='none', step="post")
    axis[1].fill_between(x, data[minvalues], data[maxvalues], alpha=0.3, interpolate=False, where=gapfill(data[maxvalues] >= threshold), color='red', edgecolor='none', step="post")
    for line in data[maxvalues] <= threshold:
        print(line)
    
    plt.show()
    

    The resulting plot is: [image: new plot]
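
To see why the bar goes missing without opening a plot window, the rule quoted from the documentation can be checked on the mask alone. The sketch below is only an illustration under a few assumptions: the helper names filled_intervals and gapfill_vectorized are made up here and are not part of matplotlib, the mask is the first six points of the assumed data above, and the NumPy version of gapfill is an alternative way of writing the same shift, not a change to the answer's function.

```python
import numpy as np


def filled_intervals(where):
    """Mask over the intervals [x[i], x[i+1]] that fill_between would fill.

    Applies the documented rule: fill between x[i] and x[i+1]
    if where[i] and where[i+1].
    """
    where = np.asarray(where, dtype=bool)
    return where[:-1] & where[1:]


def gapfill_vectorized(where):
    """Hypothetical NumPy variant of gapfill: extend each True run one step right."""
    where = np.asarray(where, dtype=bool)
    shifted = np.concatenate(([False], where[:-1]))
    return where | shifted


# Same shape as `data[maxvalues] >= threshold` for the first six assumed points.
above = np.array([True, True, False, False, False, False])

# Without gapfill, only the interval [x[0], x[1]] qualifies; the step that
# starts at x[1] (the second red bar with step="post") is left unfilled.
print(filled_intervals(above))

# With gapfill, the interval [x[1], x[2]] qualifies as well.
print(filled_intervals(gapfill_vectorized(above)))
```

Because the mask is extended to the right, this pairs naturally with step="post", where the value at x[i] is drawn across the interval from x[i] to x[i+1].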