python, matplotlib, scatter-plot

How to color a scatter plot based on the y-axis values


Hello everyone. I made a scatter plot from two lists, and now I want to color the points based on their y-axis value. For example, if the y value is greater than 30000, I want the point colored red, and all other points blue. What's the best way to do this?


Solution

  • If you are using NumPy's ndarrays, it's simpler:

    import numpy as np
    import matplotlib.pyplot as plt
    
    # test data
    y = np.random.randint(2800, 3100, size=(100,))
    x = np.arange(0, 100)
    
    # create a Boolean array (a mask), possibly negate it using the "~" unary operator
    ygt3000 = y>3000
    plt.scatter(x[~ygt3000], y[~ygt3000], color='blue')
    plt.scatter(x[ygt3000], y[ygt3000], color='red')
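
    Because each group is drawn by its own call, each call can also carry its own legend label. The following is a minimal sketch of that idea, assuming seeded random test data and label strings chosen here purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# seeded test data so the sketch is reproducible (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
y = rng.integers(2800, 3100, size=100)
x = np.arange(100)

# Boolean mask: True where y exceeds the threshold
ygt3000 = y > 3000

# one scatter call per group, each with its own color and label
plt.scatter(x[~ygt3000], y[~ygt3000], color='blue', label='y <= 3000')
plt.scatter(x[ygt3000], y[ygt3000], color='red', label='y > 3000')

# the legend picks up the labels given to the two calls
legend = plt.legend()
```

    The labels are only possible because the points were split into two collections; a single call with a per-point color sequence produces one collection and therefore one legend entry.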
    

    If you are using real lists, it's a bit more complicated, but it can be done using list comprehensions:

    x = x.tolist()
    y = y.tolist()
    
    ygt3000 = [val>3000 for val in y]
    plt.scatter([xv for xv, ygt in zip(x, ygt3000) if not ygt],
                [yv for yv, ygt in zip(y, ygt3000) if not ygt], color='blue') 
    plt.scatter([xv for xv, ygt in zip(x, ygt3000) if ygt],
                [yv for yv, ygt in zip(y, ygt3000) if ygt], color='red') 
    

    Here is the result of the code above when applied to two sequences of random numbers.

    [image: scatter plot of the result]
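
    The list-based split shown earlier walks over the data four times, once per comprehension. As a sketch of an alternative, the same split can be done in a single loop. The helper name `split_by_threshold` and the small data set are hypothetical, introduced only for this example:

```python
import matplotlib.pyplot as plt

def split_by_threshold(x, y, threshold):
    """Split paired lists into points with y <= threshold and y > threshold.

    Hypothetical helper written for this example; not part of matplotlib.
    """
    low_x, low_y, high_x, high_y = [], [], [], []
    for xv, yv in zip(x, y):
        if yv > threshold:
            high_x.append(xv)
            high_y.append(yv)
        else:
            low_x.append(xv)
            low_y.append(yv)
    return (low_x, low_y), (high_x, high_y)

# small made-up data set
x = [0, 1, 2, 3, 4]
y = [2900, 3050, 3000, 2800, 3099]

(low_x, low_y), (high_x, high_y) = split_by_threshold(x, y, 3000)

# two passes, one per color, as in the list-comprehension version
plt.scatter(low_x, low_y, color='blue')
plt.scatter(high_x, high_y, color='red')
```

    Note that a value exactly equal to the threshold lands in the blue group, matching the strict `>` comparison used throughout this answer.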


    In August 2021, Trenton McKinney made a beautiful edit (thank you, Trenton) that brought this post to my attention again, and I saw the light:

    plt.scatter(x, y, c=['r' if v>3000 else 'b' for v in y])
    

    Just a day later, I realized that a similar trick works with NumPy, taking advantage of advanced indexing:

    plt.scatter(x, y, c=np.array(('b','r'))[(y>3000).astype(int)])
    

    But honestly, I prefer the two-pass approach I used previously, because it's more to the point and conveys much more meaning. In other words, the advanced-indexing one-liner looks like obfuscated code.
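
    To unpack the advanced-indexing one-liner, the sketch below spells out its intermediate steps on a small made-up array. It also shows `np.where`, which is not used in the answer above but, as far as I can tell, yields the same per-point colors in a more readable form:

```python
import numpy as np
import matplotlib.pyplot as plt

# small made-up data set
y = np.array([2900, 3050, 3000, 2800, 3099])
x = np.arange(len(y))

# step 1: a two-entry palette, position 0 is blue and position 1 is red
palette = np.array(['b', 'r'])

# step 2: turn the Boolean mask into integer positions (False -> 0, True -> 1)
index = (y > 3000).astype(int)

# step 3: indexing the palette with an integer array picks one color per point
colors_indexed = palette[index]

# alternative: np.where selects 'r' where the condition holds, 'b' elsewhere
colors_where = np.where(y > 3000, 'r', 'b')

plt.scatter(x, y, c=colors_where)
```

    Both variants build one color per point and draw everything in a single call, in contrast to the two-pass approach.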