Tags: python, matplotlib, plot, graph, data-science

Scaling a dataset in matplotlib on x and y axis relative to another dataset?


I am trying to scale two different sets of data to be visually equivalent.

The green data set has extreme y values and significantly more data points, so the orange data set looks flat and short by comparison.

What functions exist that allow me to scale them equivalently with one another?

Note for future viewers: min-max normalization is one method, as mentioned in the answers.

[Image: Issue]


Solution

  • You can do this by rescaling each data set to the range 0 to 1 (min-max normalization).

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Define the green and orange data sets
    green_data = np.random.normal(50, 10, 100)
    orange_data = np.random.normal(25, 5, 10)
    
    # Normalize the data sets using min-max scaling
    green_data_normalized = (green_data - np.min(green_data)) / (np.max(green_data) - np.min(green_data))
    orange_data_normalized = (orange_data - np.min(orange_data)) / (np.max(orange_data) - np.min(orange_data))
    
    # Plot the normalized data sets
    plt.plot(green_data_normalized, label='Green Data')
    plt.plot(orange_data_normalized, label='Orange Data')
    plt.legend()
    plt.show()
    

    [Image: normalized graphs]
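
The min-max formula used above, (x - min) / (max - min), is applied twice with the same shape, so it can be wrapped in a small helper. This is only a minimal sketch: the name `min_max_scale` is hypothetical, and returning zeros for constant data (where max equals min and the formula would divide by zero) is an assumption, not something the original answer specifies.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Assumption: constant data has no range, so map every value to 0
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# 10 maps to 0.0, 15 maps to 0.5, 20 maps to 1.0
scaled = min_max_scale([10.0, 15.0, 20.0])
```

With this helper, the two normalization lines become `min_max_scale(green_data)` and `min_max_scale(orange_data)`.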

    Edit: If you want the orange values to span the same x-width as the green values, you can resample the orange data by linear interpolation: draw a straight line between each pair of neighboring points and read values off that line at the new x positions. This widens the line by creating more data points. NumPy has this built in as np.interp (short for interpolate).

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Define the green and orange data sets
    green_data = np.random.normal(50, 10, 100)
    orange_data = np.random.normal(25, 5, 10)
    
    # Define the x-values for the original and extended orange data
    x_orange_original = np.linspace(0, 1, len(orange_data))
    x_orange_extended = np.linspace(0, 1, len(green_data))
    
    # Interpolate the orange data to extend it
    orange_data_extended = np.interp(x_orange_extended, x_orange_original, orange_data)
    
    # Normalize the data sets using min-max scaling
    green_data_normalized = (green_data - np.min(green_data)) / (np.max(green_data) - np.min(green_data))
    orange_data_normalized = (orange_data_extended - np.min(orange_data_extended)) / (np.max(orange_data_extended) - np.min(orange_data_extended))
    
    # Plot the normalized data sets
    plt.plot(green_data_normalized, label='Green Data')
    plt.plot(orange_data_normalized, label='Orange Data')
    plt.legend()
    plt.show()
    

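
To see what np.interp does in the edit above, it may help to run it on a data set small enough to check by hand. The sketch below stretches three points to five; the helper name `resample_to_length` is hypothetical, and placing both the old and new points evenly on the interval 0 to 1 is the same assumption the answer's code makes.

```python
import numpy as np

def resample_to_length(values, new_length):
    """Linearly interpolate values onto new_length evenly spaced positions."""
    values = np.asarray(values, dtype=float)
    # Assumption: the original points are evenly spaced along the x axis
    x_old = np.linspace(0, 1, len(values))
    x_new = np.linspace(0, 1, new_length)
    return np.interp(x_new, x_old, values)

short = [0.0, 10.0, 20.0]
stretched = resample_to_length(short, 5)
# The 3 points become 5: 0, 5, 10, 15, 20
```

The new values all lie on the straight lines between the original points, so interpolation changes how wide the curve is drawn without adding any information the short data set did not already contain.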