
How do I get rid of the vertical lines that appear after doing a relative frequency plot? (Python 3.5)


I'm currently trying to construct a relative frequency plot of the logarithmic returns for some assets (shares, specifically), with a log10 scale on the y-axis set via plt.yscale('log'). However, the plot I constructed in Python, shown below, contains vertical lines that appear to run off to infinity at certain points:

This obviously shouldn't happen. Instead, it should look like this graph:

This is clearly similar to mine, except that it doesn't include the vertical lines on those points. My code is as follows:

plt.figure(1)
plt.figure(figsize=(9,7))
hist1, bins1 = np.histogram(returns_assetA_daily_mat, bins=20)
hist2, bins2 = np.histogram(returns_assetA_weekly_mat, bins=20)
hist3, bins3 = np.histogram(returns_assetA_monthly_mat, bins=20)
hist1 = hist1/len(returns_assetA_daily_mat)
hist2 = hist2/len(returns_assetA_weekly_mat)
hist3 = hist3/len(returns_assetA_monthly_mat)
bins1 = 0.5 * (bins1[1:] + bins1[:-1])
bins2 = 0.5 * (bins2[1:] + bins2[:-1])
bins3 = 0.5 * (bins3[1:] + bins3[:-1])
plt.plot(bins1, hist1, bins2, hist2, bins3, hist3)
plt.yscale('log')
plt.xlabel('Log-Returns')
plt.ylabel('Relative Frequency')
plt.title('Original for Asset A')
plt.show()

It's quite useful to know that

returns_assetA_daily_mat, returns_assetA_weekly_mat, returns_assetA_monthly_mat

are simply row arrays holding the daily, weekly and monthly logarithmic returns of the assets. They contain negative values, positive values and zeros. Since I'm using a log10 scale on the y-axis, perhaps the negative values or zeros are the cause of the issue, since a logarithm tends to minus infinity as its argument tends to zero? Or maybe the issue lies in my code structure? If there is no fix, is there a way to isolate the points where the vertical lines tend to minus infinity, so that they look like isolated points instead? I'm a Python newbie learning it as part of my master's degree in Computational Finance, so any help would be highly appreciated. Thanks a lot in advance!


Solution

  • In Matplotlib, one way to visually "remove" points from a plot is to set them to numpy.nan. The line is then broken at each numpy.nan, leaving a gap between the neighbouring points, which I believe is what you are after.

    Therefore, for your arrays, find any values that are negative or 0 and set them to numpy.nan before plotting. Because you are computing histograms, the values can never be negative, so in practice you are only checking for bins that are equal to 0. A zero has no finite logarithm, so on a log scale those empty bins are what send the line off the bottom of the plot.
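
    To make the zero-bin check concrete, here is a minimal sketch. The sample data is made up (a hypothetical stand-in for the log-returns, not the asker's arrays), and the names returns, rel_freq and empty are only for illustration:

```python
import numpy as np

# Hypothetical sample standing in for the log-returns (not the asker's data).
# The single outlier at 0.08 stretches the bin range and leaves empty bins.
rng = np.random.default_rng(0)
returns = np.concatenate([rng.normal(0.0, 0.01, 500), [0.08]])

counts, edges = np.histogram(returns, bins=20)
rel_freq = counts / len(returns)   # true division gives a float array

# Bins with no observations have relative frequency 0
empty = rel_freq == 0
print("indices of empty bins:", np.flatnonzero(empty))

# Replace the empty bins by NaN so that Matplotlib leaves a gap there
rel_freq_nan = np.where(empty, np.nan, rel_freq)
```

    Only the empty bins are touched, so the remaining relative frequencies still sum to 1.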

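    One thing you need to make sure of is that your arrays are of type float, because numpy.nan only exists for floating-point arrays. (In Python 3 the division by len(...) above already yields float arrays, so the astype call below simply makes a float copy.)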
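
    As a small illustration of why the float type matters, the sketch below tries to store NaN in an integer array first. The array counts is a made-up stand-in for the raw bin counts:

```python
import numpy as np

# Made-up bin counts with an integer dtype
counts = np.array([3, 0, 5])

try:
    counts[1] = np.nan             # NaN has no integer representation
except ValueError as err:
    print("integer array:", err)

# astype returns a converted copy, so the original counts stay untouched
counts_float = counts.astype(float)
counts_float[counts_float <= 0] = np.nan
print(counts_float)
```

    The assignment into the integer array raises a ValueError, while the float copy accepts NaN without complaint.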

    If you also want a legend, plot each array separately on the same figure by calling matplotlib.pyplot.plot three times, each with its own label, then add the legend and show the plot:

    plt.figure(1, figsize=(9,7))
    hist1, bins1 = np.histogram(returns_assetA_daily_mat, bins=20)
    hist2, bins2 = np.histogram(returns_assetA_weekly_mat, bins=20)
    hist3, bins3 = np.histogram(returns_assetA_monthly_mat, bins=20)
    hist1 = hist1/len(returns_assetA_daily_mat)
    hist2 = hist2/len(returns_assetA_weekly_mat)
    hist3 = hist3/len(returns_assetA_monthly_mat)
    bins1 = 0.5 * (bins1[1:] + bins1[:-1])
    bins2 = 0.5 * (bins2[1:] + bins2[:-1])
    bins3 = 0.5 * (bins3[1:] + bins3[:-1])
    
    # New code
    hist1_new = hist1.astype(float)
    hist2_new = hist2.astype(float)
    hist3_new = hist3.astype(float)
    hist1_new[hist1 <= 0] = np.nan
    hist2_new[hist2 <= 0] = np.nan
    hist3_new[hist3 <= 0] = np.nan
    
    # New - Plot the three graphs separately for making the legend
    # Also plot the NaN versions
    plt.plot(bins1, hist1_new, label='Daily')
    plt.plot(bins2, hist2_new, label='Weekly')
    plt.plot(bins3, hist3_new, label='Monthly')
    
    plt.yscale('log')
    plt.xlabel('Log-Returns')
    plt.ylabel('Relative Frequency')
    plt.title('Original for Asset A')
    plt.legend() # Added for the legend
    plt.show()
    

    I don't have your data, but I can show you a toy example of this working:

    # Import relevant packages
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Create array from 0 to 8 for the horizontal axis
    x = np.arange(9)
    
    # Create test array with some zero, positive and negative values
    y = np.array([1, 2, 3, 0, -1, -2, 1, 2, -1])
    
    # Create a figure with two graphs in one row
    plt.subplot(1, 2, 1)
    
    # Graph the data normally
    plt.plot(x, y)
    
    # Visually remove those points that are zero or negative
    y2 = y.astype(float)
    y2[y2 <= 0] = np.nan
    
    # Plot these points now
    plt.subplot(1, 2, 2)
    plt.plot(x, y2)
    
    # Adjust the x and y limits (see further discussion below)
    plt.xlim(0, 8)
    plt.ylim(-1, 3)
    
    # Show the figure
    plt.show()
    

    Note that for the second plot I set the x and y limits explicitly: because we are visually removing points, the axes would otherwise rescale automatically to the data that remains. The second plot then shows the line only where the values are positive, with gaps where the zero and negative values were.
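
    If you would rather not hard-code the limits, one alternative (a different technique from the xlim/ylim calls above, shown only as a sketch) is to let the two axes share their limits via the sharex and sharey arguments of plt.subplots. The Agg backend and the output file name nan_gaps.png are assumptions made so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(9)
y = np.array([1, 2, 3, 0, -1, -2, 1, 2, -1])

# Float array with the zero and negative values replaced by NaN
y2 = np.where(y > 0, y, np.nan)

# sharex/sharey ties the limits of the two axes together
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True)
ax1.plot(x, y)    # original data
ax2.plot(x, y2)   # same data with gaps at the NaN positions

fig.savefig("nan_gaps.png")  # hypothetical output file name
```

    With shared axes both panels always show the same range, so the gaps in the second panel are easy to compare against the original line.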