python matplotlib scatter-plot

xlabel and ylabel values are not sorted in matplotlib scatterplot


I have done tedious amounts of searching on the internet and it seems that I have not been able to figure out how to ask the right question to get the answer for what I want to do.

I am trying to create a scatterplot with P/E ratio on the y-axis and Dividend Yield on the x-axis. I put the data into a CSV file and then imported each column into Python as individual lists.

Here is how my scatterplot turns out below. I am confused why the x- and y- axes are not sorted numerically. I think I have to turn the elements within the list into floats and then do some sort of sort before turning it into a scatterplot.

The other option I can think of is being able to sort the values in the process of creating the scatterplot.

Neither approach has worked, and I have reached a dead end. Any help or a pointer in the right direction would be much appreciated; I can describe my problem, but I don't seem to be able to ask the right questions in my search.

import csv
import matplotlib.pyplot as plt

# one list per CSV column
symbol, index, dividend, pe = [], [], [], []

with open('xlv_xlu_combined_td.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        symbol.append(row[0])
        index.append(row[1])
        dividend.append(row[2])
        pe.append(row[3])

symbol.pop(0)
index.pop(0)
dividend.pop(0)
pe.pop(0)

indexes = [i.split('%', 1)[0] for i in index]
dividend_yield = [d.split('%', 1)[0] for d in dividend]
pe_ratio = [p.split('X', 1)[0] for p in pe]

x = dividend_yield[:5]
y = pe_ratio[:5]

plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
plt.xlabel('Dividend yield')
plt.ylabel('Pe ratio')
plt.legend()
plt.show()

[Image: scatterplot with unsorted x- and y-axis tick labels]

xlv_xlu_combined_td.csv

symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
MRK,4.96%,3.06%,104.92X
ABBV,4.43%,4.01%,23.86X
AMGN,3.86%,2.72%,60.93X
MDT,3.50%,2.27%,38.10X
ABT,3.26%,1.78%,231.74X
GILD,2.95%,2.93%,28.69X
BMY,2.72%,2.81%,97.81X
TMO,2.55%,0.32%,36.98X
LLY,2.49%,2.53%,81.83X

Solution

    • The issue is that the values are strings, so matplotlib treats them as categories and places them in the order they appear in the list, not in numeric order.
    • The trailing symbols must be removed from the values, which must then be converted to a numeric type.
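The two points above can be illustrated without plotting anything. The sketch below is only a rough imitation of how categorical placement behaves (each distinct string gets the next integer position in order of first appearance); the variable names are made up for illustration and are not part of the original code.

```python
# P/E values as they look after stripping the trailing 'X' (still strings).
pe_strings = ['263.00', '21.93', '10.19', '104.92', '23.86']

# Rough imitation of categorical placement: every distinct string gets
# the next integer position, in order of first appearance.
positions = {label: i for i, label in enumerate(dict.fromkeys(pe_strings))}

# Converting to float gives real numbers that an axis can order.
pe_floats = [float(p) for p in pe_strings]

print(positions)          # '263.00' lands at position 0, '10.19' at position 2
print(sorted(pe_floats))  # numeric order, smallest first
```

With strings, '263.00' sits at the start of the axis simply because it came first in the list; with floats, it ends up at the far end where it belongs.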

    Add-on to existing code using csv module

    • Given the existing code, it would be easy to map() the values in the lists to a float type.
    indexes = [i.split('%', 1)[0] for i in index]
    dividend_yield = [d.split('%', 1)[0] for d in dividend]
    pe_ratio = [p.split('X', 1)[0] for p in pe]
    
    # convert the values to floats after removing the symbols
    indexes = list(map(float, indexes))
    dividend_yield = list(map(float, dividend_yield))
    pe_ratio = list(map(float, pe_ratio))
    
    # plot
    x = dividend_yield[:5]
    y = pe_ratio[:5]
    
    plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
    plt.xlabel('Dividend yield')
    plt.ylabel('Pe ratio')
    plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
    plt.show()
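As a possible variation on the csv approach, the reading, symbol stripping and float conversion can be done in one pass. This is a sketch under a few assumptions: the helper `to_float` is a hypothetical name, and the data is inlined through `io.StringIO` (first five rows of the file from the question) so the example runs on its own. In real use, an open file handle would take its place.

```python
import csv
import io

# Stand-in for the file: header plus the first five data rows from the question.
csv_text = """symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
MRK,4.96%,3.06%,104.92X
ABBV,4.43%,4.01%,23.86X
"""

def to_float(text, symbols='%X'):
    """Strip trailing unit symbols and convert the remainder to float."""
    return float(text.rstrip(symbols))

dividend_yield, pe_ratio = [], []

# DictReader consumes the header row, so no pop(0) is needed afterwards.
for row in csv.DictReader(io.StringIO(csv_text)):
    dividend_yield.append(to_float(row['dividend']))
    pe_ratio.append(to_float(row['pe']))

print(dividend_yield)
print(pe_ratio)
```

The resulting lists hold floats and can be passed to plt.scatter in the same way as x and y above.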
    

    Using pandas

    • Remove the symbol from the end of the strings in the columns with col.str[:-1]
    • Convert the columns to float type with .astype(float)
    • Using pandas v1.2.4 and matplotlib v3.3.4
    • This option reduces the required code from 23 lines to 4 lines.
    import pandas as pd
    
    # read the file
    df = pd.read_csv('xlv_xlu_combined_td.csv')
    
    # remove the symbols from the end of the number and set the columns to float type
    df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda col: col.str[:-1]).astype(float)
    
    # plot the first five rows of the two columns
    ax = df.iloc[:5, 2:].plot(x='dividend', y='pe', kind='scatter', alpha=0.5,
                              xlabel='Dividend yield', ylabel='Pe ratio',
                              label='Healthcare P/E & Dividend')
    ax.legend(bbox_to_anchor=(1, 1), loc='upper left')
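One caveat worth hedging: the conversion above was written for pandas v1.2.4. In newer pandas versions, assigning through df.iloc[:, 1:] may write the floats into the existing object columns without changing their dtype. A minimal sketch of an alternative that assigns each column by name is shown below. It assumes the column names from the question's CSV, and the data is inlined so the example is self-contained.

```python
import io
import pandas as pd

# Stand-in for the file: header plus the first five data rows from the question.
csv_text = """symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
MRK,4.96%,3.06%,104.92X
ABBV,4.43%,4.01%,23.86X
"""

df = pd.read_csv(io.StringIO(csv_text))

# Replace each column by name so it ends up with a float dtype.
for col in ['index', 'dividend', 'pe']:
    df[col] = pd.to_numeric(df[col].str.rstrip('%X'))

print(df.dtypes)
```

After this, the same df.plot(x='dividend', y='pe', kind='scatter') call can be used for the scatterplot.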
    

    Plot output of both implementations

    • Note the numbers are now ordered correctly.

    [Image: scatterplot with numerically ordered axes]