Tags: python, pandas, dataframe, matplotlib, timeline

Matplotlib scatter plot with non-contiguous y-axis ticks for integer data


My question: when plotting x and y values from a dataframe where the y values are discrete numbers (say, an ID number or a category), a scatter plot gives linearly spaced y-axis ticks. Depending on how far apart the original values are, this can leave large vertical gaps between the plotted values.

What I need is to plot some category values (a fixed set of discrete values) against time events (x-axis) in a scatter plot, but the values in the table are integers, not strings. As I don't have a deep understanding of how to do this, below is what I have achieved so far, but only by modifying the original table to use string values. Here is my test data (the original data is much larger):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtic
import matplotlib.category as mcat

np.random.seed(432987435)

nofpoints = 160

xval = np.arange(nofpoints)
disc = [200, 240, 250, 290]  # the fixed set of integer categories

yval = np.random.choice(disc, nofpoints)
yval_str = yval.astype(str)  # the same values as strings

cval = np.random.random(nofpoints)
df = pd.DataFrame({'xval': xval, 'yval': yval, 'cval': cval})
df_str = pd.DataFrame({'xval': xval, 'yval': yval_str, 'cval': cval})

Using the usual plotting method:

fig = plt.figure(dpi=128 , figsize=(12,6))
ax1 = fig.add_subplot(111) 
# here we are using the original dataframe(df), without any string field inside.
#ax1.grid(True)
ax1.scatter( 'xval' , 'yval' , data=df , marker='o', facecolor='None' , edgecolor='g')
plt.show()

This is what we get:

[Figure: usual_scatter_plotting]

Note the large spacing between the values, and that the plotted points do not line up with the tick values. (I don't want to use a legend with a colormap to show the categories, since color is reserved for another purpose.)

With the modified dataframe, which has strings as the y-axis values:

fig = plt.figure(dpi=128 , figsize=(12,6))
ax2 = fig.add_subplot(111) 
# the dataframe used is the modified one, with a string field inside.
# the y categories are placed in the order they first appear in the data, so they look shuffled.
ax2.scatter( 'xval' , 'yval' , data=df_str , marker='o', facecolor='None' , edgecolor='k')
plt.show()

[Figure: with string values in dataframe]

To avoid the shuffling:

fig = plt.figure(dpi=128 , figsize=(12,6))
ax3 = fig.add_subplot(111) 
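# to keep a fixed (sorted) order and avoid shuffling, set the category units explicitly with matplotlib.category
#ax3.grid(True)
disc_str = [str(x) for x in disc]
units = mcat.UnitData(sorted(disc_str))  # string sort; matches numeric order here because all values have 3 digits
ax3.yaxis.set_units(units)
# note: _mapping is a private attribute of UnitData and may change between matplotlib versions
ax3.yaxis.set_major_locator(mcat.StrCategoryLocator(units._mapping))
ax3.yaxis.set_major_formatter(mcat.StrCategoryFormatter(units._mapping))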
ax3.scatter( 'xval' , 'yval' , data=df_str , marker='o', facecolor='None' , edgecolor='y')
plt.show()

[Figure: with sorted y axis values]
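The category order can probably also be fixed without touching the private `units._mapping` attribute. The sketch below assumes matplotlib's public `Axis.update_units` behaves as in current releases. It first shows that string categories get positions 0, 1, 2, ... in order of first appearance (the "shuffling" above), then registers the sorted category strings with `update_units` before calling `scatter`, so that later string data only appends categories that are not yet known. Sorting as integers before converting to strings avoids lexicographic surprises such as `'1000' < '200'`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(432987435)
disc = [200, 240, 250, 290]
xval = np.arange(160)
yval_str = np.random.choice(disc, 160).astype(str)

fig, (ax_default, ax_sorted) = plt.subplots(1, 2, figsize=(12, 4))

# 1) Default: categories get positions 0, 1, 2, ... in order of first appearance.
ax_default.scatter(xval, yval_str, facecolor='none', edgecolor='k')

# 2) Register the categories in numeric order before plotting. Later string data
#    only appends categories that are not already known, so this order is kept.
order = [str(v) for v in sorted(disc)]  # sort as integers, then convert to str
ax_sorted.yaxis.update_units(order)
ax_sorted.scatter(xval, yval_str, facecolor='none', edgecolor='y')

fig.canvas.draw()  # make sure the tick labels are populated before reading them
default_labels = [t.get_text() for t in ax_default.get_yticklabels()]
sorted_labels = [t.get_text() for t in ax_sorted.get_yticklabels()]
first_seen = list(pd.unique(yval_str))  # order in which values first occur

print(default_labels == first_seen)  # True
print(sorted_labels)                 # ['200', '240', '250', '290']
plt.show()  # under the Agg backend this does not open a window
```

This still plots strings, but they can be produced on the fly (for example `df['yval'].astype(str)`) instead of being stored in the table.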

Is there any way to achieve this without modifying the original table, i.e. to plot the integer category values directly as y-axis values?


Solution

  • You can do it by replacing ax1.scatter with seaborn.stripplot:

    sns.stripplot(ax = ax1, data = df, x = 'xval', y = 'yval_str', marker = 'o', color = 'white', edgecolor = 'green', linewidth = 1)
    

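    Before you do that, add the string column to df and, if you want the y-axis in a particular order, sort df (for string data, stripplot places the categories in the order they first appear):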

    df = pd.DataFrame({'xval': xval, 'yval': yval, 'yval_str': yval_str, 'cval': cval}).sort_values(by = 'yval', ascending = False)
    

    Complete Code

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    np.random.seed(432987435)
    
    nofpoints = 160
    
    xval = np.arange(nofpoints)
    disc = [200, 240, 250, 290]
    
    yval = np.random.choice(disc, nofpoints)
    yval_str = yval.astype(str)
    
    cval = np.random.random(nofpoints)
    df = pd.DataFrame({'xval': xval, 'yval': yval, 'yval_str': yval_str, 'cval': cval}).sort_values(by = 'yval', ascending = False)
    
    fig = plt.figure(dpi = 128, figsize = (12, 6))
    ax1 = fig.add_subplot(111)
    sns.stripplot(ax = ax1, data = df, x = 'xval', y = 'yval_str', marker = 'o', color = 'white', edgecolor = 'green', linewidth = 1)
    plt.show()
    


    stripplot jitters the points along the categorical axis by default. If you want perfectly horizontally aligned points, pass jitter = False to sns.stripplot:

    sns.stripplot(ax = ax1, data = df, x = 'xval', y = 'yval_str', marker = 'o', color = 'white', edgecolor = 'green', linewidth = 1, jitter = False)
    

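A plain-matplotlib alternative that leaves `df` untouched (a sketch of my own, not part of the answer above): map each integer to its index among the sorted distinct values with `pandas.Categorical`, scatter those evenly spaced codes, and label the ticks with the original integers. The `jitter` variable is a hypothetical knob imitating seaborn's jitter; 0.0 gives perfectly aligned rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(432987435)
nofpoints = 160
xval = np.arange(nofpoints)
disc = [200, 240, 250, 290]
yval = np.random.choice(disc, nofpoints)
cval = np.random.random(nofpoints)
df = pd.DataFrame({'xval': xval, 'yval': yval, 'cval': cval})  # integers only

# Map each integer category to an evenly spaced position 0, 1, 2, ...
# pd.Categorical builds a separate object, so df itself is not modified.
levels = sorted(df['yval'].unique())                # ascending: 200 ends up at the bottom
cat = pd.Categorical(df['yval'], categories=levels)
ypos = cat.codes                                    # integer codes 0 .. len(levels) - 1

# Hypothetical jitter knob: e.g. 0.1 spreads points vertically like seaborn's jitter.
jitter = 0.0
rng = np.random.default_rng(0)
ypos_plot = ypos + rng.uniform(-jitter, jitter, size=len(ypos))

fig, ax = plt.subplots(dpi=128, figsize=(12, 6))
ax.scatter(df['xval'], ypos_plot, marker='o', facecolor='none', edgecolor='g')

# One tick per category, labelled with the original integer value.
ax.set_yticks(range(len(levels)))
ax.set_yticklabels([str(v) for v in levels])
plt.show()  # under the Agg backend this does not open a window
```

Because the y positions are computed outside the dataframe, the integer column stays as it is, and the color argument remains free for other uses.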