
Why do plot tick labels overlap only for imported CSV data?


When I create a simple matplotlib plot from NumPy arrays, the tick labels are well-behaved: they are chosen so that they don't overlap, and they are spaced evenly across the data range.

However, when I import data from a file into NumPy arrays, the tick labels are a mess. It appears that matplotlib has added a tick label for every data point rather than generating a sensible scale.

Why does my data stop the ticks from being chosen automatically?
How do I get MPL to do this automatically for real-world data with irregularly spaced X/Y data?


"Normal" behavior:


import matplotlib.pyplot as plt
import numpy as np
import numpy.random as rnd

x = np.arange(1000000)   # integers 0 .. 999999
y = rnd.rand(1000000)    # 1,000,000 uniform random floats in [0, 1)

fig, ax = plt.subplots()
ax.plot(x, y)

Resulting plot, as expected: [figure: plot of two NumPy arrays with sensible axes]


Real-world data with unevenly spaced x-values, imported from a file:

Snippet of data file:

-1900.209922,-106.022
-1900.176409,-103.902
-1900.142897,-112.337
-1900.109384,-109.252
...

Plotting script:

import numpy as np
import matplotlib.pyplot as plt
import csv

# Read CSV file
with open(r"graph.csv", encoding='utf-8-sig') as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    data_read = [row for row in reader]

d = np.array(data_read).T   # transpose: d[0] is the x column, d[1] the y column

x = d[0][0:10]
y = d[1][0:10]

fig, ax = plt.subplots()
ax.plot(x, y, ".")
fig.show()

I get messy, overlapping tick labels: [figure: colliding x-tick labels]

Zoomed in, you can see that it added ticks at exactly my data points: [figure: messy x-ticks, zoomed in]


If I change the x-data to an evenly spaced integer array, it auto-ticks the x-axis and puts labels at intuitive locations (not at the data points):

y = d[1][0:100]
x = range( len(y) )   # integer x-axis points

fig, ax = plt.subplots()
ax.plot( x, y, "." )
fig.show()

[figure: integer x-axis with automatically chosen ticks]

By the way, even if I load 20,000 data points, so that the y-axis spans from -106 to -88 (in case the values were too closely spaced), the y-axis labels still collide:

y[-1]
Out[31]: '-88.109'

y[0]
Out[32]: '-106.022'

[figure: 20,000 data points, y-axis labels still colliding]

Ultimately I'll be loading a much larger number of data points (200,000), so I need this solved.
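A minimal reproduction may help narrow this down. It is a sketch that assumes the four sample rows above are representative of graph.csv and feeds them through io.StringIO instead of opening the real file; the Agg backend is chosen only so it runs without a display. It checks what kind of values the arrays actually hold and how many x ticks matplotlib creates for them.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# The four sample rows from the question, standing in for graph.csv
SAMPLE = """-1900.209922,-106.022
-1900.176409,-103.902
-1900.142897,-112.337
-1900.109384,-109.252
"""

reader = csv.reader(io.StringIO(SAMPLE), delimiter=",", quotechar='"')
d = np.array([row for row in reader]).T  # same construction as the script above
x, y = d[0], d[1]

# csv.reader yields str, so the array has a Unicode string dtype (kind 'U')
print(d.dtype.kind)  # U

fig, ax = plt.subplots()
ax.plot(x, y, ".")

# Strings are plotted as categories: one tick per distinct value
print(len(ax.get_xticks()))  # 4
```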


Solution

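  • This is an extremely common issue, and it occurs when plotting string data rather than numeric data. The csv module reads every field as a string, so x and y are arrays of strings (note the quotes around y[0] and y[-1] above). matplotlib treats strings as categorical values: each distinct string gets its own tick, placed in order of first appearance rather than by numeric value, which is why there is one label per data point. You can fix this by following this answer, but you have two other options.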

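    1. Use pandas.read_csv to read in the data properly (pass header=None, since the file has no header row). It parses numeric columns as floats automatically, it also has plotting methods, and it is the recommended way to handle numeric CSV data.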
    2. Convert d to a NumPy array of floats, as shown below.
    # when creating d
    d = np.array(data_read, dtype=float).T
    
    # if d is already created
    d = d.astype(float)
    

    P.S. You can save figures using plt.savefig("filename.png") rather than taking screenshots of them.
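Both fixes can be checked with a sketch like the following. It generates hypothetical stand-in data (200,000 rows of irregularly spaced x-values and random y-values, not the real graph.csv) in an in-memory CSV without a header row, then reads it once with pandas.read_csv and once with the csv module plus a float conversion. With a float dtype, matplotlib uses its normal numeric locator and picks a handful of ticks instead of one per point.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for graph.csv: 200,000 "x,y" rows, no header,
# with irregular x spacing similar to the question's data
n = 200_000
rng = np.random.default_rng(0)
x_src = -1900.209922 + np.cumsum(rng.uniform(0.03, 0.04, n))
y_src = -106.0 + rng.normal(size=n)
buf = io.StringIO()
np.savetxt(buf, np.column_stack([x_src, y_src]), delimiter=",", fmt="%.6f")
text = buf.getvalue()

# Option 1: pandas parses numeric columns as float64 automatically
df = pd.read_csv(io.StringIO(text), header=None, names=["x", "y"])
print(df["x"].dtype)  # float64

# Option 2: keep the csv module, but convert the strings to floats
rows = list(csv.reader(io.StringIO(text)))
d = np.array(rows, dtype=float).T
print(d.dtype)  # float64

fig, ax = plt.subplots()
ax.plot(d[0], d[1], ".")
# A numeric axis gets a small number of automatically chosen ticks
print(len(ax.get_xticks()))
```

pandas is the more convenient route for large files, while the astype/dtype=float route keeps the original csv-based script almost unchanged.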