How to graph semi-large datasets (~20k points) using pandas and matplotlib.pyplot? Are there better tools for graphing?

I'm trying to graph an imported csv file using pandas and matplotlib.pyplot. The csv file has 20k data points and for simplicity is linear. I have the following code:

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv(r'/Users/ephemeralhappiness/Desktop/Packet/20kData.csv')
df = pd.DataFrame(data, columns=['Displacement Into Surface', 'Load On Sample'])
x = df['Load On Sample']
y = df['Displacement Into Surface']
plt.scatter(x, y)
plt.xlabel('Load On Sample')
plt.ylabel('Displacement Into Surface')
plt.show()

When I run the program, I get the following graphical output:

[Figure: Displacement Into Surface vs Load On Sample]

The graph has black marks along the axes, and the 20k points are not spaced out at all. How can I fix this?

Solution

I don't think the tool is the issue here:

Dot spacing: on a screen with 1920x1080 pixel resolution, the diagonal spans sqrt(1920^2 + 1080^2) ~= 2200 pixels. Your 20k data points are roughly nine times that many, so there is no way to display them all along a diagonal and still tell them apart.
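To make the arithmetic concrete, here is a small sketch of that estimate. The 1920x1080 resolution is just the example value from above, and the variable names are made up for illustration.

```python
import math

# Example screen resolution from the text; substitute your own.
width_px, height_px = 1920, 1080
n_points = 20_000

# Length of the screen diagonal in pixels.
diagonal_px = math.hypot(width_px, height_px)

# How many data points would have to share each pixel along the diagonal.
points_per_pixel = n_points / diagonal_px

print(f"diagonal: {diagonal_px:.0f} px")
print(f"points per pixel along the diagonal: {points_per_pixel:.1f}")
```

Anything well above one point per pixel means neighbouring markers must overlap, whatever plotting tool you use.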

What you can do is initialise a very large figure and plot small data point symbols. You can then magnify parts of the figure and see individual data points.
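A minimal sketch of the large-figure idea, assuming synthetic linear data in place of your CSV. The 40 x 40 inch size and the marker settings are arbitrary choices, not recommended values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the CSV data (assumption: 20,000 linear points).
x = np.arange(20_000)
y = 2 * x

# A large canvas: 40 x 40 inches at 100 dpi is 4000 x 4000 pixels.
fig, ax = plt.subplots(figsize=(40, 40), dpi=100)

# s is the marker area in points^2; linewidths=0 removes the marker outline
# so that neighbouring dots overlap less.
ax.scatter(x, y, s=1, linewidths=0)
ax.set_xlabel('Load On Sample')
ax.set_ylabel('Displacement Into Surface')
```

From here you could call plt.show() and zoom with the toolbar, or write the figure to disk with fig.savefig(...) and magnify it in an image viewer.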

As for your code, when I run it with synthetic data (linear relationship between x and y, with 20000 datapoints), the axis labels work out nicely:

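import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data: 20,000 points with a linear relationship (y = 2x)
df = pd.DataFrame({'Load On Sample': np.arange(20000),
                   'Displacement Into Surface': 2 * np.arange(20000)})

x = df['Load On Sample']
y = df['Displacement Into Surface']
plt.scatter(x, y, s=1)  # s=1 draws very small markers
plt.xlabel('Load On Sample')
plt.ylabel('Displacement Into Surface')
plt.show()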
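Since the synthetic data plots cleanly, one thing that may be worth checking is whether your CSV columns were actually parsed as numbers. This is only a guess about your file: when matplotlib is given strings it treats them as categories and draws one tick per unique value, which with 20k values can look like solid black marks along the axes. The CSV excerpt below, with a units row under the header, is hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV excerpt: a units row under the header makes pandas
# read both columns as text instead of numbers.
csv_text = """Displacement Into Surface,Load On Sample
nm,mN
0.0,0.0
1.5,0.7
3.1,1.4
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # text columns here, not a numeric dtype

# Coerce each column to numbers; values that cannot be parsed become NaN.
cols = ['Displacement Into Surface', 'Load On Sample']
for col in cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Drop the rows that failed to parse (the units row in this example).
df = df.dropna(subset=cols)
print(df.dtypes)  # numeric after coercion
```

If df.dtypes already shows numeric columns for your real file, this is not the cause and the tick adjustments are the place to look.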

Even if your code does not generate decent tick spacing for your dataset, you can adjust the xticks, yticks and their respective labels:

Example with the synthetic data:

plt.scatter(x, y, s=1)
plt.xlabel('Load On Sample')
plt.ylabel('Displacement Into Surface')

plt.gca().set_xticks([0,10000,20000])
plt.gca().set_yticks([10000,20000,30000,40000])

plt.show()
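If you would rather not hard-code tick positions, a possible alternative is to cap the number of ticks and let matplotlib choose the values. This sketch uses matplotlib.ticker.MaxNLocator on the same synthetic data; the limit of 5 intervals is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Same synthetic data as above (assumption: linear, 20,000 points).
x = np.arange(20_000)
y = 2 * x

fig, ax = plt.subplots()
ax.scatter(x, y, s=1)
ax.set_xlabel('Load On Sample')
ax.set_ylabel('Displacement Into Surface')

# Ask for at most 5 intervals per axis; matplotlib picks round tick values.
ax.xaxis.set_major_locator(MaxNLocator(nbins=5))
ax.yaxis.set_major_locator(MaxNLocator(nbins=5))
```

This keeps the tick count bounded when the data range changes, whereas set_xticks and set_yticks pin the ticks to the exact values you list.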