python, pandas, matplotlib, plot, scatter-plot

Plotting a scatter plot in Python clearly, with each tick visible


I have around 45 CSV files, each containing two columns with 13,000 entries. I want to plot all of these CSV files together in a single scatter plot, and I don't want the ticks (points) of the scatter plot to overlap each other; I want all of them to be clearly visible.

I am attaching the code below, where I combine the data from all the files and plot it in a single scatter plot. The output is a scatter plot that is very cluttered and hard to understand. I am asking for all the points to be clearly visible without overlapping because I need to study them for my research.

import os
import matplotlib.pyplot as plt
import pandas as pd

csv_directory = "Allplots/graphs"

csv_files = [file for file in os.listdir(csv_directory) if file.endswith(".csv")]

plt.figure(figsize=(12, 8))

all_indices = []
all_accuracies = []

for csv_file in csv_files:
    file_path = os.path.join(csv_directory, csv_file)
    df = pd.read_csv(file_path)
    
    bit_index = df[" Index"]
    accuracy = df["Accuracy"]
    
    all_indices.extend(bit_index)
    all_accuracies.extend(accuracy)

plt.scatter(all_indices, all_accuracies, s=10)

plt.xlabel("Index (Millions)", fontsize=12)
plt.ylabel("Accuracy", fontsize=12)
plt.title("Scatter Plot of Accuracy vs. Bit Index", fontsize=14)

# Save the plot as a PNG file
output_path = os.path.join(csv_directory, "scatter_plot.png")
plt.savefig(output_path)

plt.show()

[Image: scatter plot]

I also want to plot these CSV files serially: the first portion of the graph should contain the points from the csv1 file, then the csv2 file, and so on. I don't want the points from the different CSV files to mix.


Solution

  • You can play with the various marker parameters in pyplot.scatter. I recommend lowering s for a smaller marker size, setting marker='.' (the default is 'o') for a smaller marker shape, and/or setting edgecolors=None or edgecolors='face' so the marker doesn't have an obvious outline.

    For example: plt.scatter(all_indices, all_accuracies, s=1, marker='.', edgecolors='face')
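
As a rough sketch of how this could fit into the loop from the question, the snippet below calls scatter once per CSV file with small markers and shifts each file's x-values so the files appear one after another instead of mixing. The function name plot_files_serially, the stripping of column names (to tolerate a header like " Index"), and the shifting scheme are my assumptions, not something from the question, so adapt them to your data.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_files_serially(csv_paths, x_col="Index", y_col="Accuracy"):
    """Scatter each CSV in its own x-range so the files do not mix.

    Hypothetical helper: the name, arguments, and offset scheme are
    illustrative assumptions, not part of the original question.
    """
    fig, ax = plt.subplots(figsize=(12, 8))
    offset = 0  # x position where the next file's points start

    for path in csv_paths:
        df = pd.read_csv(path)
        # Tolerate headers with stray spaces, such as " Index".
        df.columns = df.columns.str.strip()

        # Shift this file's x-values so they begin right after the previous file.
        x = df[x_col] - df[x_col].min() + offset

        # One scatter call per file: small '.' markers, no contrasting outline.
        # Each call picks the next color from matplotlib's color cycle.
        ax.scatter(x, df[y_col], s=1, marker=".", edgecolors="face")

        offset = x.max() + 1

    ax.set_xlabel("Index (shifted per file)", fontsize=12)
    ax.set_ylabel("Accuracy", fontsize=12)
    return fig, ax
```

Pass the paths in the order you want them to appear (for example, sorted by filename), then save the result with fig.savefig(output_path). Note that the shifted x-axis no longer shows the original index values, only each point's position within its file.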