
Reading multiple CSV files to plot multiple curves on same graph with Python


I run several numerical computations, and the results of each one are stored in a .csv file, say data1.csv, data2.csv, data3.csv, etc., each with 4 columns. I would like to read columns 2 and 4 of several of these files and plot column 4 as a function of column 2, with all curves on the same graph, so that I can compare the computations.

I can currently plot one curve, but I have not managed to automate the procedure for n .csv files.

Here is my code:

import csv
import matplotlib.pyplot as plt

x = []
y = []
path='/ref_path/'
calcul_id='ref_computation/'
file='data1.csv'
file_in=path+calcul_id+file
with open(file_in,'r') as csvfile:
    plots = csv.reader(csvfile, delimiter = ',')
    for row in plots:
        x.append(float(row[1]))
        y.append(float(row[3]))
  
plt.plot(x,y)
plt.show()

Can you help me? Thanks a lot!


Solution

  • Essentially, you want to iterate over the folder containing all your csv files. You can use the glob module, which is part of the Python standard library.

    Your code will look something like this:

    import glob
    import csv
    import matplotlib.pyplot as plt
    
    directory_countaining_csv_files = '...'
    
    number_of_files = len(glob.glob(f'{directory_countaining_csv_files}/data*.csv'))
    
    for filepath in glob.iglob(f'{directory_countaining_csv_files}/data*.csv'):
        x = []
        y = []
        with open(filepath,'r') as csvfile:
            plots = csv.reader(csvfile, delimiter = ',')
            for row in plots:
                x.append(float(row[1]))
                y.append(float(row[3]))
    
        plt.plot(x, y, label=f'{filepath}')
    
    # Get the handles and labels of the plotted curves
    handles, labels = plt.gca().get_legend_handles_labels()
    
    # Specify the order of items in the legend (here: the order in which the curves were plotted)
    order = list(range(number_of_files))
    
    plt.legend([handles[idx] for idx in order], [labels[idx] for idx in order])
    
    plt.show()
    
    

    The pattern f'{directory_countaining_csv_files}/data*.csv' makes glob.iglob return every file in that directory whose name starts with "data" and ends with ".csv". I advise you to take a look at the Python documentation: https://docs.python.org/fr/3.6/library/glob.html
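To make the effect of the pattern concrete, here is a small sketch that builds a throwaway directory and checks which names match data*.csv. The file names (notes.csv, data1.txt) are made up for the illustration and are not part of the original question.

```python
import glob
import os
import tempfile

# Create a temporary directory with a few empty files (hypothetical names).
with tempfile.TemporaryDirectory() as directory:
    for name in ["data1.csv", "data2.csv", "notes.csv", "data1.txt"]:
        open(os.path.join(directory, name), "w").close()

    # Only names that start with "data" and end with ".csv" match the pattern.
    pattern = os.path.join(directory, "data*.csv")
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))

print(matches)
```

The result is sorted explicitly because glob does not promise any particular order.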

    I also added a way to set the order of the legend entries in the final plot; I found the idea in this example: https://www.statology.org/matplotlib-legend-order/
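As a rough illustration of the legend reordering idea, the sketch below plots three dummy curves out of order and then builds an index list that sorts the legend entries by label. The labels and data points are invented for the example, and sorting alphabetically is just one possible choice of order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Dummy curves, deliberately plotted out of order (hypothetical labels).
for name in ["data3.csv", "data1.csv", "data2.csv"]:
    ax.plot([0, 1], [0, 1], label=name)

handles, labels = ax.get_legend_handles_labels()

# Indices that put the labels in alphabetical order.
order = sorted(range(len(labels)), key=lambda i: labels[i])

legend = ax.legend([handles[i] for i in order], [labels[i] for i in order])
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Any other permutation of indices can be used for order; the legend simply follows the order of the two lists passed to legend().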

    This implementation can be awkward. Two other ways to do it would be:

    1. Sort files inside your folder by hand.
    2. Use glob.glob() instead of glob.iglob().

    glob.glob() returns a list of the csv files in your directory. You can sort this list and iterate over it; the rest of the code stays the same.

    list_csv = glob.glob(f'{directory_countaining_csv_files}/data*.csv')
    list_csv.sort()
    
    for filepath in list_csv:
        x = []
        y = []
        # ... same code as before ...
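One caveat with sorting the list: a plain sort() compares strings character by character, so data10.csv ends up before data2.csv. If your files are numbered past 9, a "natural" sort key may help. The helper below is a minimal sketch, not part of the glob module; natural_key is a name chosen for this example.

```python
import re

def natural_key(path):
    """Split a name into text and number chunks so that numbers compare as integers."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", path)]

# Hypothetical file names, as glob.glob() might return them.
names = ["data10.csv", "data2.csv", "data1.csv"]

plain = sorted(names)                     # lexicographic order
natural = sorted(names, key=natural_key)  # numeric-aware order

print(plain)
print(natural)
```

With this helper, list_csv.sort(key=natural_key) can stand in for list_csv.sort() in the snippet above.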