python multithreading python-multiprocessing python-multithreading

Multi-thread an operation that loops through a CSV file in Python


I have a CSV parser script in Python that processes a big CSV file. There are around 1 million rows, so the process takes some time.

import csv
import sys

def ParserFunction(row):
    # Some logic with row
    pass

with open('csvfeed.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in reader:
        ParserFunction(row)

Is there a way to multi-thread this loop to lower the execution time?

Thanks


Solution

  • You can hand each row to its own thread, so the main thread does not have to wait for one row to finish processing before moving on to the next:

    import csv
    import sys
    import threading

    def ParserFunction(row):
        # Some logic with row
        pass

    with open('csvfeed.csv', newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter=';', quotechar='|')
        for row in reader:
            # args must be a tuple: (row,) passes the whole row as one argument
            threading.Thread(target=ParserFunction, args=(row,)).start()
        
    

    However, the right approach depends on exactly what logic you run on each row and whether that logic depends on other rows.
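
Starting one thread per row can get expensive with around a million rows, so a bounded worker pool may be a better fit. The following is only a minimal sketch, assuming each row can be processed independently of the others. The names `parser_function`, `chunks`, `process_batch` and `process_csv`, and the default worker and batch sizes, are illustrative and not from the original script; the per-row logic is a stand-in that just counts fields.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def parser_function(row):
    # Hypothetical stand-in for the real per-row logic: count the fields.
    return len(row)


def chunks(iterable, size):
    """Yield lists of up to `size` items so rows are handed out in batches."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    # Each worker handles a whole batch, which keeps per-task overhead low.
    return [parser_function(row) for row in batch]


def process_csv(path, workers=8, batch_size=10_000):
    """Parse the CSV at `path` with a fixed pool of worker threads."""
    results = []
    with open(path, newline='', encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter=';', quotechar='|')
        with ThreadPoolExecutor(max_workers=workers) as executor:
            # executor.map returns results in the same order as the input.
            for batch_result in executor.map(process_batch,
                                             chunks(reader, batch_size)):
                results.extend(batch_result)
    return results


if __name__ == '__main__':
    print(len(process_csv('csvfeed.csv')))
```

Because of CPython's global interpreter lock, threads mostly help when the per-row work waits on I/O (network calls, database writes). If the logic is CPU-bound, swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` is one option, since it offers the same `map` interface, but the worker functions must then be importable top-level functions and the rows must be picklable.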