Tags: python, linux, windows, python-multiprocessing

Multiprocessing issue when working on a Windows machine


I have a simple multiprocessing task for writing to a CSV file. The program takes around 40k rows from another file, processes the data, and writes the results to another file. My code looks like this:

create_queue_infile(csv_file, q, opt)
pool = multiprocessing.Pool(processes=(multiprocessing.cpu_count() - 1))
while not (q.empty()):
    res = pool.apply_async(my_function, args=(q.get(), input2, 5, output,))
pool.close()
pool.join()

And the part that writes to the other CSV file looks like this:

def write_to_csv(path, csv_row):
    with open(path, 'a', newline='', encoding="utf-8") as f:
        f.write(csv_row)

This works flawlessly on my Linux machine at any scale. However, when I run the program on a Windows machine, my output file has some corrupted lines. It looks as if processes overlap while trying to write to the same file. A sample output looks like this:

ROW SOME_INFORMATION
ROW SOME_INFORMATION
ROW SOME_INFORMATION
SOM
ROW SOME_INFORMATION
ROW SOME_INFORMATION
ROW SOME_INFORMATION
FORMATION
ROW SOME_INFORMATION
ROW SOME_INFORMATION

Since I have a lot of data, it is impractical to keep track of every row, so I am trying to figure out the reason behind this problem. My main question is why this works on Linux but not on Windows.


Solution

  • Since I can't comment yet, I will post my comment as an answer.

    There are some important differences between Windows and Linux in how Python multiprocessing works. One of the biggest is how child processes are started: Linux has traditionally forked them by default, while Windows can only spawn them. An internet search for "python multiprocessing between Windows and Linux" turns up several blog posts and forum threads that discuss the topic.

    I also encourage you to check out the following SO thread: "Python multiprocessing safely writing to a file". In general, it is not good practice to have multiple processes writing to the same file at the same time. The thread gives some good examples of how to structure the code.
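
As a rough illustration of the "one writer" idea, here is a minimal sketch in which the workers only return processed rows and the parent process is the only one that opens the output file. The names process_row and run, the sample data, and the output path are all hypothetical stand-ins (process_row plays the role of my_function from the question), so treat this as one possible structure under those assumptions, not as the method from the linked thread.

```python
import csv
import multiprocessing
import os
import tempfile


def process_row(row):
    # Hypothetical stand-in for the question's my_function: it returns
    # the processed row instead of appending it to the file itself.
    return [row[0], row[1].upper()]


def run(rows, out_path, workers=2):
    # Only the parent process opens the output file, so rows from
    # different workers cannot interleave, whether children are
    # forked or spawned.
    with multiprocessing.Pool(processes=workers) as pool, \
            open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # imap_unordered hands back results as workers finish them,
        # so the output order may differ from the input order.
        for result in pool.imap_unordered(process_row, rows, chunksize=100):
            writer.writerow(result)


if __name__ == "__main__":  # guard needed where children are spawned (Windows)
    sample = [[str(i), "some_information"] for i in range(1000)]
    out = os.path.join(tempfile.gettempdir(), "output_sketch.csv")
    run(sample, out)
    print("wrote", out)
```

If the row order matters, pool.imap can be used in place of pool.imap_unordered at the cost of waiting for slower rows. Printing multiprocessing.get_start_method() on each machine is also a quick way to confirm which start method is in use.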