pymysql with multiprocessing throws TypeError


I am using pymysql together with multiprocessing.

Please see the code below.

import os
from multiprocessing import Pool
import pymysql


class Test:
    def __init__(self):
        self.connection = pymysql.connect(host='host',
                             user='use',
                             password='pwd',
                             db='db',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

    def print(self, args):
        print("[Pid:{}] {}".format(os.getpid(), args))
        return args

    def run(self):
        with Pool(processes=4) as p:
            print(p.map(self.print, [1,2]))


if __name__ == '__main__':
    test = Test()
    test.run()

I create the MySQL connection in __init__ so that it can be reused.

But when I run the script, it raises an error:

TypeError: cannot serialize '_io.BufferedReader' object

Question1.

  • When using pymysql with multiprocessing, should I create a separate connection for each process instead of reusing a single connection?

Question2.

  • Why does the above error occur?

Thanks.


Solution

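  • multiprocessing runs your code in separate processes, so everything sent from the parent process to the workers has to be serialized (pickled) first. p.map(self.print, ...) passes a bound method, and pickling a bound method also pickles the instance it belongs to, including self.connection. The error most likely comes from that pymysql connection object: it wraps an open socket, and the '_io.BufferedReader' named in the message is presumably the connection's internal read buffer, which cannot be pickled.

    The simplest fix is to open a separate connection in each worker process rather than sharing one. As long as you control how many processes are started, and therefore how many connections are opened, this should be fine.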
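
Here is a minimal sketch of both points, not a definitive implementation. Part 1 reproduces the same kind of pickling failure with a stand-in class, on the assumption that a live pymysql connection holds a similar buffered reader internally. Part 2 shows one way to give each worker its own connection, using the Pool initializer hook. The names HoldsReader, pickling_error, make_connection, init_worker, work and run are hypothetical, and the credentials are the placeholders from the question.

```python
import os
import pickle
from multiprocessing import Pool


# --- Part 1: why the original code fails ------------------------------------

class HoldsReader:
    """Hypothetical stand-in for Test: the instance owns a buffered reader."""

    def __init__(self):
        # open(..., 'rb') returns an _io.BufferedReader, the type named in the error.
        self.reader = open(os.devnull, 'rb')

    def task(self, arg):
        return arg


def pickling_error(obj):
    """Return the TypeError raised while pickling obj, or None if it pickles."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return exc
    return None


# --- Part 2: one connection per worker process -------------------------------

_connection = None  # set separately inside each worker by init_worker()


def make_connection():
    """Open a new pymysql connection (placeholder values from the question)."""
    import pymysql  # imported lazily so Part 1 runs without pymysql installed
    return pymysql.connect(host='host', user='use', password='pwd',
                           database='db', charset='utf8mb4',
                           cursorclass=pymysql.cursors.DictCursor)


def init_worker(connect=make_connection):
    """Pool initializer: runs once in each worker and stores its own connection."""
    global _connection
    _connection = connect()


def work(arg):
    """Module-level task, so only arg is pickled and never the connection."""
    with _connection.cursor() as cursor:
        cursor.execute("SELECT %s AS value", (arg,))
        row = cursor.fetchone()
    return os.getpid(), row['value']


def run(args, processes=4):
    """Map work() over args with one connection per worker process."""
    with Pool(processes=processes, initializer=init_worker) as pool:
        return pool.map(work, args)
```

Called from under an if __name__ == '__main__': guard, pickling_error(HoldsReader().task) should return a TypeError that mentions BufferedReader, because pickling the bound method pulls in the whole instance. run([1, 2]) needs a reachable MySQL server and should return one (pid, value) pair per input. Keeping work at module level and the connection in a per-process global means the connection never crosses a process boundary.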