python-multiprocessing, python-logging

Python multiprocessing, logging to different files


I would like to run code on n processes and have the logs from each process written to a separate file.

Naively, I tried something like this:

from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        self.logger = logging.getLogger(logger_name)
        self.log_file = log_file
        self.logger.addHandler(logging.FileHandler(log_file))
        print("from init", self.logger, self.logger.handlers)

    def run(self) -> None:
        print("from run", self.logger, self.logger.handlers)


if __name__ == '__main__':
    p1 = Worker("l1", "log1")
    p1.start()

(tried in Python 3.9 and 3.11), but for some reason the handler is gone. This is the output:

from init <Logger l1 (WARNING)> [<FileHandler log1 (NOTSET)>]
from run <Logger l1 (WARNING)> []

Why is the FileHandler gone? Should I call addHandler within the run method instead -- is that the correct way?

I was trying to use this answer but couldn't really make it work.

For the moment, I have solved it by defining the handlers in run, but that seems like a dirty hack to me...

UPDATE: This happens with my MacBook Python installations. On a Linux server, I couldn't reproduce it. Very confusing.

In either case, the question is probably:

"Is this the correct way to log to files, with several copies of one process?"


Solution

I found the reason for the observed behavior. It has to do with the pickling of objects when they are transferred between processes.

In the standard library's implementation of Logger, a __reduce__ method is defined. This method is used in cases where an object cannot be reliably pickled. Instead of trying to pickle the object itself, the pickle protocol uses the value returned by __reduce__. In the case of Logger, __reduce__ returns a function (getLogger) and a string (the name of the Logger being pickled) to be used as its argument. During unpickling, the protocol makes the function call logging.getLogger(name); the result of that call becomes the unpickled Logger instance.
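A minimal sketch of that round-trip, reusing the "l1" name from the question (safe to run in a single interpreter):

import logging
import pickle

logger = logging.getLogger("l1")
logger.addHandler(logging.FileHandler("log1"))

# __reduce__ tells pickle to rebuild the logger by name alone:
# it returns (getLogger, ('l1',)); the handlers are never serialized.
print(logger.__reduce__())

# In a single process the round-trip looks lossless, because
# getLogger("l1") returns the already-configured registry entry:
clone = pickle.loads(pickle.dumps(logger))
print(clone is logger, clone.handlers)

# A freshly spawned process, however, starts with an empty registry,
# so the same getLogger call there creates a default, handler-less Logger.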

The original Logger and the unpickled Logger will have the same name, but perhaps not much else in common. The unpickled Logger will have the default configuration, whereas the original Logger will have any customization you may have performed.

In Python, Process objects do not share an address space, but how the child gets its copy of your data depends on the start method. With the "fork" start method (the default on Linux), the child inherits a copy of the parent's memory, so the handler appears to survive. With the "spawn" start method (the default on Windows and, since Python 3.8, on macOS), the instance variables must somehow be "transferred" from one process to the other, and this is done by pickling/unpickling. That is why the example misbehaves on a MacBook but not on a Linux server. The instance variables declared in the Worker.__init__ function do indeed appear in the new process, as you can verify by printing them in Worker.run. But under the hood Python has actually pickled and unpickled all of the instance variables, to make it look like they have magically migrated to the new process. In the vast majority of cases this works just fine. But not necessarily if one of those instance variables defines a __reduce__ method.
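You can reproduce the MacBook behavior on a Linux machine by forcing the start method; a sketch, assuming the Worker class from the question is already defined:

import multiprocessing

if __name__ == '__main__':
    # "spawn" pickles the Worker instance into the child process,
    # just as macOS (3.8+) and Windows do by default; the unpickled
    # Logger then comes back without its FileHandler.
    multiprocessing.set_start_method("spawn")
    p1 = Worker("l1", "log1")
    p1.start()
    p1.join()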

A logging.FileHandler cannot, I suspect, be pickled, since it holds operating system resources (an open file). This is probably the reason (or at least one of the reasons) why Logger objects can't be pickled wholesale.
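You can check that suspicion directly; a small sketch (the exact error message may vary by Python version):

import logging
import pickle

handler = logging.FileHandler("log1")
try:
    pickle.dumps(handler)
except TypeError as exc:
    # e.g. "cannot pickle '_thread.RLock' object": the handler holds
    # an open stream and a lock, neither of which can be serialized.
    print(exc)

Given all of the above, creating the handlers inside run is not really a dirty hack: with the spawn start method it is the natural place to configure per-process logging, because that code runs in the child itself and nothing needs to be pickled. A minimal sketch of the question's example restructured that way (the level and log message are just placeholders):

from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        # Keep only plain, picklable data here; the handler is
        # created in run(), i.e. inside the child process itself.
        self.logger_name = logger_name
        self.log_file = log_file

    def run(self) -> None:
        logger = logging.getLogger(self.logger_name)
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.FileHandler(self.log_file))
        logger.info("logging from %s", self.name)


if __name__ == '__main__':
    workers = [Worker(f"l{i}", f"log{i}") for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()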