Tags: python, multiprocessing, python-multiprocessing

Is it safe to leave out the "if __name__ == '__main__'" statement for multiprocessing in Python under Unix?


I am trying to implement a flexible pipeline in Python that I have split up into several modules. Each of these modules can be used as a standalone tool, but they may also sometimes have to import functions from each other. I have placed general, simple functions that are used frequently by several of these modules into a "misc" module that the other modules import when needed.

Now, each of these modules may want to run some functions in parallel using multiprocessing (usually calling some external tools). So I have created a general "run_parallel" function that takes a list of functions and their corresponding arguments, determines the priority of each, distributes the available cores over them accordingly, and then runs these functions in parallel using multiprocessing and starmap().
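A minimal sketch of such a helper, with the priority and core-allocation logic stripped out (the names `run_parallel` and `_call` are illustrative, not from the question): every job is a (function, args) pair dispatched through one pool via starmap().

```python
from multiprocessing import get_context

def _call(func, args):
    # Trampoline so starmap() can dispatch to *different* functions:
    # each job arrives as a (function, args_tuple) pair.
    return func(*args)

def run_parallel(jobs, processes=2):
    """Run a list of (callable, args_tuple) jobs in parallel."""
    # Use the fork start method explicitly (Unix-only), matching the
    # question's setting; forked workers inherit the parent's state.
    ctx = get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.starmap(_call, jobs)
```

Because this uses fork, the workers inherit `_call` and the job functions directly; under spawn (the Windows model), everything handed to the pool must instead be picklable by import path.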

Now I think this function could nicely be placed in the "misc" module and simply be imported whenever any of the other modules needs to run jobs in parallel. However, if I follow the (apparently) general rule to always use the if __name__ == '__main__' statement for this, that means I can't import this function and reuse it in multiple modules. I never fully understood this requirement, but it seems to have something to do with Windows specifically? My pipeline will run ONLY under Unix.

Does that mean I MUST implement this "run_parallel" method separately for each of my modules? Or can I safely leave it out if my code is only meant to run under Linux/Unix environments?

EDIT: I realize now that I completely misunderstood the usage of this statement in the multiprocessing tutorials and usage examples. For some reason I thought it was also required within any function that uses something from multiprocessing (and was always confused about why that would be). But those examples were only protecting the part of the example code that calls such a function, preventing it from being run automatically on every import (not preventing the function from being imported at all, as I thought). Total misunderstanding!


Solution

  • When you run a script or import a module, Python executes all of the code written at module level. In the case of a function like

    def foo():
        pass
    

    "execution" only means to assign the newly compiled function object to a variable called "foo". These things do not need to be protected by a if __name__ == "__main__": block. You only need to be concerned about code that performs an action, such as code that calls foo().

    The top level script called to start a Python program runs under the name "__main__". Modules that you import are not named "__main__", so an if __name__ == "__main__": block in them never triggers. What matters is that modules be import-safe. That is, it should always be safe to import a module without it doing anything beyond initialization. The actions of a module should always live inside functions or classes that are called from other places.
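As a concrete sketch of an import-safe module (the names `shout` and `main` are invented for illustration): importing it only binds names, and its action lives inside functions.

```python
# Import-safe: importing this module defines things but does nothing else.
def shout(items):
    """Pure work function: safe to import and reuse from other modules."""
    return [item.upper() for item in items]

def main():
    # The module's "action" is wrapped in a function, so merely
    # importing the module never triggers it.
    return shout(["a", "b"])

if __name__ == "__main__":
    # Runs only when this file is executed as the top-level script,
    # never when it is imported.
    print(main())
```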

    The top level script is different: it has to actually run the program. if __name__ == "__main__": is used to make the top level script import-safe. That doesn't matter (at least for multiprocessing) on forking systems like Unix. But Windows needs to spawn a new process and import the top level script - and that import needs to be safe; it can't re-execute the program itself.
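You can demonstrate this from a Unix machine by forcing the spawn start method in a deliberately unguarded throwaway script (written to a temp file here): when the spawned child re-imports the script, multiprocessing refuses to recurse and aborts with a RuntimeError about the "bootstrapping phase".

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# An unguarded script that forces the spawn start method, as Windows uses.
unguarded = textwrap.dedent("""
    import multiprocessing

    def work():
        pass

    multiprocessing.set_start_method("spawn", force=True)
    p = multiprocessing.Process(target=work)  # no __main__ guard!
    p.start()
    p.join()
""")

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(unguarded)
    path = f.name

# The spawned child re-imports the script as part of its bootstrap, hits
# p.start() again during that import, and multiprocessing raises a
# RuntimeError instead of forking off processes forever.
result = subprocess.run([sys.executable, path], capture_output=True, text=True)
os.unlink(path)
```

With the default fork method on Unix the same unguarded script would run without complaint, which is exactly why the problem only surfaces on spawn-based platforms.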

    Although you don't need this protection on Unix, modules should always be import-safe. And it's a good discipline for top level scripts, too. Why risk executing code on import when you don't have to?

    A decent recipe for scripts is

    import sys

    def main():
        # ... do all the things ...
        return 0

    if __name__ == "__main__":
        retcode = main()
        sys.exit(retcode)