python multiprocessing pickle dill

How to import script that requires __name__ == "__main__"


I'm pretty new to Python, and this question probably shows it. I'm working on the multiprocessing part of my script and couldn't find a definitive answer to my problem.

I'm struggling with one thing. When using multiprocessing, part of the code has to be guarded with if __name__ == "__main__". I get that, and my pool is working great. But I would love to import that whole script (ideally by turning it into one big function that returns a value). And here is the problem. First, how can I import something if part of it only runs when launched from the main/source file because of that guard? Second, if I manage to work that out and the whole script ends up in one big function, pickle can't handle that; would using "multiprocessing on dill" or "pathos" fix it?

Thanks!


Solution

  • You are probably confused about the concept. The if __name__ == "__main__" guard in Python exists precisely so that every Python file can be importable.

    Without the guard, a file, once imported, would behave as if it were the "root" program, and it would take a lot of boilerplate and inter-process communication (like writing a "PID" file at a fixed filesystem location) to coordinate imports of the same code, including for multiprocessing.

    Just leave under the guard whatever code needs to run for the root process. Everything else you move into functions that you can call from the importing code.

    If "all" of the script ran on import, even the part that sets up the multiprocessing workers would run again in each worker that imports the file, and any simple job would keep creating more workers exponentially until all machine resources were taken (i.e., it would crash hard and fast, potentially leaving the machine unresponsive).

    So, this is a good pattern: the "dothejob" function can call all the other functions you need, so you just need to import and call it, either from a master process or from any other project importing your file as a Python module.

    
    import multiprocessing
    ...
    def dothejob():
        ...
    
    def start():
        # code to set up and start the multiprocessing workers,
        # for example:
        worker1 = multiprocessing.Process(target=dothejob)
        ...
        worker1.start()
        ...
        worker1.join()
    
    if __name__ == "__main__":
        # runs only when this file is executed directly, not when it is imported
        start()
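
To connect this pattern to the part of the question about wrapping the script in one function that returns a value, here is a minimal sketch. The names `square` and `run_job` and the `processes` parameter are hypothetical and do not come from the question; the sketch assumes the work can be expressed as a top-level function mapped over a list with `multiprocessing.Pool`. Because `square` is defined at module level, the standard `pickle` module can hand it to the workers by reference, so `dill` or `pathos` should not be needed for this layout.

```python
import multiprocessing


def square(x):
    # Hypothetical unit of work. It is defined at module level so that
    # the standard pickle module can pass it to worker processes.
    return x * x


def run_job(numbers, processes=2):
    # The pool is created only when this function is called, never at
    # import time, and the collected results are returned to the caller.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    # Runs only when this file is executed directly. It is skipped when
    # the file is imported, including when worker processes re-import it.
    print(run_job([1, 2, 3]))
```

If this were saved as, say, `myscript.py` (a hypothetical filename), another program could write `from myscript import run_job` and call `run_job(...)` from under its own `if __name__ == "__main__":` guard, getting the results back as an ordinary return value.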