Tags: python, multiprocessing, pyinstaller, command-line-arguments, argv

How can I get the real argument list when sys.argv is replaced in a frozen executable using freeze_support?


I have an unusual problem: my program, which I want to package with pyinstaller, JITs some things based on sys.argv at startup. When you use multiprocessing with freeze_support on Windows, multiprocessing has to pass different arguments to the executable to initialize the new process. The original sys.argv is only restored by the time the target function is invoked. How can I get the original sys.argv before the target function is invoked?

import sys
import multiprocessing

# Module-level code runs again in the spawned child process,
# before freeze_support() has a chance to intercept.
print('ArgV:', sys.argv)


def print_argv():
    print(sys.argv)


if __name__ == '__main__':
    multiprocessing.freeze_support()
    print_argv()
    p = multiprocessing.Process(target=print_argv)
    p.start()
    p.join()

When packaged with pyinstaller and run with --hello=True, this yields:

ArgV: ['scratch.exe', '--hello=True']
['scratch.exe', '--hello=True']
ArgV: ['scratch.exe', '--multiprocessing-fork', 'parent_pid=16096', 'pipe_handle=380']
['scratch.exe', '--hello=True']

I would like some magic code that gives me my original sys.argv, that is, --hello=True, when sys.argv is set to --multiprocessing-fork...


Solution

  • I've never extensively played with freezing executables, but I have several ideas...

    Taking a look at multiprocessing.spawn._main(), the original sys.argv is copied across here:

    preparation_data = reduction.pickle.load(from_parent)
    prepare(preparation_data)
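
    For orientation, prepare() is the step that copies the parent's state into the child. A rough sketch of the relevant part, going by CPython's multiprocessing/spawn.py (simplified, and the details vary between versions):

    import sys

    def prepare(data):
        # name, authkey, logging settings and sys.path are restored first ...
        if 'sys_argv' in data:
            # ... then the parent's original command line is put back,
            # before the Process object itself is unpickled
            sys.argv = data['sys_argv']
        # ... followed by the working directory, start method and the main module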
    

    If you override Process.__new__, you should be able to run code before _bootstrap (which eventually calls run on the process object), but after sys.argv is received.

    import sys
    import multiprocessing
    print('ArgV:', sys.argv)
    
    def print_argv():
        print(sys.argv)
        
    class myProcess(multiprocessing.Process):
        def __new__(cls, *args, **kwargs):
            # The hook should only fire in the spawned child, where the main
            # module is re-imported as '__mp_main__' and pickle calls __new__
            # while recreating the Process object.
            if __name__ == "__mp_main__":
                print("hook", sys.argv)
            instance = super(myProcess, cls).__new__(cls)
            instance.__init__(*args, **kwargs)
            return instance
    
    if __name__ == '__main__':
        multiprocessing.freeze_support()
        print_argv()
        p = myProcess(target=print_argv)
        p.start()
        p.join()
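
    During unpickling in the child, pickle calls __new__ (with no constructor arguments) to recreate the Process object before restoring its state, and spawn._main() only does that unpickling after prepare() has put the original sys.argv back, so the hook should see --hello=True before _bootstrap() and the target function run.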
    

    Another idea is to hook the unpickle process by overriding __getstate__ and __setstate__.

    class myProcess(multiprocessing.Process):

        def __getstate__(self):
            # runs in the parent when the Process object is pickled
            return self.__dict__.copy()

        def __setstate__(self, state):
            # runs in the child when the object is unpickled, after
            # prepare() has already restored the original sys.argv
            print("hook", sys.argv)
            self.__dict__.update(state)
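
    As with the first example, this class is meant to drop into the same script in place of the __new__-based version; __setstate__ should fire at the same point in the child, after prepare() has restored the original arguments and before the target function runs.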
    

    Finally, you could hook the audit event that pickle raises when it looks up a custom class to unpickle:

    class myProcess(multiprocessing.Process):
        pass

    def hook(event_name, args):
        # pickle raises the 'pickle.find_class' audit event with (module, name)
        # whenever it resolves a class during unpickling
        if "pickle.find_class" in event_name:
            if args[1] == myProcess.__name__:
                print("hook", sys.argv)

    sys.addaudithook(hook)
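
    sys.addaudithook requires Python 3.8 or newer, and the call has to sit at module level (outside the __main__ guard) so that the spawned child registers the hook as well before it unpickles the myProcess instance; by that point prepare() should already have restored the original sys.argv.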
    

    All of these occur roughly at the same time during loading of the new process, and I couldn't say which is the most robust...