
Differential update of a PyInstaller executable (modify embedded PYZ-00.pyz)


I'm planning to create a huge executable directory and install it on some devices.

Imagine that later on I discover a bug in one of my Python modules. Is there any way to transfer only the modified byte code and replace the original byte code with the new one?

The reason I want to do this is that bandwidth is very expensive in my context and I'd like to patch the code remotely.

Example: I have a project with two files. prog.py (with the following three lines):

import mod1
if __name__ == "__main__":
    mod1.hello()

mod1.py (with the following two lines):

def hello():
    print("hello old world")

Now I use PYTHONHASHSEED=2 pyinstaller prog.py to create my directory, which I copy to my device.

Now I modify mod1.py:

def hello():
    print("hello new world")

and I recompile with PYTHONHASHSEED=2 pyinstaller prog.py. The full directory (tarred and gzipped) has a size of about 10M. The file dist/prog/prog has a size of about 1M.

With pyi-archive_viewer I can extract PYZ-00.pyz from my executable dist/prog/prog. In PYZ-00.pyz I can find and extract mod1, which uses only 133 bytes.

Now if I copy that file to my device, how could I update the old dist/prog/prog so that it has the new PYZ-00.pyz:mod1 byte code?

What code could I use to decompose the executable, and what code could I use to reassemble it after replacing one specific file (module)?

Alternative: move the .pyc files to a zip file. Startup performance is not that crucial. I could also live with an alternative solution where no PYZ file is created and added to the executable, but where the dist directory contains a zip file with all the .pyc files.

Another alternative: copy the .pyc files into the application directory. This would result in __file__ having exactly the same value as in the PYZ mode. Performance-wise this is probably not that nice, and it creates a lot of files, but if incremental updates are crucial it is perhaps one option to handle it.


Solution

  • This solution can neither 'patch' a .PYZ file nor put all .pyc files into a zip file.

    However, it is the only viable solution I have found so far that works for huge projects with loads of third-party dependencies.

    The idea is to remove all (or most) files from the .PYZ file and copy the corresponding .pyc files into the working directory.
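
    The approach relies on Python being able to import a .pyc file that sits directly next to where the .py file would have been, even when the source is gone. The following is a minimal sketch of that behaviour in a regular interpreter, not in a frozen application; the helper name import_sourceless is made up for illustration, and hello() returns its string instead of printing it so the result is easy to check.

```python
import importlib
import py_compile
import sys
import tempfile
from pathlib import Path


def import_sourceless(workdir: Path):
    """Compile mod1.py to a legacy-layout mod1.pyc, delete the source, import it."""
    src = workdir / "mod1.py"
    src.write_text('def hello():\n    return "hello new world"\n')
    # Write the .pyc next to the source (not into __pycache__), so that it
    # remains importable after the .py file has been removed.
    py_compile.compile(str(src), cfile=str(workdir / "mod1.pyc"), doraise=True)
    src.unlink()

    sys.path.insert(0, str(workdir))
    try:
        importlib.invalidate_caches()
        sys.modules.pop("mod1", None)  # make sure we really import from disk
        return importlib.import_module("mod1")
    finally:
        sys.path.remove(str(workdir))


with tempfile.TemporaryDirectory() as tmp:
    mod1 = import_sourceless(Path(tmp))
    print(mod1.hello())
    print(Path(mod1.__file__).name)
```

    Replacing such a .pyc on the device is then a plain file copy, which is what makes per-module updates possible.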

    I will enhance and elaborate on this answer over time. I'm still experimenting.

    I achieve this by modifying the spec file:

    • determine the directory MYDIR in which the spec file is located
    • create a directory MYDIR/src to which all the files from a.pure will be copied
    • copy all files from a.pure to MYDIR/src, with subdirectories corresponding to the module's name (module mypackage.mod.common would, for example, be stored in MYDIR/src/mypackage/mod/common.py)
    • iterate through the copied files, compile each one to a .pyc file and remove the .py file afterwards
    • create a PYZ file which contains only the files that were not copied (in my test case, no .pyc file is kept in the PYZ)
    • create the exe with the modified PYZ
    • collect all files that should be collected, plus all files from MYDIR/src (e.g. with a.datas + Tree("src"))
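
    The mapping from a module name to its destination path, as described in the list above, can be sketched as a small pure function. The name module_dest_path and the example source paths are hypothetical; only the naming rule itself comes from the steps above.

```python
import os


def module_dest_path(modname: str, src: str, base_dir: str) -> str:
    """Return the path under base_dir to which the source of modname is copied."""
    parts = modname.split(".")
    # The entry for a package carries the package name, but its code lives
    # in __init__.py, so that file name has to be appended explicitly.
    if os.path.basename(src) == "__init__.py":
        parts.append("__init__")
    return os.path.join(base_dir, *parts) + ".py"


# A plain module and a package, both with made-up source locations.
print(module_dest_path("mypackage.mod.common", "/x/mypackage/mod/common.py", "src"))
print(module_dest_path("mypackage", "/x/mypackage/__init__.py", "src"))
```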

    Spec file changes. At the beginning:

    import os
    import sys
    MYDIR = os.path.realpath(SPECPATH)
    sys.path.append(MYDIR)  # make the helper module next to the spec file importable
    import mypyinsthelpers  # allows to reuse the code in multiple projects
    

    Then, after the (unmodified) a = Analysis(...) section, I add:

    to_rmv_from_pyc = mypyinsthelpers.mk_copy_n_compile(a.pure, MYDIR)
    
    # modified creation of pyz
    pyz = PYZ(a.pure - to_rmv_from_pyc, a.zipped_data,
                 cipher=block_cipher)
    

    The function mypyinsthelpers.mk_copy_n_compile is detailed further down.

    Change the collect phase:

    Instead of

    coll = COLLECT(exe,
                   a.binaries,
                   a.zipfiles,
                   a.datas,
    ...
    

    I write:

    coll = COLLECT(exe,
                   a.binaries,
                   a.zipfiles,
                   a.datas + Tree("src"),
    ...
    

    And here is the definition of mypyinsthelpers.mk_copy_n_compile():

    import compileall
    import os
    import shutil
    from pathlib import Path
    
    
    def mk_copy_n_compile(toc, src_tree):
        """
        - copy source files to <src_tree>/src
        - compile them to .pyc
        - delete the sources
    
        Return the TOC entries that were copied (to be removed from the PYZ).
        """
        dst_base_path = os.path.join(src_tree, "src")
        to_rm = []
        # copy files to destination tree
        for entry in toc:
            modname, src, typ = entry
            assert typ == "PYMODULE"
            assert src.endswith(".py") or src.endswith(".pyw")
            # TODO: might add logic to skip some files (keep them in the PYZ)
            to_rm.append(entry)
    
            # a package's code lives in its __init__.py
            if os.path.basename(src) == "__init__.py":
                modname += ".__init__"
    
            m_split = modname.split(".")
            m_split[-1] += ".py"
            dst_dir = os.path.join(dst_base_path, *m_split[:-1])
            dst_path = os.path.join(dst_dir, m_split[-1])
            os.makedirs(dst_dir, exist_ok=True)
            print(entry[:2], dst_path)
            shutil.copy(src, dst_path)
    
        # now compile all files and remove the sources
        curdir = os.getcwd()
        os.chdir(dst_base_path)
        try:
            for path in Path(dst_base_path).glob("**/*.py"):
                # TODO: might add code to keep some files as source
                # legacy=True writes mod.pyc next to mod.py instead of
                # into __pycache__, so it stays importable without source
                compileall.compile_file(
                    str(path.relative_to(dst_base_path)), quiet=1, legacy=True)
                path.unlink()
        finally:
            # restore the working directory even if something fails
            os.chdir(curdir)
        return to_rm
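
    With the .pyc files stored as individual files, working out what to send to a device comes down to comparing two build directories. The following is a minimal sketch of that comparison, assuming a copy of the previously deployed build is still available on the build machine; file_digests and changed_files are hypothetical helper names, and the demonstration uses fake file contents rather than real byte code.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path relative to root (POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_root: Path, new_root: Path) -> list:
    """List files in new_root that are new or differ from their old_root counterpart."""
    old, new = file_digests(old_root), file_digests(new_root)
    return sorted(rel for rel, digest in new.items() if old.get(rel) != digest)


# Tiny demonstration with two fake builds that differ only in mod1.pyc.
with tempfile.TemporaryDirectory() as tmp:
    old_build, new_build = Path(tmp, "old"), Path(tmp, "new")
    for build, mod1_bytes in ((old_build, b"old world"), (new_build, b"new world")):
        build.mkdir()
        (build / "mod1.pyc").write_bytes(mod1_bytes)
        (build / "prog.pyc").write_bytes(b"unchanged")
    demo_result = changed_files(old_build, new_build)
    print(demo_result)
```

    Two caveats: the sketch does not report files that were deleted in the new build, and .pyc files compiled with the default timestamp-based invalidation embed the modification time of the source file, so a module whose source is unchanged may still differ byte for byte between builds. The invalidation_mode parameter of compileall.compile_file (Python 3.7 and later) might help here, since hash-based .pyc files embed a hash of the source instead of its timestamp.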