Tags: python, github-actions, github-release

Upload a GitHub release with file size above 2GB


I have a CI/CD workflow on GitHub Actions. The pipeline performs tests and builds, and it should upload a release file if a tag was created.

However, there's a problem: the release.zip produced by the build is 3.5GB, and each file in a GitHub release must be 2GB or less. The main contributor to this size is a single file of 3.3GB, which must be available locally on every machine that installs the application.

Here are some details about the app:

  • It's a Python application for Windows desktop.

  • The build process uses the PyInstaller library.

  • The build process bundles the problematic file into the app.

  • The app does not connect to any server or cloud service.

  • The problematic file was originally about 9GB, and after compression, its size is 3.3GB.

  • The problematic file is stored with Git LFS.

Here is the relevant job from the workflow:

  build:
    needs: test
    runs-on: windows-latest

    steps:
      # Checkout
      - uses: actions/checkout@v4
        with:
          lfs: true

      # LFS
      - name: Install Git LFS
        run: |
          choco install git-lfs

      - name: Pull lfs files
        run: |
          git lfs pull

      # Python install
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"

      # PyInstaller build
      - name: Build
        run: |
          cd ..
          mkdir build
          cd build
          python -m pip install --upgrade pip
          pip install pyinstaller
          python -m pip install -r ..\requirements.txt
          pyinstaller ..\prod.spec
          xcopy /E /I .\dist\gui\ ..\gui\
          cd ..\
          dir

      # Release
      - name: Extract Tag Name
        id: extract_tag_name
        run: |
          $tagName = $Env:GITHUB_REF -replace 'refs/tags/', ''
          # ::set-output is deprecated; write step outputs to GITHUB_OUTPUT instead
          "tag_name=$tagName" >> $Env:GITHUB_OUTPUT

      # Zip the output folder
      - name: Zip folder
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        run: |
          7z a -r release.zip ./gui/

      - name: Create Release
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        id: create_release
        uses: actions/create-release@v1
        with:
          tag_name: ${{ steps.extract_tag_name.outputs.tag_name }}
          release_name: Release ${{ steps.extract_tag_name.outputs.tag_name }}
          body: Release ${{ steps.extract_tag_name.outputs.tag_name }} created automatically by GitHub Actions.
          token: ${{ secrets.ACTION_RELEASE }}
          draft: false
          prerelease: false
        env:
          GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}

      - name: Upload Release Assets
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        id: upload_asset
        uses: actions/upload-release-asset@v1
        with:
          upload_url: ${{ steps.create_release.outputs.upload_url }}
          asset_path: ./release.zip
          asset_name: release.zip
          asset_content_type: application/zip
        env:
          GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}

Several solutions have been attempted, but none have resolved the problem:

  • Git LFS: Attempted to push the release.zip to LFS during the workflow run, but the push failed. It is also unclear whether pushing to LFS from a workflow is feasible at all.

  • Splitting the zip: Tried splitting the zip into several partial files, but the upload failed because actions/upload-release-asset@v1 does not support multiple files. Moreover, splitting the files this way on my local machine produced corrupted files.

  • Maximum compression: Used 7z's -mx9 option for maximum compression, but this did not help; the output zip still exceeded 3GB.


Solution

  • Unfortunately, this failed during the workflow because actions/upload-release-asset@v1 does not support multiple files.

    The README of actions/upload-release-asset states it is no longer maintained and suggests using softprops/action-gh-release instead, which does support multiple files:

    For example, to upload everything under a 'dist' directory:

    # ...
          uses: softprops/action-gh-release@v1
          with:
            files: "dist/*"
            # ... use other parameters as needed
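
    Applied to the workflow above, the Create Release and Upload Release Assets steps can collapse into a single step along these lines (a sketch: the chunk file names and the tag condition are illustrative, not part of the original workflow):

          - name: Create Release and Upload Chunks
            if: startsWith(github.ref, 'refs/tags/')
            uses: softprops/action-gh-release@v1
            with:
              files: release.zip.chunk*
              body: Release created automatically by GitHub Actions.
            env:
              GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}

    action-gh-release derives the tag from github.ref, creates the release if one does not already exist, and attaches every file matching the files globs.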
    

    Moreover, splitting the files this way on my local machine produced corrupted files.

    Files are just sequences of bytes, so you can always split an arbitrary file into smaller chunks. On Linux systems, the split command does precisely this. See one such example described here. Consider generating and publishing hashes as well, to help verify integrity before and after upload.
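
    For instance, a rough sketch with standard Unix tools (the 1900M chunk size is just an illustrative value under the 2GB limit; split and sha256sum are also available on windows-latest runners through Git Bash):

    # split release.zip into 1900MB chunks named release.zip.part-aa, -ab, ...
    split -b 1900M release.zip release.zip.part-
    # record a hash of the original to verify integrity after reassembly
    sha256sum release.zip > release.zip.sha256

    # on the receiving machine: reassemble (glob order is alphabetical) and verify
    cat release.zip.part-* > release.zip
    sha256sum -c release.zip.sha256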

    Here's a Python implementation of splitting and joining arbitrary binary files to and from chunks:

    import os
    from pathlib import Path
    from hashlib import sha256

    def split_file(infile: str | Path, outdir: str | Path | None = None, n_chunks: int = 5):
        """Split infile into n_chunks chunk files; return their paths and a SHA-256 digest."""
        infile = Path(infile).absolute()
        if outdir is None:
            outdir = infile.parent
        else:
            outdir = Path(outdir).absolute()

        # chunk files are named after the input, e.g. release.zip.chunk1, release.zip.chunk2, ...
        outfile_pattern = f'{os.path.basename(infile)}.chunk{{}}'

        inhash = sha256()
        file_size = os.stat(infile).st_size
        assert n_chunks >= 2
        assert file_size >= n_chunks, f'file too small ({file_size}) to chunk into {n_chunks}'
        chunk_size = file_size // n_chunks
        outpaths = []
        with open(infile, 'rb') as in_f:
            for i in range(1, n_chunks + 1):
                outfile_path = outdir / outfile_pattern.format(i)
                outpaths.append(outfile_path)
                with open(outfile_path, 'wb') as out_f:
                    # the last chunk absorbs the remainder left by integer division;
                    # note each chunk is read fully into memory, so huge chunks need ample RAM
                    if i == n_chunks:
                        chunk_contents = in_f.read()
                    else:
                        chunk_contents = in_f.read(chunk_size)
                    out_f.write(chunk_contents)
                    inhash.update(chunk_contents)
        return outpaths, inhash.hexdigest()


    def join_files(chunk_paths, outfile, expected_sha_256_digest=None):
        """Concatenate chunk_paths into outfile, optionally verifying the SHA-256 digest."""
        new_hash = sha256()
        with open(outfile, 'wb') as out:
            for fp in chunk_paths:
                with open(fp, 'rb') as f:
                    chunk_contents = f.read()
                    out.write(chunk_contents)
                    new_hash.update(chunk_contents)
        if expected_sha_256_digest is not None:
            # fail loudly if the reassembled file does not match the original
            assert expected_sha_256_digest == new_hash.hexdigest()
        return new_hash.hexdigest()

    # splitting usage:
    source_file = 'myfile.bin'
    chunk_files, digest = split_file(source_file)
    print(f'split {source_file} (digest: {digest}) into chunks:', *chunk_files, sep='\n\t')

    # joining usage:
    new_hash = join_files(chunk_files, 'myfile-again.bin', digest)
    print(f'Created myfile-again.bin with matching digest ({new_hash})')

    Maximum compression: Used 7z's -mx9 option for maximum compression

    Unclear whether this will get you down to the size you need, but you can try a better compression mechanism. 7-Zip's LZMA2 method with tuned options may give better results than zip's default deflate (zlib) compression, depending on the nature of your data.
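
    For example, one possible invocation on the build output (a sketch; the archive name is illustrative, and whatever unpacks it on end-user machines must understand the 7z format):

    # create a .7z archive using the LZMA2 method at maximum compression level
    7z a -t7z -m0=lzma2 -mx=9 release.7z ./gui/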