github-actions cicd artifact

Compare build artifacts from two different commits via GitHub Actions


I've got a workflow in GitHub Actions that automatically creates build artifacts and updates a single release with these new build artifacts every time I merge a PR into main (here's the repo).

I want to know if a new PR will cause a change in the build artifacts (specifically, there's just one CSV file that I care about). Sometimes these changes will be intentional, sometimes not, so I want something like a git diff between the CSV file before the PR and the CSV file after the PR.

I know I could set up a GitHub action to:

  1. Check out the old version of the code.
  2. Run the code to generate the build artifacts.
  3. Save the files of interest to disk.
  4. Check out the proposed version of the code from the PR.
  5. Run the PR code to generate the build artifacts.
  6. git diff the version before the PR against the version after the PR.
  7. Format and write the git diff output as a comment on the PR, letting me know what changed so I can manually check that everything's OK.
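
For reference, steps 1 to 6 can be sketched as a small shell function. This is only a sketch under assumptions: `compare_artifact` is a hypothetical helper name, it uses `git worktree` so both versions can be built side by side, and it assumes the build command writes the file of interest inside the checkout. Step 7 (posting the comment) is left out.

```shell
# Hypothetical helper: build the same file at two commits and diff the results.
# Usage: compare_artifact BASE_SHA HEAD_SHA BUILD_CMD FILE
compare_artifact() {
  base_sha="$1"; head_sha="$2"; build_cmd="$3"; file="$4"
  workdir="$(mktemp -d)"

  # Steps 1 and 4: check out both versions side by side.
  git worktree add --detach "$workdir/base" "$base_sha" >/dev/null 2>&1 || return 2
  git worktree add --detach "$workdir/head" "$head_sha" >/dev/null 2>&1 || return 2

  # Steps 2, 3 and 5: run the build in each checkout; outputs stay on disk.
  (cd "$workdir/base" && sh -c "$build_cmd") >/dev/null || return 2
  (cd "$workdir/head" && sh -c "$build_cmd") >/dev/null || return 2

  # Step 6: diff the untracked build outputs; exit status 1 means they differ.
  git --no-pager diff --no-index "$workdir/base/$file" "$workdir/head/$file"
  rc=$?

  # Clean up the temporary worktrees.
  git worktree remove --force "$workdir/base"
  git worktree remove --force "$workdir/head"
  rm -rf "$workdir"
  return "$rc"
}
```

Called from inside the repository, for example as `compare_artifact "$BASE_SHA" "$HEAD_SHA" "make build" build/myfile.csv` (the path is a placeholder), it prints a unified diff of the two generated files.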

But this seems like a really common problem, and I can't believe there isn't a simple tool or solution out there already. Maybe some GitHub action where you give it two SHAs, a command to run, and a list of files to git diff?

To be clear, these are build artifacts, so aren't tracked by git, and so solutions like git diff pullrequest main -- myfile.csv won't work.


Solution

  • Here is a solution that leverages git notes:

    [Screenshot: artefact comparison report posted as a comment on the PR]

    (In a nutshell, git notes let you create, read, update, and delete metadata attached to a commit without touching the commit itself, thus preserving history. Cf. § References below.)
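
To make that concrete, here is a minimal sketch of the note lifecycle in a throwaway repository. The repository, identity, and note texts are made up for illustration; only the `git notes` subcommands matter.

```shell
# Throwaway repository with a single commit to annotate.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "demo"
git config user.email "demo@example.com"
git commit --quiet --allow-empty -m "Initial commit"
sha_before="$(git rev-parse HEAD)"

# Create: attach a note to HEAD.
git notes add -m "first summary"
# Update: -f overwrites the existing note.
git notes add -f -m "second summary"
# Read: print the note attached to HEAD.
note="$(git notes show)"
echo "$note"

# The commit itself is untouched: its SHA has not changed.
sha_after="$(git rev-parse HEAD)"

# Delete: remove the note again.
git notes remove
```

Notes live under refs/notes/, which is why the workflow has to fetch and push that ref namespace explicitly.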

    Essentially, we want our workflow to:

    1. Build the artefacts
      We emulate this by running make build — to be adapted to your own scenario. For the sake of the example, we also assume that the build/ directory contains all and only the artefacts generated.
    2. “Remember” the artefacts and their content (a so-called “artefacts summary”)
      We use the sha512sum shell command to create a mapping of artefacts' content (represented through their SHA sum) to their file name.
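      We retrieve all artefacts via find build/ -type f, hash each one with sha512sum, and then convert the mapping to a CSV with headers using sed 's/  /,/' | cat <(echo 'sha512,file_name') - (sha512sum separates the hash from the file name with two spaces, hence the two spaces in the sed pattern).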
    3. Attach the artefacts summary to the commit
      That's where we leverage git notes, which allows us to add metadata to the commit ex-post, without modifying the history.
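
As a rough local illustration of step 2, the summary can be wrapped in a small function. `artefacts_summary` is a hypothetical name, and the `sort` is an addition (not part of the workflow below) so that the output does not depend on the order in which find lists the files.

```shell
# Hypothetical helper: print a "sha512,file_name" CSV for every file under DIR.
# Usage: artefacts_summary DIR
artefacts_summary() {
  echo 'sha512,file_name'
  # sha512sum prints "<hash>  <path>"; turn the two-space separator into a comma.
  find "$1" -type f -exec sha512sum {} \; | sort -k 2 | sed 's/  /,/'
}
```

Step 3 then amounts to something like `git notes add -m "$(artefacts_summary build/)"` on the commit that was built.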

    These steps should be executed for any commit on your main branch.

    In case of a PR, you also want to repeat these steps on the branch's HEAD, plus:

    1. Retrieve the artefacts summary of your PR's target branch
      So you now have two artefacts summaries to compare: base (your main/master branch's one) and head (the branch of your PR). In the example below, the base is hard coded to main, but you could refine this by letting the workflow retrieve the target branch's name automatically.
    2. Compare both artefacts summaries
      I've created the artefactscomparison Python package for that purpose. (Note: it's very much tailored to my use case and desiderata.)
    3. Add the artefact comparison report to your PR
      Beep boop: a bot will do that for you.
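
If you would rather not depend on a package for the comparison step, the idea can be approximated with awk. This is a simplified sketch and not how artefactscomparison works internally; `compare_summaries` and the added/changed/removed labels are assumptions made for illustration.

```shell
# Hypothetical helper: compare two "sha512,file_name" summaries.
# Usage: compare_summaries BASE_CSV HEAD_CSV
compare_summaries() {
  awk '
    FNR == 1 { next }                        # skip the CSV header of each file
    {
      comma = index($0, ",")                 # split on the first comma only
      sha = substr($0, 1, comma - 1)
      name = substr($0, comma + 1)
    }
    FILENAME == ARGV[1] { base[name] = sha; next }
    {
      seen[name] = 1
      if (!(name in base))        print "added," name
      else if (base[name] != sha) print "changed," name
    }
    END {
      for (name in base) if (!(name in seen)) print "removed," name
    }
  ' "$1" "$2" | sort
}
```

Running `compare_summaries base.csv head.csv` prints one `status,file_name` line per artefact that differs between the two summaries, and nothing when they match.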

    In the end, you should see something like the screenshot above.

    name: Artefacts Comparison
    
    on:
      push:
        branches:
          - main
    
      pull_request:
        branches:
          - main
    
    permissions: write-all
    
    jobs:
      build_artefacts:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout
            uses: actions/checkout@v3
            with:
              fetch-depth: 0
              token: ${{ github.token }}
          - name: Build artefacts
            run: make build
          - name: Generate artefacts summary
            id: artefacts-summary
            run: |
              echo "ARTEFACTS_SUMMARY<<EOF" >> $GITHUB_OUTPUT
              find build/ -type f -exec sha512sum {} \; | sed 's/  /,/' | cat <(echo 'sha512,file_name') - >> $GITHUB_OUTPUT
              echo "EOF" >> $GITHUB_OUTPUT
          - name: Add the artefacts summary as a git note
            run: |
              git fetch origin refs/notes/*:refs/notes/*
              git config user.name "github-actions"
              git config user.email "[email protected]"
              git notes add -m "${{ steps.artefacts-summary.outputs.ARTEFACTS_SUMMARY }}"
              git notes show
              git push origin refs/notes/*
      # In case of PR, add report of artefacts comparison
      compare_artefacts:
        runs-on: ubuntu-latest
        if: ${{ github.event_name == 'pull_request' }}
        steps:
          - name: Checkout
            uses: actions/checkout@v3
            with:
              fetch-depth: 0
              token: ${{ github.token }}
          - name: Pull artefacts summaries (i.e., git notes) from upstream
            run: |
              git fetch origin refs/notes/*:refs/notes/*
          - name: Retrieve PR's head branch's artefacts summary
            id: artefact-summary-head
            run: |
              echo "ARTEFACTS_SUMMARY<<EOF" >> $GITHUB_OUTPUT
              git notes show >> $GITHUB_OUTPUT
              echo "EOF" >> $GITHUB_OUTPUT
          - name: Retrieve PR's target branch's artefacts summary
            id: artefact-summary-base
            run: |
              git checkout ${{ github.base_ref }}
              echo "ARTEFACTS_SUMMARY<<EOF" >> $GITHUB_OUTPUT
              git notes show >> $GITHUB_OUTPUT
              echo "EOF" >> $GITHUB_OUTPUT
          - name: Setup Python
            uses: actions/setup-python@v4
            with:
              python-version: "3.10"
          - name: Install artefactscomparison package
            run: pip install -U artefactscomparison
          - name: Generate artefact comparison report
            id: artefact-comparison-report
            run: |
              echo "${{ steps.artefact-summary-head.outputs.ARTEFACTS_SUMMARY }}" > head.csv
              echo "${{ steps.artefact-summary-base.outputs.ARTEFACTS_SUMMARY }}" > base.csv
              echo "ARTEFACTS_REPORT<<EOF" >> $GITHUB_OUTPUT
              artefacts_comparison -b base.csv -h head.csv >> $GITHUB_OUTPUT
              echo "EOF" >> $GITHUB_OUTPUT
          - name: Comment PR with artefact comparison report
            uses: thollander/actions-comment-pull-request@v2
            with:
              message: ${{ steps.artefact-comparison-report.outputs.ARTEFACTS_REPORT }}
              comment_tag: artefact_comparison_report
              mode: recreate
        needs: build_artefacts
    

    References: