Tags: python, git, workflow

Track manual changes to autogenerated files while allowing for automatic updates


I want to allow manual changes to automatically generated files (from templates), while still allowing for updates to the templates or to the data. I envision using git to track the human-made and code-generated portions of the generated files and to merge them intelligently.

Somewhat along the lines of

  • human writes templates
  • run code to generate files from templates and fill in data
  • human applies changes to the file
  • code collects new data and generates a new file from template
  • code merges human-made changes into the new file

What I want is similar to what is often done for embedded code, where the IDE autogenerates code (for example, filling in register addresses) but still allows manual changes to that code. However, those tools usually restrict manual changes to small segments, which is a restriction I cannot accept.

I think that it can be done with separate branches and a suitable merge/rebase strategy, but when thinking about it I identify many difficult corner cases. Therefore I am looking for references where something similar has been done already or a more detailed strategy that avoids at least the following issues:

  • if the templates live in the same git repository as the generated files, just checking out an "autogenerated" branch does not track the template changes
  • after the templates or data have changed, all human-made changes have to be applied (via rebase?) onto the newly generated files. I am looking for a good merge strategy that avoids conflicts as much as possible
  • the newly merged file has to be pushed back onto the "work" branch, if possible without rewriting history

I can separate the templates from the autogenerated files into different directories, but that does not fix the checkout problem.

Ideally I'd want to write some Python code that handles the complete workflow with a single command: git checkouts, template generation, merges and user feedback.


Solution

    I see two ways of handling this.

    1. Either you rebase your manual changes on top of the "upstream" changes from time to time, whenever applicable. This is similar to working with temporary commits, except that in your case the commits are marked not as temporary but as manual (say, with a "Manual: " prefix).
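
A minimal sketch of how this first option might be scripted in Python. It assumes (none of this comes from the original setup) a branch named generated that holds only generator output and a branch named work that carries the manual commits on top of it; the helper names are hypothetical. Before rebasing, it checks that every commit subject in generated..work carries the "Manual: " prefix:

```python
import subprocess

MANUAL_PREFIX = "Manual: "  # commit-message convention suggested above


def unexpected_subjects(log_output, prefix=MANUAL_PREFIX):
    """Return commit subjects that do not carry the manual prefix.

    log_output is the text printed by `git log --format=%s A..B`.
    """
    return [s for s in log_output.splitlines() if s and not s.startswith(prefix)]


def rebase_manual_changes(repo, generated="generated", work="work"):
    """Rebase the manual commits on `work` onto the tip of `generated`.

    The branch names are assumptions made for this sketch.
    """
    log = subprocess.run(
        ["git", "log", "--format=%s", f"{generated}..{work}"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    stray = unexpected_subjects(log)
    if stray:
        raise RuntimeError(f"non-manual commits on {work}: {stray}")
    # `git rebase <upstream> <branch>` checks out <branch> first. On a
    # conflict git exits non-zero, so CalledProcessError is raised and the
    # rebase is left in progress for manual resolution.
    subprocess.run(["git", "rebase", generated, work], cwd=repo, check=True)
```

Note that a rebase rewrites the history of the work branch, so this variant only fits if that is acceptable.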

    2. The other way, if you want to keep the manual changes in the history over time as normal commits, is to do the same [1] as when you have your /etc directory under version control (e.g. using Etckeeper) and some upstream package update creates either a *.rpmnew or a *.rpmsave file.

    So to pull in an rpmnew update of, say, /etc/hosts, I need to figure out which version the update is relative to, i.e. distinguish between the automatic upstream changes and my local changes. Running git log -p hosts might show some custom "10.0.0.50 www.example.com" entries I have added, until the newest upstream change shows up:

    commit 769700d4ea13c89959333eeda66574507d5a3237
    Author: Mr Root <root@localhost>
    Date:   Fri May 27 17:34:04 2022 +0200
    
        committing changes in /etc made by "-bash"
        
        Package changes:
        ...
        -0:setup-2.13.9.1-2.fc35.noarch
        +0:setup-2.13.10-1.fc35.noarch
        ...
    
    diff --git a/hosts b/hosts
    index 849c10d..740a59a 100644
    --- a/hosts
    +++ b/hosts
    @@ -1,2 +1,7 @@
    +# Loopback entries; do not change.
    +# For historical reasons, localhost precedes localhost.localdomain:
     127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
     ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    +# See hosts(5) for proper format and other examples:
    +# 192.168.1.10 foo.mydomain.org foo
    +# 192.168.1.13 bar.mydomain.org bar
    

    Thus I know that the new hosts file update should be relative to commit 769700d4. Therefore I

    • create a branch from that commit
    • apply the rpmnew update on that branch
    • merge that branch back to main (and resolve conflicts if any)

    cd /root/etc.worktree
    git checkout main.worktree
    git merge --ff main
    git branch rpmnew/hosts 769700d4
    git checkout rpmnew/hosts
    cp /etc/hosts.rpmnew hosts
    git add hosts 
    git commit -m /etc/hosts.rpmnew
    git checkout main.worktree
    git merge rpmnew/hosts   # Resolve any conflicts using KDiff3, https://github.com/hlovdal/git-resolve-conflict-using-kdiff3
    cd /etc
    git checkout main
    git merge main.worktree
    rm hosts.rpmnew
    

    The next time a /etc/hosts.rpmnew file is created, you of course skip the branch creation and just reuse the existing branch.
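
The branch-and-merge steps above could be wrapped in a small Python helper along these lines. This is only a sketch under stated assumptions: the function names are hypothetical, the cp step assumes a POSIX system, and the branch and file names are simply passed in by the caller. It covers both the first run (branch creation) and later runs (branch reuse):

```python
import subprocess


def update_commands(update_branch, base_commit, new_file, tracked_file,
                    target_branch, branch_exists):
    """Build the command sequence for merging one regenerated file.

    The update branch is created from base_commit only on the first run;
    later runs reuse it so git can find a sensible merge base.
    """
    cmds = []
    if not branch_exists:
        cmds.append(["git", "branch", update_branch, base_commit])
    cmds += [
        ["git", "checkout", update_branch],
        ["cp", new_file, tracked_file],        # overwrite with the new upstream version
        ["git", "add", tracked_file],
        ["git", "commit", "-m", new_file],     # fails if the file did not change
        ["git", "checkout", target_branch],
        ["git", "merge", update_branch],       # exits non-zero on conflicts
    ]
    return cmds


def run(cmds, cwd):
    """Run the commands in order inside cwd, stopping at the first failure."""
    for argv in cmds:
        subprocess.run(argv, cwd=cwd, check=True)


# Usage mirroring the shell commands above (not executed here):
# run(update_commands("rpmnew/hosts", "769700d4", "/etc/hosts.rpmnew",
#                     "hosts", "main.worktree", branch_exists=False),
#     cwd="/root/etc.worktree")
```

When the merge reports conflicts, run raises subprocess.CalledProcessError and leaves the repository in the conflicted state, so the conflicts can be resolved by hand as in the manual workflow.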

    The above example is a bit more complicated because it concerns /etc, where you do not want to disturb the main worktree while updating, but I include the full details since they should not be too hard to grasp.


    [1] Specifically for /etc, you want to avoid checking out older versions, because that could make programs misbehave when the content of /etc changes. So for that scenario you definitely want to do the rpmnew/rpmsave recovery in a separate worktree and only merge the result into the main /etc directory at the end. This should not be an issue for your average source code repository.