
How to automate git history squash by date?


I have a git repository that I use as a folder-sync system: whenever I change a file on my laptop, PC or phone, the change is committed automatically. There are no branches and only a single user.

This produces a lot of commits, around 50 per day. I would like to write a bash script, run from cron, that squashes the history automatically into a single commit per day. The commit messages don't matter, but the dates must be preserved.

I tried git rebase -i SHA~count, but I can't figure out how to automate the process, i.e. keep (pick) the first commit and squash the following count commits into it.
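For the rebase route, git does not have to open an interactive editor: it edits the todo list with whatever command is named in GIT_SEQUENCE_EDITOR, so a sed command can do the "pick the first, squash the rest" step. The sketch below wraps that edit in a hypothetical mark_fixups function so it can be tried on a sample todo file. It assumes GNU sed (for -i) and the default todo wording ("pick", not the abbreviated "p" produced when rebase.abbreviateCommands is set). It uses fixup rather than squash, because fixup discards the folded commit's message and so needs no message editor.

```bash
#!/bin/bash
# Hypothetical helper: rewrite a rebase todo list so that its first "pick" is
# kept and the next (n - 1) picks become "fixup". A fixup is melded into the
# commit above it and its message is discarded, so no editor is opened.
# Assumes GNU sed (-i) and the default, non-abbreviated "pick " wording.
# Usage: mark_fixups <n> <todo-file>
mark_fixups() {
    local n=$1 todo=$2
    # nothing to fold for a day with a single commit; this also avoids the
    # GNU sed quirk where the range "2,1" would still match line 2
    if [ "$n" -lt 2 ]; then
        return 0
    fi
    # lines 2..n are the remaining commits of the same day
    sed -i -e "2,${n}s/^pick /fixup /" "$todo"
}
```

Used with git, the same edit can be passed inline, e.g. for a day of 5 commits starting at a hypothetical $first: GIT_SEQUENCE_EDITOR="sed -i -e '2,5s/^pick /fixup /'" git rebase -i "$first~1" (use --root instead of "$first~1" for the very first commit of the repository). Note that every such rebase rewrites all later commits, so the SHAs of the following days have to be recomputed after each run, and rebase sets the committer date to the current time while keeping the first commit's author date.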

Any suggestions?

I have no problem writing the bash part that finds the first SHA of each day and counts the commits to merge; some loop over this would do the trick:

    git log --reverse|grep -E -A3 ^commit| \
      grep -E -v 'Merge|Author:|--|^$'|paste - -| \
      perl -pe 's/commit (\w+)\s+Date:\s+\w+\s+(\w+)\s+(\d+).+/\2_\3 \1/'
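A possibly sturdier way to get the same per-day grouping is to ask git log for machine-readable fields, for example git log --reverse --format='%ct %H' (committer Unix timestamp and full hash, oldest first), instead of grepping its default output. The hypothetical day_groups function below is a sketch that reads such lines on stdin and prints, for each day, the first SHA and the number of commits. It assumes GNU date (for -d @<seconds>), and days are split according to the local time zone ($TZ).

```bash
#!/bin/bash
# Hypothetical helper: reads "<unix-timestamp> <sha>" lines, oldest first,
# e.g. from:  git log --reverse --format='%ct %H'
# and prints one line per day: "<YYYYMMDD> <first sha of that day> <count>".
# Assumes GNU date; the day boundary follows the local time zone ($TZ).
day_groups() {
    local ts sha day cur_day="" first="" count=0
    while read -r ts sha; do
        day=$(date -d "@$ts" +%Y%m%d)
        if [ "$day" != "$cur_day" ]; then
            # a new day starts: flush the previous group, if any
            if [ -n "$cur_day" ]; then
                echo "$cur_day $first $count"
            fi
            cur_day=$day first=$sha count=0
        fi
        count=$((count + 1))
    done
    # flush the last group
    if [ -n "$cur_day" ]; then
        echo "$cur_day $first $count"
    fi
}
```

Typical use: git log --reverse --format='%ct %H' | day_groups. Each output line gives the first commit of a day and how many commits that day has, which is the pair needed to drive a squash loop.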

Solution

  • I'm sharing the results, based on Alderath's suggestions: I used git filter-branch to walk the history and keep only the last commit of each day. A first loop over git log writes the timestamps of the commits to preserve (the last one of each day) to a temporary file; git filter-branch then keeps only the commits whose timestamps appear in that file. Because a skipped commit's changes are carried into the next kept commit, each surviving commit contains all of that day's changes.

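    #!/bin/bash
    # Keep only the last commit of each day. Run it from inside the repository
    # (when started from cron, cd into the repository first).

    # temporary file listing the timestamps of the commits to keep;
    # exported so that the commit filter below can read it
    export TOKEEP=$(mktemp)
    trap 'rm -f "$TOKEEP"' EXIT

    # git log lists the newest commit first, so the first commit seen for a
    # given day is the last one of that day; store its timestamp as "@<seconds>".
    # The day boundary follows the local time zone of the machine running this.
    DATE=
    for time in $(git log --date=raw --pretty=format:%cd | cut -d' ' -f1); do
        CDATE=$(date -d "@$time" +%Y%m%d)
        if [ "$DATE" != "$CDATE" ]; then
            echo "@$time" >> "$TOKEEP"
            DATE=$CDATE
        fi
    done

    # filter-branch exports GIT_COMMITTER_DATE as "@<seconds> <tz>"; keep a
    # commit only if its "@<seconds>" part is a whole line of $TOKEEP (-x -F).
    # skip_commit drops a commit but not its changes, which end up in the next
    # kept commit. -f overwrites any previous backup under refs/original/.
    git filter-branch -f --commit-filter '
        if grep -qxF "${GIT_COMMITTER_DATE% *}" "$TOKEEP"; then
            git commit-tree "$@"
        else
            skip_commit "$@"
        fi' HEAD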
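The matching step inside the commit filter can be tried in isolation. The sketch below wraps it in a hypothetical is_kept function (not part of git or of the script above). It relies on the same fact the script does, namely that filter-branch exports GIT_COMMITTER_DATE in the form "@<seconds> <tz>", and it uses grep -x -F so that only whole-line, literal matches count; a plain grep -q would also accept a timestamp that merely occurs inside a longer line.

```bash
#!/bin/bash
# Hypothetical helper mirroring the test done in the --commit-filter.
# $1: a committer date as filter-branch exports it, e.g. "@1700000000 +0100"
# $2: the file of "@<seconds>" lines to keep
# "${var% *}" strips the shortest suffix matching " *" (the time zone),
# leaving "@<seconds>"; -x -F require an exact, literal whole-line match.
is_kept() {
    local committer_date=$1 keep_file=$2
    grep -qxF -- "${committer_date% *}" "$keep_file"
}
```

For example, with a keep file containing the line @1700000000, is_kept '@1700000000 +0100' keepfile succeeds, while is_kept '@170000000 +0100' keepfile fails even though that shorter string occurs inside the kept line.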