git, version-control, git-lfs, bfg-repo-cleaner

Git LFS and BFG: how do I resolve the merge conflicts afterwards?


We have a repository in which we committed PDF snapshots of reports. I want to try out git lfs to see whether it improves our quality of life.

I followed the procedures here (https://github.com/rtyley/bfg-repo-cleaner/releases) to use BFG to clean out the old binaries and transition to lfs. I worked my way through a couple of wrinkles related to using a GitLab server for the repository, but in the end I believe this went well.

I'm writing to show what we did and ask a question about cleaning up merge conflicts at the very end.

I'll show you the transcript. We check out a "--mirror" clone (a bare repo) and BFG does its work on that, then we push it back after fiddling about:

guides-to-lfs$ git clone --mirror [email protected]:crmda/guides.git
Cloning into bare repository 'guides.git'...
X11 forwarding request failed on channel 0
remote: Counting objects: 865, done.
remote: Compressing objects: 100% (527/527), done.
remote: Total 865 (delta 318), reused 834 (delta 303)
Receiving objects: 100% (865/865), 151.75 MiB | 25.74 MiB/s, done.
Resolving deltas: 100% (318/318), done.
Checking connectivity... done.

guides-to-lfs$ cd guides.git/

guides.git$ java -jar ~/bin/bfg-1.12.13.jar --convert-to-git-lfs '*.{pdf,ogv,tar.gz,zip}' --no-blob-protection

Using repo : /home/pauljohn/GIT/CRMDA/guides-to-lfs/guides.git

Found 0 objects to protect
Found 3 commit-pointing refs : HEAD, refs/heads/master, refs/tmp/fd782dd8787a3ffb673455d1eafb9869/head

Protected commits
-----------------

You're not protecting any commits, which means the BFG will modify the contents of even *current* commits.

This isn't recommended - ideally, if your current commits are dirty, you should fix up your working copy and commit that, check that your build still works, and only then run the BFG to clean up your history.

Cleaning
--------

Found 124 commits
Cleaning commits:       100% (124/124)
Cleaning commits completed in 1,933 ms.

Updating 2 Refs
---------------

    Ref                                              Before     After   
    --------------------------------------------------------------------
    refs/heads/master                              | e3327ef1 | e4ac76a2
    refs/tmp/fd782dd8787a3ffb673455d1eafb9869/head | 74ccc454 | 6639b246

Updating references:    100% (2/2)
...Ref update completed in 19 ms.

Commit Tree-Dirt History
------------------------

    Earliest                                              Latest
    |                                                          |
    .......DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                            Before     After   
    -------------------------------------------
    First modified commit | cdd8f486 | 5e6b64eb
    Last dirty commit     | e3327ef1 | e4ac76a2

Changed files
-------------

    Filename                                               Before & After                                               
    --------------------------------------------------------------------------------------------------------------------
    01.LISREL.Syntax.pdf                                 | 71a17dcc ⇒ 7f217f4d                                          
    02.ReadingDataIntoLISREL.pdf                         | c05c3fe6 ⇒ e7238e11                                          
    03.InterpretingLISRELOutput.pdf                      | 6ef054c8 ⇒ a2a63813                                          
    04.StartingValuesInLISREL.pdf                        | 335d7a09 ⇒ c86439ee, 9f6fc232 ⇒ 05182a86                     
    05.WhatToReport.pdf                                  | 2bee7a8d ⇒ 1106d2f4, 3d30b103 ⇒ ce27382c                     
    06.Satorra-BentlerChi-Sq.pdf                         | 94ec6fd2 ⇒ b81d08b4, 7cd29d48 ⇒ 704d5f30                     
    ...

In total, 375 object ids were changed. Full details are logged here:

guides.git.bfg-report/2016-10-05/14-03-18

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

guides.git$ git reflog expire --expire=now --all

guides.git$ git gc --prune=now

In case you try this, be ready for some trouble pushing back into the repo. One issue is that GitLab before 8.12 did not integrate password management between the SSH transfers for git and the HTTPS transfers for git lfs. Another problem is GitLab project "protection", which you may have seen if you use GitLab. I saw this the first time I pushed:

guides.git$ git push
X11 forwarding request failed on channel 0
Git LFS: (0 of 105 files) 0 B / 140.38MB                          
http: Post https://gitlab.kucenter.edu/crmda/guides.git/info/lfs/objects/batch: x509: certificate signed by unknown authority
http: Post https://gitlab.kucenter.edu/crmda/guides.git/info/lfs/objects/batch: x509: certificate signed by unknown authority
error: failed to push some refs to '[email protected]:crmda/guides.git'

We made several changes to get around the problem. We needed the newest version of GitLab (8.12.4). I needed to tell Git to ignore the out-of-date certificates. On the GitLab server, the project had to be "unprotected" so that developers could push. I don't understand why that was necessary, because I'm the owner and I can push regular git changes, but apparently the lfs integration is different. After that fussing about, the push back to the repository succeeded:

guides.git$ GIT_SSL_NO_VERIFY=true git push
X11 forwarding request failed on channel 0
Git LFS: (0 of 0 files, 105 skipped) 0 B / 0 B, 140.38 MB skipped                                                                   Counting objects: 866, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (520/520), done.
Writing objects: 100% (866/866), 32.94 MiB | 26.41 MiB/s, done.
Total 866 (delta 311), reused 866 (delta 311)
To [email protected]:crmda/guides.git
 + e3327ef...e4ac76a master -> master (forced update)
 + 74ccc45...6639b24 refs/tmp/fd782dd8787a3ffb673455d1eafb9869/head -> refs/tmp/fd782dd8787a3ffb673455d1eafb9869/head (forced update)

Success!
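
For anyone who would rather not switch off verification with GIT_SSL_NO_VERIFY: the error in the first push says "certificate signed by unknown authority", so one alternative might be to tell Git which CA to trust for that one host. This is only a sketch, assuming you can get the server's CA certificate as a PEM file. The helper name, the URL and the path are placeholders, not values from our setup, and I have not checked how the LFS client of that era treats this setting.

```shell
# Sketch: trust a private CA for one HTTPS host instead of setting GIT_SSL_NO_VERIFY.
# trust_ca_for_host is a hypothetical helper name; both arguments are placeholders.
trust_ca_for_host() {
    base_url=$1   # e.g. https://gitlab.example.org/ (assumed; use your server's URL)
    ca_pem=$2     # path to the CA certificate in PEM format (assumed to exist)
    # http.<url>.sslCAInfo limits the extra CA to URLs under base_url.
    # The setting is written to the config of the repository you are in.
    git config "http.${base_url}.sslCAInfo" "$ca_pem"
}
```

Run inside the repository (the bare mirror works too), for example: trust_ca_for_host https://gitlab.example.org/ "$HOME/certs/ca.pem"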

Then I went back to the working directory of this repository, the one that had the PDF files saved inside it, and tried a git pull. I see a lot of merge conflicts that I'll have to address:

guides$ git pull
X11 forwarding request failed on channel 0
remote: Counting objects: 792, done.
remote: Compressing objects: 100% (491/491), done.
remote: Total 792 (delta 294), reused 791 (delta 293)
Receiving objects: 100% (792/792), 32.92 MiB | 54.09 MiB/s, done.
Resolving deltas: 100% (294/294), done.
From gitlab.kucenter.edu:crmda/guides
 + e3327ef...e4ac76a master     -> origin/master  (forced update)
warning: Cannot merge binary files: keyword_guide/guide_keywords.pdf (HEAD vs. e4ac76a2561fd4dc3ca52971e8ee3d5cbe930a0c)
warning: Cannot merge binary files: Spanish_KUant_Guides/PDFs/9._opcion_RP_en_LISREL.pdf (HEAD vs. e4ac76a2561fd4dc3ca52971e8ee3d5cbe930a0c)
warning: Cannot merge binary files: Spanish_KUant_Guides/PDFs/8._Imputacion_de_datos.pdf (HEAD vs. e4ac76a2561fd4dc3ca52971e8ee3d5cbe930a0c)
warning: Cannot merge binary files: Spanish_KUant_Guides/PDFs/7._bootstrap.pdf (HEAD vs. e4ac76a2561fd4dc3ca52971e8ee3d5cbe930a0c)
warning: Cannot merge binary files: Spanish_KUant_Guides/PDFs
[...snip out hundreds of those ...]
Automatic merge failed; fix conflicts and then commit the result.

I think I'll probably just make a clean clone of the remote and go on from there. The instructions I find on the Internet don't help much with that; they are mostly about getting started with lfs, not about dealing with ongoing lfs use and clones of lfs repositories. I worry a little about what would happen if somebody cloned this thing without having lfs installed. Oh, well, we'll see.
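
On the worry about a clone made without lfs: as far as I understand it, such a clone still succeeds, but each converted PDF is checked out as a small text "pointer" file rather than the real document. Below is a minimal sketch of how one might spot that, assuming the pointer format from the Git LFS spec, whose first line is "version https://git-lfs.github.com/spec/v1". The name is_lfs_pointer is a hypothetical helper, not part of git or git lfs.

```shell
# Sketch: report whether a checked-out file is a Git LFS pointer instead of real content.
# is_lfs_pointer is a hypothetical helper name.
is_lfs_pointer() {
    # A pointer file starts with this exact 42-byte line (per the Git LFS spec).
    # Reading only the first 42 bytes keeps this cheap for large binaries.
    head -c 42 "$1" 2>/dev/null |
        LC_ALL=C grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

If a file turns out to be a pointer, installing the lfs client and running "git lfs install" followed by "git lfs pull" in that clone should replace the pointers with the real files.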

Here's my question. If I did want to deal with all of those binary conflicts, what would I do? If I simply want to accept all of the changes from the server, it appears I just need to run this over and over again, once for each conflicted "fn.pdf".

$ git checkout --theirs -- fn.pdf
$ git add fn.pdf

Doing that over and over seems tedious, but I suppose I can do it.
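
The repetition could probably be scripted. A minimal sketch, assuming a merge is in progress and that every conflicted path has a version on the incoming side (a file deleted on "their" side would make the checkout fail). The function name is a hypothetical helper of my own.

```shell
# Sketch: resolve every conflicted path in favour of the incoming ("theirs") side.
# accept_theirs_for_all_conflicts is a hypothetical helper name.
accept_theirs_for_all_conflicts() (
    # Paths printed by git diff are relative to the repository root, so work from there.
    # The body runs in a subshell, so the caller's directory is left unchanged.
    cd "$(git rev-parse --show-toplevel)" || exit 1
    # --diff-filter=U selects only unmerged (conflicted) paths.
    if [ -z "$(git diff --name-only --diff-filter=U)" ]; then
        echo "no conflicted paths" >&2
        exit 0
    fi
    # -z / -0 pass NUL-separated names, so spaces in filenames are safe.
    # checkout --theirs only rewrites the working tree; the paths stay unmerged
    # until git add records the resolution.
    git diff --name-only --diff-filter=U -z | xargs -0 git checkout --theirs -- &&
    git diff --name-only --diff-filter=U -z | xargs -0 git add --
)
```

After it runs, "git status" should show the paths as resolved, and "git commit" concludes the merge.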

I also found advice in another question ("Resolving a Git conflict with binary files") to try

$ git mergetool

but I can't tell for sure how to interact with it. It launches a gvim window with three columns of buffers, but I have not managed to navigate it. It looks like that lands me in editor hell.


Solution

  • I think I'll probably just make a clean clone of the remote and go on from there.

    Right, this is the most important step after using any tool that rewrites history, be it BFG, filter-branch, etc. (usually the point of the rewrite is to remove unwanted files referenced in that history). The BFG home page says:

    At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.

    From the point of the first change in the rewrite onward, the rewritten history is completely divergent from the old history as far as Git is concerned, because every commit hash from that point forward changes. The only sane way to proceed is for all developers to retire their current clones of the old history and obtain new clones. In theory you could update the old clones, but it would require a lot of care for not much value.

    Removing old clones reduces the chance of someone pushing references to the old history, thereby reintroducing the old history and the unwanted files it contained.
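
If someone does insist on keeping an old clone, the "in theory" update might look roughly like the sketch below. It assumes the rewritten branch is the one currently checked out and that nothing local is worth keeping, because it throws away uncommitted changes, any merge in progress, and any local commits on that branch. The function name is a hypothetical helper; origin and master are the defaults seen in the transcript.

```shell
# Sketch: make an existing clone follow the rewritten branch, discarding local state.
# adopt_rewritten_history is a hypothetical helper name.
adopt_rewritten_history() {
    remote=${1:-origin}
    branch=${2:-master}
    # Pick up the force-pushed refs; the default fetch refspec accepts
    # non-fast-forward updates of remote-tracking branches.
    git fetch "$remote" || return 1
    # Move the current branch, index and working tree to the rewritten tip.
    # This also abandons a half-finished merge such as the one in the question.
    git reset --hard "$remote/$branch"
}
```

Even then the old commits and their PDFs stay in the clone's object store until the reflog expires and git gc prunes them, and any other local branch still points at the old history, which is why a fresh clone remains the simpler and safer route.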