Get SVN URL of removed git-svn file


I would like to track a removed file as far back in history as possible, while using git-svn on a subdirectory of the SVN repository.

  1. Using git log --full-history -- path/to/removed_file.py, I can see the history back to the point where the file was moved into the subdirectory I checked out with git-svn.
  2. The git-svn-id line at the end of the commit message tells me which SVN revision that was, so I would now like to use svn log <full_url>@revision to see the rest of the history.
  3. I know that I could use git svn info --url path/to/existing_file.py to see the required full SVN URL, but what is a quick (ideally scriptable) way of getting the SVN URL of a file that is no longer in the repository?

Solution

  • To git, it doesn't matter much that a file foo/bar.py is removed in HEAD — as long as you have it in history, you can view every past version of it.

    To be concrete, I'll take this git-svn repo from the LLVM project as an example. There, the file docs/todo.rst was deleted in SVN revision 308987 (git commit fb572868…) and is absent from master.

    Let's first init a local clone.

    $ git clone https://github.com/llvm-mirror/lnt && cd lnt
    Cloning into 'lnt'...
    ...
    $ git svn init https://llvm.org/svn/llvm-project/lnt/trunk
    $ git update-ref refs/remotes/git-svn refs/remotes/origin/master
    $ 
    $ #-- query svn info for any existing file to check the setup and trigger the lazy rev_map rebuild
    $ git svn info --url README.md
    Rebuilding .git/svn/refs/remotes/git-svn/.rev_map.91177308-0d34-0410-b5e6-96231b3b80d8 ...
    r154126 = 3c3062527ac17b5fac440c55a3e1510d0ab8c9d9
    r154135 = 82a95d29ac7d25c355fbd0898a44dc3e71a75fd8
    ...
    r374687 = 446f9a3b651086e87684d643705273ef78045279
    r374824 = 8c57bba3687ada10de5653ae46c537e957525bdb
    Done rebuilding .git/svn/refs/remotes/git-svn/.rev_map.91177308-0d34-0410-b5e6-96231b3b80d8
    https://llvm.org/svn/llvm-project/lnt/trunk/README.md
    

    So it gives back the README.md URL as expected. Now let's try the case of a deleted file:

    $ git svn info --url docs/todo.rst
    svn: 'docs/todo.rst' is not under version control
    

    It fails, just as you describe. man git-svn notes that info "Does not currently support a -r/--revision argument."

    OK then, let's try emulating what it does, first by hand.

    https://llvm.org/svn/llvm-project/lnt/trunk/README.md?r=374824 is the URL for a given file at a given revision.

    Our vanished docs/todo.rst is available at https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986. Here ?p= sets the peg revision, which is needed because the path no longer exists at HEAD. Notice the decrement: per git show fb572868 | grep git-svn-id, docs/todo.rst is already deleted in r308987, so we request r308986.

    On to scripting it... rather simple job.

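    git-svn-oldinfo () {
      relfname="$1"   # path relative to the repository root
      # The last commit touching the file is the one that deleted it;
      # its git-svn-id line reads "<base-url>@<rev> <repo-uuid>".
      git log -n1 -- "$relfname" \
        | awk '/git-svn-id:/ {sub(/@/, " ", $2); print $2}' \
        | { read -r baseurl rev
            # The file is already gone in <rev>, so request <rev> - 1.
            [ -n "$rev" ] && echo "${baseurl}/${relfname}?p=$((rev-1))"; }
    }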
    
    #-- test:
    $ git-svn-oldinfo docs/todo.rst
    https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986
    

    Quick-n-dirty but tested — you're welcome to adjust & extend as needed.
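
    The question asked for something to feed to svn log <full_url>@revision. A minimal sketch of that last step, assuming the <url>?p=<rev> output format of git-svn-oldinfo above; svn_peg_url is a hypothetical helper name, not part of git-svn or svn:

```shell
# Hypothetical helper: turn "<url>?p=<rev>" into "<url>@<rev>",
# the peg-revision form that svn subcommands such as "svn log" accept.
svn_peg_url () {
  case "$1" in
    *\?p=*) ;;                 # input carries a "?p=<rev>" suffix
    *) return 1 ;;             # anything else is rejected
  esac
  peg_url="${1%%\?p=*}"        # everything before "?p="
  peg_rev="${1##*\?p=}"        # everything after "?p="
  printf '%s@%s\n' "$peg_url" "$peg_rev"
}

# Usage (needs an svn client and network access):
#   svn log "$(svn_peg_url "$(git-svn-oldinfo docs/todo.rst)")"
```

    A path that itself contains an @ would need extra care with svn's peg-revision syntax; this sketch does not handle that case.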


    Edit

    Although git log is a "porcelain" command (i.e. not really designed for scripting), it's quite possible to parse the filenames out of its output too, if you want to query by globs like **/removed_file.py:

    git-svn-oldinfo-glob () {
      fileglob="$1"
      git log -n1 --stat --format=oneline -- "$fileglob" \
        | { read commit msg; \
            read fullname _remainder_dummy; \
            git cat-file -p $commit \
              | tail -n1 \
              | awk '/git-svn-id:/ {sub(/@/, " ", $2); print $2}' \
              | { read baseurl rev; echo "${baseurl}/${fullname}?p=$((rev-1))"; } \
          }
    }
    
    #-- test:
    $ git-svn-oldinfo-glob '**/todo.rst'
    https://llvm.org/svn/llvm-project/lnt/trunk/docs/todo.rst?p=308986
    

    Take it with a grain of salt: it'll probably break in hilarious ways or output garbage if the glob matches multiple files, non-removed files, files with whitespace in the name, etc.
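
    If the glob may match several removed files, one option is to walk the deleting commits one by one with plumbing-style commands. The following is only a sketch under assumptions: the names svn_url_before and git_svn_oldinfo_all are hypothetical, the history is assumed to be linear (git diff-tree prints nothing for merge commits by default), and paths are not percent-encoded for use in a URL.

```shell
# Pure text step: $1 is a git-svn-id line, $2 a repo-relative path.
# Prints the URL pegged at the revision just before the deletion.
svn_url_before () {
  id_rest="${1#*git-svn-id: }"     # drop everything up to the key
  id_urlrev="${id_rest%% *}"       # "<base-url>@<rev>"
  id_base="${id_urlrev%@*}"        # part before the last "@"
  id_rev="${id_urlrev##*@}"        # part after the last "@"
  case "$id_rev" in
    ''|*[!0-9]*) return 1 ;;       # no usable revision number found
  esac
  printf '%s/%s?p=%s\n' "$id_base" "$2" "$((id_rev - 1))"
}

# Git-driven step: one URL per deleted path matching the pathspec $1.
git_svn_oldinfo_all () {
  git log --diff-filter=D --format=%H -- "$1" |
  while read -r commit; do
    id_line="$(git log -n1 --format=%B "$commit" | grep 'git-svn-id:' | tail -n1)"
    git diff-tree --no-commit-id --name-only --diff-filter=D -r "$commit" -- "$1" |
    while IFS= read -r relpath; do
      svn_url_before "$id_line" "$relpath"
    done
  done
}

# Usage, inside the git-svn clone:
#   git_svn_oldinfo_all '**/todo.rst'
```

    Splitting the text handling from the git calls keeps the URL arithmetic checkable without a repository; reading paths with IFS= read -r keeps spaces in file names intact.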

    As always, check out man git-log and customize as needed.