
How do I verify if the given argument is a syntactically legal commit-ish or a syntactically legal revision range?


Disclaimer: This question stems from a lack of knowledge of basic Git CLI concepts, so it is really an XY problem.


I have a bunch of Bash scripts that wrap git. Most of them take commit-ish arguments and revision ranges. I have a hard time figuring out how to verify that the given arguments are syntactically valid; if they are not, the scripts should reject them and terminate with a non-zero exit code, for instance so that they do not clash with git-rev-list parameters and cannot be injected as switches of other commands.

The closest commands I've found are:

  • commit-ish:

    • git cat-file with the --batch-check option, which reads stdin and produces %(objecttype) %(objectname)-formatted output; the script can check (using grep, Bash read, whatever) that the object type is always commit or tag and exit once it is not, but I don't feel this is an optimal solution;
    • git rev-parse with the --revs-only (and possibly --symbolic) option: this filters out dash-prefixed arguments, which is totally fine (except that it silently drops non-rev arguments, a behaviour I don't find useful in all the cases I need). It does not care whether the given argument is a commit or a tag, which is fine as long as the commit-ish is passed on to commands that accept commit-ish objects and fail otherwise.
  • revision ranges:

    • none yet; git rev-parse --revs-only --symbolic master..feature produces feature and then ^master on two lines, which I find incorrect.
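
For the single commit-ish case, one alternative to the two commands above may be git rev-parse --verify combined with the ^{commit} peeling suffix, which fails unless the name resolves to exactly one object that peels to a commit. The following is only a minimal sketch, not a definitive implementation: the function name is_commit_ish is hypothetical, and rejecting dash-prefixed names outright is an assumption made here so that nothing can be mistaken for an option.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if "$1" resolves to a commit,
# or to a tag that peels to a commit.
is_commit_ish() {
    local name=$1
    # Assumption: refuse anything that could be mistaken for an option.
    # A ref such as refs/heads/--foo can still be given by its full name.
    case $name in
        ''|-*) return 1 ;;
    esac
    # --verify demands exactly one valid object name (so ranges fail),
    # --quiet suppresses the error text on failure,
    # ^{commit} peels tags and fails for trees and blobs.
    git rev-parse --verify --quiet "${name}^{commit}" >/dev/null
}
```

A wrapper could then call something like is_commit_ish "$1" || exit 1 before handing the argument to another git command. This does not cover revision ranges.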

Is it possible to verify that each of the given arguments is a legal commit-ish or revision range, and to fail fast if any of them is syntactically illegal? If the design of my scripts is broken, I'd also be happy to fix it by reworking them.


Edit 1.

As suggested in the first comment, git rev-list $(git rev-parse master..feature) should work. Indeed, this is the point I missed. It actually expands to git rev-list <feature_OBJECTNAME> '^<master_OBJECTNAME>'. I was not aware of this syntax, and I always believed that a range had to be ..-delimited, like <master_OBJECTNAME>..<feature_OBJECTNAME>. That's really nice even if it does not fail fast (I'm still not sure whether I need it to fail fast or to fail only once the rev-related args are passed to git commands).
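
The equivalence noted above can be checked directly: master..feature is shorthand for feature ^master, so both spellings should select the same commits. A small sketch, assuming the two names passed in resolve to commits (the function name same_range is hypothetical):

```shell
#!/bin/bash
# Hypothetical check: succeeds if "A..B" and "B ^A" select the same commits.
same_range() {
    local a=$1 b=$2 dotted caret
    dotted=$(git rev-list "$a..$b") || return   # two-dot range syntax
    caret=$(git rev-list "$b" "^$a") || return  # include B, exclude A
    [ "$dotted" = "$caret" ]
}
```

For example, same_range master feature should succeed in any repository where both branches exist.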

Now, please suppose that my script parses its own arguments, shifts past them, and passes all remaining arguments to the git command using "$@":

#!/bin/bash
...
# parse the script-related args here
...
set -x # debug it
# note: if the command substitution $(...) is quoted, this may fail with `unknown revision or path not in the working tree`
git rev-list $(git rev-parse "$@")

Example:

$ ./script master..feature
7b98875c67a78f976b6bad24a7366c23db0f0725
35c2d1f84d8fa3b72396962b181cf797672c689d
3472c8b350cc967470450f0fb6f00c3af26c8378

Really great that it works with rev-parse! Now suppose I pass more arguments to the script

# `rev-list` can process `--skip` but I don't want this option to be injected to `rev-list`
# BUT I'm totally fine with `master..feature~4`
$ ./script --skip 4 master..feature -- path1 path

This fails because git rev-parse tries to resolve 4 as an object name:

fatal: ambiguous argument '4': unknown revision or path not in the working tree.

The next command

# I don't seem to be able to detect the `--` separator
# as I would like this to be detected by the underlying git command
$ ./script master..feature -- path1 path

fails with

^f0deddee7cd577969f8b5771137b8be6973e4117
path1
fatal: ambiguous argument 'path1': unknown revision or path not in the working tree.

If I add the --revs-only to the git rev-parse command, it depends on the arguments:

  • ./script --skip master..feature produces object names like <feature_OBJECTNAME>\n^<master_OBJECTNAME>, but silently drops --skip
  • ./script --skip 4 master..feature fails silently, with no output at all
  • ./script --skip 4 master..feature path1 path2 fails in the same way as the above
  • ./script --skip master..feature path1 path2 produces object names as if path1 and path2 were not there, with no warnings about them
  • ./script --skip master..feature -- path1 path2 produces object names as if path1 and path2 were not there, with no warnings about them (please note the -- separator)

Maybe I am mixing these things up or I'm just too confused, but what I need is a(n easy) way to detect whether the given arguments are legal commit-ish or revision ranges. If paths are provided to the script, I think the script can detect them when they are separated with --, in case the underlying command is not allowed to work with paths.
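
The -- handling described above can be done in the wrapper itself, before git sees anything. The sketch below is only one possible approach: it scans the words before the first -- and rejects any that start with a dash, so --skip 4 is refused while master..feature~4 and ^master pass through. The function name reject_options is hypothetical, and it does not check that the remaining words are valid revisions.

```shell
#!/bin/bash
# Hypothetical helper: fails if any word before the first `--` looks like an option.
reject_options() {
    local arg
    for arg in "$@"; do
        case $arg in
            --)
                # Everything after the separator is a path; stop checking.
                return 0
                ;;
            -*)
                echo "$0: error: unexpected option: $arg" >&2
                return 1
                ;;
        esac
    done
}
```

A script could then run reject_options "$@" || exit 1 and pass "$@" unchanged to the underlying git command.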


Edit 2.

I found an interesting case. If I create a new reference with git update-ref refs/heads/--foo master, the short name of refs/heads/--foo becomes unusable with git rev-list, because it cannot be told apart from a command option:

$ git rev-list --foo
usage: git rev-list [<options>] <commit>... [--] [<path>...]

However, specifying the full ref name works as expected:

$ git rev-list refs/heads/--foo
<OBJECTNAME_n>
<OBJECTNAME_n-1>
<OBJECTNAME_n-2>
...
<OBJECTNAME_1>

This requires the users of my scripts to use only full reference names in such cases, and it seems to break git rev-parse --revs-only if a short reference name looks like this (adding -- to git rev-parse produces no output at all).

Edit 2.2.

Having the refs/heads/--symbolic-full-name reference, the following command:

$ git rev-parse --symbolic-full-name master --symbolic-full-name

produces only refs/heads/master and ignores the --symbolic-full-name branch, which one might expect to be converted to refs/heads/--symbolic-full-name.

The following works as expected:

$ git rev-parse --abbrev-ref refs/heads/--symbolic-full-name
--symbolic-full-name

Currently I have no idea how to distinguish between legal commit-ish, legal revision ranges, and git command options.


Edit 3.

I seem to have found a solution for the commit-ish.

# git-show-ref (it accepts `--` and does not require full ref names) combined with git-cat-file seems to work,
# but in a suboptimal way, since it requires running git-show-ref and git-cat-file once per argument
#
# this works for object names, commits, and tags, and it fails as expected for any other non-commit-ish string, such as a command line option
normalize_commit_ish() {
    declare OBJECT_NAME=
    declare FULL_NAME=
    declare __FULL_NAME=
    declare TYPE=
    declare __=
    for NAME; do
        {
            read -r OBJECT_NAME FULL_NAME || true
            if [[ -n "$FULL_NAME" ]]; then
                read -r __ __FULL_NAME || true
# TODO this check totally ignores the rules described at:
# https://git-scm.com/docs/gitrevisions#Documentation/gitrevisions.txt-emltrefnamegtemegemmasterememheadsmasterememrefsheadsmasterem
                if [[ -n "$__FULL_NAME" ]]; then
                    echo "$0: error: ambiguous refs detected: $FULL_NAME and $__FULL_NAME" >&2
                    return 1
                fi
                printf '%s\n' "$FULL_NAME"
                continue
            fi
        } < <(git show-ref -- "$NAME")
        read -r OBJECT_NAME < <(git rev-parse "$NAME" 2>/dev/null || true) || true
        if [[ -n "$OBJECT_NAME" ]]; then
            read -r TYPE < <(git cat-file -t -- "$OBJECT_NAME" || true) || true
            case "$TYPE" in
            'commit'|'tag')
                printf '%s\n' "$OBJECT_NAME"
                continue
                ;;
            'tree'|'blob')
                echo "$0: error: $NAME must be commit-ish but was $TYPE" >&2
                return 1
                ;;
            '')
                # nothing resolved, go to the next step (currently the error dead-end)
                ;;
            *)
                echo "$0: error: $NAME is of unknown type $TYPE" >&2
                return 1
                ;;
            esac
        fi
        echo "$0: error: cannot resolve commit-ish for $NAME" >&2
        return 1
    done
}

An example run:

COMMIT_ISH=($(normalize_commit_ish 'tag' '@' 'master~1' 'refs/heads/--symbolic-full-name' 'heads/--symbolic-full-name' '--foo' 'master' '--symbolic-full-name')) && printf '%s\n' ${#COMMIT_ISH[@]} "${COMMIT_ISH[@]}"

outputs:

refs/tags/tag
14cd17e4539f6881abeb7629f249fac12fb197ac
a74945dee0f61aaedf2b43c9a43f55600f118706
refs/heads/--symbolic-full-name
refs/heads/--symbolic-full-name
refs/heads/--foo
refs/heads/master
refs/heads/--symbolic-full-name

Also this function can detect invalid ref names and objects:

normalize_commit_ish '--this-is-other-git-command-option-it-may-clash-with' || true
normalize_commit_ish 'this-ref-does-not-exist' || true

Two commands above produce:

fatal: Not a valid object name --this-is-other-git-command-option-it-may-clash-with
./script: error: cannot resolve commit-ish for --this-is-other-git-command-option-it-may-clash-with
fatal: Not a valid object name this-ref-does-not-exist
./script: error: cannot resolve commit-ish for this-ref-does-not-exist

as expected.

Still looking for a normalize_revision_ranges implementation.


Solution

  • It turns out that I don't need any of these checks: neither commit-ish syntax verification nor revision range syntax verification. Both are simply unnecessary in scripting, since I can pass all "weird" commit-ish names like --symbolic-full-name (an abbreviated name for refs/heads/--symbolic-full-name in the experiments above) straight to git commands and let git do the job itself.

    The key, which I came across accidentally in the gitcli document, is the --end-of-options command line switch. It makes git accept "weird" commit-ish names, because the switch is essentially a delimiter between the command options and whatever follows that might clash with them: a commit-ish, a revision range, or a path:

    $ git update-ref refs/heads/--symbolic-full-name master
    
    # an example command that cannot distinguish the branch name and an invalid option
    $ git rev-list --symbolic-full-name
    # ... git-rev-list help goes here because of the error ...
    
    # an example command that accepts a single commit-ish I don't need "syntax check" for anymore
    $ git rev-list --end-of-options --symbolic-full-name
    # ... git-rev-list object names go here ...
    
    # an example command that accepts a revision range I don't need the "syntax check" for anymore too
    $ git rev-list --end-of-options --symbolic-full-name~2..--symbolic-full-name
    # object name 2
    # object name 1
    
    $ git update-ref -d refs/heads/--symbolic-full-name
    

    I spent too much time researching in the wrong direction and making wrong assumptions, but I'm happy that the solution turned out to be so easy to use, and I hope it will cover all my needs without bringing surprises.
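
Applied to the wrapper scripts from the question, this could look like the sketch below. It assumes a Git version recent enough to understand --end-of-options, and the function name list_commits is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: every argument the caller passes is treated as a
# revision (or as a path after `--`), never as a rev-list option.
list_commits() {
    git rev-list --end-of-options "$@"
}
```

With this, list_commits --skip 4 master..feature is expected to fail rather than skip commits, because --skip is looked up as a revision name instead of being honoured as an option.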