Only use GitLab lightweight tags, even if there is an annotated tag - GitPython

Is there any way to say that the lightweight tag should always be preferred?

My problem: I use the Git tag to get the hash of the tag, and with this hash I add a file to my database. I always have to be sure that I save the hash of the lightweight tag. If someone adds a message while creating a tag, it becomes an annotated tag. So I want to change my git.describe call so that it returns only the lightweight tag.

I think it's possible, but I can't find any example of it. The only thing I have read is the documentation, which mentions using refs/tags, but I don't know how.

I use GitPython for this: https://gitpython.readthedocs.io/en/stable/tutorial.html
Right now I do it like this, and the annotated tag is preferred automatically:

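import logging

import git  # GitPython

logger = logging.getLogger(__name__)

repo_dir = "example-git-repo-url.com"

repo = git.Repo(repo_dir)

# Nearest tag on the first-parent chain; --tags also allows lightweight tags
tag_name = repo.git.describe(["--tags", "--abbrev=0", "--first-parent"])
tag_hash = repo.git.rev_parse(["--short=8", tag_name])
logger.info("Latest tag: %s (%s)", tag_name, tag_hash)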

I don't get this part of the documentation:

--tags

Instead of using only the annotated tags, use any tag found in refs/tags namespace. This option enables matching a lightweight (non-annotated) tag.

What does it mean to use any tag found in refs/tags? How?

Solution

You're using a Python wrapper that either invokes Git, or re-implements Git (or both—some of these wrappers can be told whether to use their internal implementation, or to invoke Git using subprocess). If your particular wrapper implements its own git describe, it might have a way to do what you want.

If you're using Git's own built-in git describe, there is no direct way to get what you want. The --exclude option might let you get close enough, though. See the long section for thoughts on how you could use this from your Python wrapper.

The basic problem is this: git describe tries to find an annotated tag, by default. The option --tags simply adds the ability to use lightweight-only tags. It never removes the ability to use annotated tags.

Long

I don't get this part of the documentation:

--tags

Instead of using only the annotated tags, use any tag found in refs/tags namespace. This option enables matching a lightweight (non-annotated) tag.

what does it mean to use any tag found in refs/tags?

Ciro Santelli linked to "How can I list all lightweight tags?" That question has my answer about how to enumerate all lightweight tags, using git for-each-ref. The background missing here (and in the git describe documentation from which the above is quoted) is that in Git, all names (branch names, tag names, and so on) are specific forms of refs or references. These references live in namespaces.

See the linked Wikipedia article for a more complete definition and set of examples, but one common name-space example in the real world has to do with humans and "given names" vs "surnames". If you find yourself at a party with too many people named Bruce, you can usually use a full name or a signifying letter ("Bruce A", "Bruce J"), and so on to tell them apart.

The same idea works in Git: if you have a branch named xyz and a tag named xyz, you can use a full name, or fuller name, to spell out which one you mean: refs/heads/xyz is the branch, and refs/tags/xyz is the tag. All tags live under refs/tags/. We just normally leave out the refs/tags/ part and say "tag xyz".

What I note in my answer is that an annotated tag atag actually consists of two parts: a lightweight tag named atag, and an internal Git object of type annotated tag. The names of internal Git objects are hash IDs. That is, Git finds this annotated tag object via its hash ID:

The lightweight tag portion of the pair is a Git ref or reference, and all Git refs hold one hash ID.

So the lightweight tag refs/tags/atag holds the hash ID of the annotated tag object. The annotated tag object in turn holds, as part of its data, the hash ID of the target of the tag, which is normally a commit. We can see all of this by looking at a tag in the Git repository for Git itself, e.g., v2.30.0:

$ git rev-parse v2.30.0
2d9685d47a7e516281aa093bf0cddc8aafa72448
$ git cat-file -p 2d9685d47a7e516281aa093bf0cddc8aafa72448 | sed 's/@/ /'
object 71ca53e8125e36efbda17293c50027d31681a41f
type commit
tag v2.30.0
tagger Junio C Hamano <gitster pobox.com> 1609110954 -0800

Git 2.30
-----BEGIN PGP SIGNATURE-----
[snipped]

The first hash ID, 2d9685d47a7e516281aa093bf0cddc8aafa72448, is that of the annotated tag object. The git cat-file -p command prints out that annotated tag object's content, which begins with an object line, then a type line, then a tag line, and so on. The main purpose of this particular annotated tag is to carry the PGP signature (snipped here).

The object line holds the hash ID of the target of this particular annotated tag: 71ca53e8125e36efbda17293c50027d31681a41f, which is a commit object. Had v2.30.0 been a lightweight tag, rather than an annotated tag, the name refs/tags/v2.30.0 would contain 71ca53e8125e36efbda17293c50027d31681a41f directly. But instead, refs/tags/v2.30.0 refers to the annotated tag object 2d9685d47a7e516281aa093bf0cddc8aafa72448, so we call v2.30.0 an annotated tag. It's built out of this pair: the name referring to the first object, and the first object referring to another object.
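To make the layout of the annotated tag object concrete, here is a minimal Python sketch that splits the text printed by git cat-file -p into its header fields and its message. The function name parse_tag_object is made up for this illustration, and the sample text is just the output shown above, trimmed before the signature.

```python
def parse_tag_object(text):
    """Split `git cat-file -p <tag-object>` output into header fields and message.

    Hypothetical helper: the headers are "key value" lines, and the first
    blank line separates them from the tag message.
    """
    header, _, message = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields, message


# Sample taken from the v2.30.0 output above (signature omitted).
sample = """object 71ca53e8125e36efbda17293c50027d31681a41f
type commit
tag v2.30.0
tagger Junio C Hamano <gitster pobox.com> 1609110954 -0800

Git 2.30
"""

fields, message = parse_tag_object(sample)
print(fields["object"])  # hash ID of the tag's target
print(fields["type"])    # type of the target object
```

The object field is the hash ID you would get directly from refs/tags/v2.30.0 if the tag were lightweight.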

With that in mind, consider `git describe`

For git describe to do its default job, it must enumerate the annotated tags. This means looking at every refs/tags/* name. Each such name is either a lightweight tag, referring directly to some object that is not an annotated tag object, or an annotated tag because it refers to an annotated tag object that refers to some target object. That's a bit of a mouthful—or a head-ful—so go over it a few times if needed.

Since git describe wants to look only at the annotated tags, what it does is look at every tag, then throw away those tags that refer directly to something that's not an annotated tag. That means git describe now has a table of just the annotated tags. It can then go on to do its work and figure out which of these tag names, if any, is suitable.

The --tags flag just tells git describe: Don't throw out the lightweight-only tags. That leaves it looking at all tags in the refs/tags namespace, i.e., all tags.
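You can do the same kind of sorting yourself. The sketch below assumes the git command is on your PATH and uses git for-each-ref with the %(objecttype) field: a ref whose object type is tag points at an annotated tag object, and anything else is lightweight. The function names split_tags and list_tags are my own, not part of Git or GitPython.

```python
import subprocess


def split_tags(for_each_ref_output):
    """Sort "<objecttype> <tagname>" lines into (annotated, lightweight) lists."""
    annotated, lightweight = [], []
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        objecttype, _, name = line.partition(" ")
        # A ref that points at an object of type "tag" is an annotated tag.
        if objecttype == "tag":
            annotated.append(name)
        else:
            lightweight.append(name)
    return annotated, lightweight


def list_tags(repo_dir):
    """Run git for-each-ref over refs/tags in repo_dir and classify each tag."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "for-each-ref",
         "--format=%(objecttype) %(refname:strip=2)", "refs/tags"],
        check=True, capture_output=True, text=True,
    )
    return split_tags(result.stdout)


# Classifying a hand-written sample, without touching a repository:
annotated, lightweight = split_tags("tag v2.30.0\ncommit nightly\n")
print(annotated, lightweight)
```

Using %(refname:strip=2) removes the refs/tags/ prefix, so the names come out in the form most Git commands expect for tags.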

Using `--exclude`

The --exclude option is intended to toss out some kind of tag-name-pattern. For instance, suppose some repository author keeps their tags organized into "releases", "early", and "experiments": tag releases/1.0 is version 1.0, while early/1.0-alpha and early/1.0-beta are the alpha and beta release versions of version 1.0. Meanwhile experiments/featureX might have some feature that is being experimented on, that might or might not go into release 2.0.

You might be interested only in the various experiments, not counting early releases. In this case, you could exclude all releases/* and early/* with:

git describe --exclude 'releases/*' --exclude 'early/*'

(the quotes here are to prevent the * from being eaten by your shell; this may be unnecessary under various conditions, but rarely hurts).
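As a rough illustration of which tags survive those two patterns, here is a small Python sketch. It uses Python's fnmatch as a stand-in for Git's own glob matching, which is an approximation on my part, and the tag names are the hypothetical ones from the example above.

```python
from fnmatch import fnmatchcase


def surviving_tags(tags, exclude_patterns):
    """Return the tags that match none of the exclude patterns."""
    return [
        tag for tag in tags
        if not any(fnmatchcase(tag, pattern) for pattern in exclude_patterns)
    ]


tags = ["releases/1.0", "early/1.0-alpha", "early/1.0-beta", "experiments/featureX"]
remaining = surviving_tags(tags, ["releases/*", "early/*"])
print(remaining)  # ['experiments/featureX']
```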

While the exclude option takes a "glob pattern", any actual tag is a valid glob pattern. It's just that it only matches that one tag.¹ So, in your Python program, you could:

Find all tag names, using whatever facilities you have available to enumerate all names in the refs/tags/ space. Note that the trailing slash is not required when using git for-each-ref, as it adds '/*' itself, but might be required by whatever API you are using. Check the documentation.²
For each tag that is an annotated tag, add --exclude and the tag.

Then use git describe --tags, with these --excludes, to come up with a descriptive tag, if there is any. By excluding all the annotated tags, you'll get only a lightweight tag.
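The steps above could look roughly like this with GitPython. Treat it as a sketch under some assumptions: it relies on TagReference.tag being None for a lightweight tag, it needs a Git new enough to support --exclude (2.13 or later), and the helper names escape_glob, describe_args, and latest_lightweight_tag are invented for this example. The backslash escaping is my assumption about how Git's glob matcher treats quoted characters.

```python
import re


def escape_glob(tag_name):
    """Backslash-escape glob metacharacters so the pattern matches one tag only."""
    return re.sub(r"([\\*?\[])", r"\\\1", tag_name)


def describe_args(annotated_tags):
    """Build git describe arguments that exclude every annotated tag by name."""
    args = ["--tags", "--abbrev=0", "--first-parent"]
    for name in annotated_tags:
        args.append("--exclude=" + escape_glob(name))
    return args


def latest_lightweight_tag(repo_dir):
    """Return the nearest lightweight tag for HEAD in the repository at repo_dir."""
    import git  # GitPython, imported lazily so the helpers above work without it

    repo = git.Repo(repo_dir)
    # TagReference.tag is the annotated tag object, or None for a lightweight tag.
    annotated = [ref.name for ref in repo.tags if ref.tag is not None]
    # Raises git.GitCommandError if no lightweight tag is reachable.
    return repo.git.describe(*describe_args(annotated))


print(describe_args(["v1.0", "v2.0"]))
```

With a very large number of annotated tags the argument list grows accordingly, so you may want to check it against your platform's command-line length limit.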

¹If the tag's name includes the special glob characters, i.e., *, ?, and [, we might get in trouble here. One could:

hope that no tags use those characters, or
check and quote them if needed.

But see also the git check-ref-format documentation, which explicitly forbids such characters. I've seen questions on StackOverflow implying that some branch name contains forbidden characters, so they do actually come up in real cases, probably from non-Git software writing to files it should not. It then becomes up to the programmer to decide whether to check for these, and if found, what to do about them.

²This often results in discovering that the documentation is incomplete. At this point you can check the implementation—read the source, in other words—and/or experiment to discover the actual behavior. However, this often indicates that whoever wrote the API did not think carefully about edge cases, and that different versions of the program or library may behave differently.
