
option --decorate-refs is ignored when calling git-log from python subprocess


I am stuck on this error. I have searched for a bunch of things and tried following the call with a debugger, but I am none the wiser.

My problem:

I run this command from the command line:

git log --format=format:%D --simplify-by-decoration --decorate-refs=*platVer*

and I get the expected list of tags:

tag: platVer/222.3.4123, tag: myplatVer-222.3.4123
tag: platVer-20.07.000
tag: platVer-20.06.000
tag: platVer-20.05.000

If I run this from an interactive Python session on the command line, I also get the expected list:

>>> from subprocess import call, Popen, PIPE
>>> pp = Popen(['git', 'log', '--decorate-refs=*platVer*', '--format=format:%D', '--simplify-by-decoration'])
tag: platVer/222.3.4123, tag: myplatVer-222.3.4123
tag: platVer-20.07.000
tag: platVer-20.06.000
tag: platVer-20.05.000

When I run this line in IDLE or in a script, the output is not captured (as expected). To capture stdout, Popen needs its stdout parameter set to PIPE.

But if I run with stdout=PIPE, it appears to ignore the '--decorate-refs=*platVer*' option and just lists the entire set of refs:

>>> pp = Popen(['git', 'log', '--decorate-refs=*platVer*', '--format=format:%D', '--simplify-by-decoration'], stdout=PIPE)
>>> pp.stdout.read()
b'HEAD -> feature/ps2python, origin/feature/ps2python\ntag: platVer/222.3.4123, tag: myplatVer-222.3.4123, tag: mao_test ....

I get the same result when I run this from a script or in IDLE:

from subprocess import Popen, PIPE

pp = Popen(['git', 'log', '--decorate-refs=*platVer*', '--format=format:%D', '--simplify-by-decoration'], stdout=PIPE)
print(pp.stdout.read().decode('ascii'))

gives me this:

HEAD -> feature/ps2python, origin/feature/ps2python
tag: platVer/222.3.4123, tag: myplatVer-222.3.4123, tag: mao_test
show-current, develop
tag: platVer-20.07.000,
... (cut the remaining many many lines of refs)

I am running on Windows 10 (Version 10.0.18363.778), git version 2.29.2.windows.2, Python version 3.8.5.

I tried with shell=True/False and universal_newlines=True/False. I also tried it in WSL (Ubuntu). All gave the same result.

Then I tried in a virtual Ubuntu 18 LTS machine with git version 2.17. There I got the weird result, where '--decorate-refs=*platVer*' is ignored, even from the command line.

I then updated git to a newer version (2.29.2) on this Ubuntu machine, and now the command works exactly as expected from the command line.

I then tried the same commands from Python there and got the same result as on the Windows 10 machine.

Please help. I can't figure out how setting stdout=PIPE can change the behaviour of the git command.

Edit: I did check that the same version of git is called with and without PIPE.

Edit2:

I marked @torek's answer as accepted, as it answers my question perfectly.

However, I should have stated the goal of my use of git-log, to allow for broader answers.

My goal is to find the tag that is the first tag found when traveling back in history (topological or graph ordering) and that matches a regular expression.

I was previously using rev-list, but found no documentation saying that it would deliver tags in the order I wanted. Maybe I missed something.

The reason I use a simple glob pattern in my command, while also stating that I need a regex match, is that I assume globbing is faster. I therefore use it as a prefilter to shorten the list that the regular expression has to process in Python. I expect the list of tags to contain 1000+ tags in a few years and to keep growing, with tags containing the word 'platVer' making up around 1% of that list.


Solution

  • Add --decorate=full or --decorate=short to your git log arguments. You can also use --decorate=true or --decorate=1, but full and short are the documented values these days. Full prints the full ref name (e.g., refs/heads/somebranch), while short shortens it to the branch or tag name.
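As a rough illustration of the fix applied to the question's goal, here is a minimal Python sketch. It assumes the command line from the question with --decorate=short added. The helper names (build_log_command, first_matching_tag, find_tag) and the example regular expression are hypothetical and not part of Git or of the original post.

```python
import re
import subprocess


def build_log_command(glob="*platVer*"):
    """Return the git log argument list from the question, with decoration forced on."""
    return [
        "git", "log",
        "--decorate=short",            # explicit, so a piped stdout does not disable decorations
        f"--decorate-refs={glob}",     # cheap glob prefilter done by Git
        "--format=format:%D",
        "--simplify-by-decoration",
    ]


def first_matching_tag(log_output, pattern):
    """Return the first tag in %D output whose name matches the regex, or None."""
    regex = re.compile(pattern)
    for line in log_output.splitlines():
        # %D separates ref names with ", "; tags are prefixed with "tag: ".
        for ref in line.split(", "):
            ref = ref.strip()
            if ref.startswith("tag: "):
                name = ref[len("tag: "):]
                if regex.search(name):
                    return name
    return None


def find_tag(pattern, glob="*platVer*", repo="."):
    """Run git log in `repo` and return the first tag matching `pattern` (hypothetical helper)."""
    result = subprocess.run(
        build_log_command(glob),
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return first_matching_tag(result.stdout, pattern)
```

Calling find_tag(r"^platVer-\d+\.\d+\.\d+$") inside a repository would then return the first tag of that shape that git log lists, assuming the default log ordering is the ordering you want.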

    Long (but optional) useful background info

    The default log.decorate setting is auto (since Git 2.9 anyway; before that it was no/0/false, and at various points various bugs were introduced and then fixed in later versions; it's been stable since Git 2.13). The auto setting means short if a human is reading the output, no if a program is reading the output.[1]

    The decorations themselves are required (i.e., must be turned on) for --simplify-by-decoration --decorate-refs=... to work. Probably either of these options should imply --decorate=short if it's currently still auto from being unset in the Git configuration.[2]

    This all points to a more general problem with using git log programmatically, e.g., from Python with subprocess: git log is what Git calls a porcelain command, which means it obeys user configurations. If the user has a log.decorate setting, that overrides any defaults. Now that you know about log.decorate and the --decorate= argument, you can force correct behavior in your program using the --decorate= argument (which overrides any user configuration). But what other user-configurable items exist in git log that could break your program? What about future versions of Git, where git log might acquire new configuration items? Unfortunately, there is nothing you can do about this more general problem today. The so-called plumbing commands do not change behavior based on user configuration, so they have a known, fixed output format and are useful from other programs. But some things that git log does cannot be done by any plumbing command, so git log needs an option to make it behave well. (The git status command has the --porcelain option for this; git log just needs its own version of that.)


    [1] Git doesn't actually know if a human is reading the output. Instead, it approximates this by examining the standard output stream: if the standard output (file descriptor 1) responds with a true value for the isatty C library call, or git log output is being fed to a pager, it assumes a human is reading the output. Use of pipes in subprocess means that stdout is not a tty, which by default disables the pager too. However, there's a user configuration setting that forces the pager to be used: see the "more general problem" paragraph.
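The tty check described in this footnote can be observed without Git at all. The sketch below is only an illustration: it starts a child Python process with its stdout connected to a pipe and asks the child what isatty reports. The names CHILD_CODE and child_sees_tty_when_piped are hypothetical.

```python
import subprocess
import sys

# The child prints whether its own standard output looks like a terminal.
CHILD_CODE = "import sys; print(sys.stdout.isatty())"


def child_sees_tty_when_piped():
    """Run a child with stdout=PIPE and report whether it believes stdout is a tty."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,   # same redirection as Popen(..., stdout=PIPE) in the question
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # A pipe is not a terminal, so the child reports False here.
    print(child_sees_tty_when_piped())
```

Any program that keys its defaults off isatty, as git log does for log.decorate=auto, sees the same difference between an inherited terminal and a pipe.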

    [2] In general, the way Git configurations work is this:

    • First, the program sets any automatic defaults, such as log.decorate=auto (this is typically just open-coded, rather than using the configuration mechanism).

    • Next, Git reads the system configuration file. If this has a setting such as log.decorate=short in it, that setting applies, overriding the automatic default. (This usually works through callbacks, from the configuration mechanism to the program.)

    • Next, Git reads your personal global configuration file. If this has a setting such as log.decorate=auto in it, that setting applies. If the previous configuration had a setting, this overwrites that previous setting.

    • Next, Git reads the configuration file for this particular Git repository. If this has a setting such as log.decorate=full, that setting applies, overwriting any previous setting as before.

    • Last, Git applies command-line argument settings. These therefore override any settings picked up in any of the previous steps.

    This is how, for instance, you can arrange your user.name and/or user.email to be different for one particular Git repository. You set these in your global config, which Git reads before it reads the per-repository config; then you set them to the different value in the per-repository config, and that overrides the global config.

    In relatively recent versions of Git, you can also set up a per-worktree configuration: git config --worktree. This is read after the per-repository config file, but used before command line arguments, so it has the second-highest priority. For the per-worktree setting to take effect, you must enable extensions.worktreeConfig. Be careful here as there were some bugs with this extension for a little while.
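The precedence order described in this footnote can be modelled as "the last layer that sets a key wins". The sketch below is a simplified model for illustration only, not how Git implements its configuration code. The layer labels and the effective_setting helper are hypothetical.

```python
# Layers in the order they are applied; later layers override earlier ones.
LAYER_ORDER = (
    "built-in default",
    "system",
    "global",
    "repository",
    "worktree",
    "command line",
)


def effective_setting(key, layers):
    """Return (value, layer) for `key`, taken from the last layer that sets it.

    `layers` maps a layer label from LAYER_ORDER to a dict of settings.
    Returns (None, None) if no layer sets the key.
    """
    value, source = None, None
    for name in LAYER_ORDER:
        settings = layers.get(name, {})
        if key in settings:
            value, source = settings[key], name
    return value, source


if __name__ == "__main__":
    layers = {
        "built-in default": {"log.decorate": "auto"},
        "global": {"log.decorate": "auto", "user.email": "me@example.com"},
        "repository": {"log.decorate": "full", "user.email": "me@work.example.com"},
        "command line": {"log.decorate": "short"},
    }
    print(effective_setting("log.decorate", layers))  # ('short', 'command line')
    print(effective_setting("user.email", layers))    # ('me@work.example.com', 'repository')
```

To see which file a real setting comes from, git config --show-origin --get-all log.decorate lists each value together with the file that supplied it.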