Why is Mercurial matching a nonexistent local revision number?

Quick intro: In Mercurial there are two different ways to numerically refer to a changeset.

First, there's the node ID hash. It is global and functions like a git commit hash. It consists of 40 hexadecimal digits.

Second, there's the local revision number. It is a decimal number that starts at 0 and counts up. Unlike the node hash, this is local, meaning the same changeset can have different local revision numbers in two different repos. It depends on which other changesets are present in each repo, and even on the order in which each repo received its changesets.

A revision can be specified numerically to Mercurial as a local revision number, a full 40-digit hash, or "a short-form identifier". The latter gives a unique prefix of a hash; that is, if only one full hash starts with the given string then the string matches that changeset.

I found that in certain cases, Mercurial commands (such as hg log with an -r switch), given plain decimal numbers, will match some revision even though there aren't enough local revisions for the given number to match as a local revision number.

Here's an example I constructed after coming across such a case by chance:

test$ hg --version
Mercurial Distributed SCM (version 6.1)
(see https://mercurial-scm.org for more information)

Copyright (C) 2005-2022 Olivia Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
test$ hg init
test$ touch a
test$ hg add a
test$ hg ci -d "1970-01-01 00:00:00 +0000" -u testuser -m a
test$ touch b
test$ hg add b
test$ hg ci -d "1970-01-01 00:00:00 +0000" -u testuser -m b
test$ hg log
changeset:   1:952880b76ae5
tag:         tip
user:        testuser
date:        Thu Jan 01 00:00:00 1970 +0000
summary:     b

changeset:   0:d61f66df66f9
user:        testuser
date:        Thu Jan 01 00:00:00 1970 +0000
summary:     a

test$ hg log -r 2
abort: unknown revision '2'
test$ hg log -r 9
changeset:   1:952880b76ae5
tag:         tip
user:        testuser
date:        Thu Jan 01 00:00:00 1970 +0000
summary:     b

test$

As is evident, hg log -r 9 matches a changeset even though there aren't that many changesets to match the 9 as a local revision number.

The question: Why is this? Additionally, how can we avoid matching a nonexistent local revision number?

Solution

This is due to how Mercurial parses revision specifiers. Here's how Olivia Mackall explains it in a mail from 2014:

Here is a hexadecimal identifier:

60912eb2667de45415eff601bfc045ae0fe8db42

See how it starts with 6? If you ask for revision 6, Mercurial will:

a) look for revision 6

b) if that fails, look for a hex identifier starting with "6"

c) if we find more than one match, complain

d) if we find no matches, complain

e) we found one match: success!

That is, if hg log -r 9 doesn't match any local revision number (because there are fewer than ten changesets in the repo), Mercurial will next try to match a node hash that happens to start with a 9.
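The lookup order quoted above can be modelled in a few lines of shell. This is only a toy sketch, not Mercurial's actual code: resolve_spec is a hypothetical function name, it takes the hashes as plain arguments in local revision order, and the wording of the "ambiguous" message is made up for illustration.

```shell
# Toy model of the lookup order quoted above; this is NOT Mercurial's code.
# Usage: resolve_spec SPEC HASH...
#   HASH arguments are node hashes in local revision order (rev 0 first).
resolve_spec() {
    spec=$1
    shift

    # a) if SPEC is purely decimal, try it as a local revision number
    case $spec in
        ''|*[!0-9]*) ;;
        *)
            rev=0
            for node in "$@"; do
                if [ "$rev" -eq "$spec" ]; then
                    echo "$rev:$node"
                    return 0
                fi
                rev=$((rev + 1))
            done
            ;;
    esac

    # b) fall back to treating SPEC as a hash prefix
    matches=0
    rev=0
    for node in "$@"; do
        case $node in
            "$spec"*) matches=$((matches + 1)); found="$rev:$node" ;;
        esac
        rev=$((rev + 1))
    done

    # c) several matches and d) no match both fail; e) one match succeeds
    if [ "$matches" -eq 1 ]; then
        echo "$found"
    elif [ "$matches" -gt 1 ]; then
        echo "abort: ambiguous prefix '$spec' (illustrative message)" >&2
        return 1
    else
        echo "abort: unknown revision '$spec'" >&2
        return 1
    fi
}

# The two-changeset repository from the question:
resolve_spec 2 d61f66df66f9 952880b76ae5 || true   # fails: no rev 2, no hash starts with 2
resolve_spec 9 d61f66df66f9 952880b76ae5           # prints 1:952880b76ae5
```

With the two hashes from the question, 2 matches nothing at either step, while 9 falls through step a) and then hits the one hash that begins with 9.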

To avoid this ambiguity, she responded that one should use hg log -r 'rev(9)' to match only local revision numbers, and hg log -r 'id(9)' to match only prefixes or full hashes.
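In a script, this could be wrapped in a small helper. The following is a minimal sketch under a few assumptions: node_of_local_rev is a hypothetical name, and it assumes that rev(N) selects nothing (rather than aborting) when N is out of range, so empty output is treated as "no such local revision". Check that behaviour against your Mercurial version before relying on it.

```shell
# Hypothetical helper (not part of Mercurial): print the full node hash of
# local revision number $1, failing if that revision does not exist.
node_of_local_rev() {
    # rev(...) restricts the lookup to local revision numbers,
    # so a hash that merely starts with the same digits cannot match.
    node=$(hg log -r "rev($1)" --template '{node}') || return 1

    # Assumption: an out-of-range number yields empty output, not an abort.
    [ -n "$node" ] || return 1

    printf '%s\n' "$node"
}

# Example use inside a script:
#   NEWNODE=$(node_of_local_rev 2) || echo "no local revision 2" >&2
```

The explicit emptiness check matters because an empty result would otherwise be indistinguishable from success to the calling script.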

In the documentation on revsets, these predicates are listed as:

"id(string)"
Revision non-ambiguously specified by the given hex string prefix.

And:

"rev(number)"
Revision with the given numeric identifier.

Unfortunately, neither this page nor the help page on revisions explicitly points out (as of version 6.1) that a plain number can match either as a local revision number or as a node hash prefix. The 2014 mailing list thread I quoted does contain suggestions to clarify this, but it appears nothing came of it.

Additionally, here is a changeset message in which I explained the entire affair and how it came to affect the operation of a script of mine:

fix to use 'rev(x)' instead of just x to refer to local rev number

The revsets syntax to unambiguously refer to a local revision number is to wrap the number in rev(). Without this, a number that doesn't exist (e.g. -r 2) may be misinterpreted to refer to a changeset that has a node hash starting with the requested number.

In our case this bug happened to act up after the revision at 2022-04-02 16:10:42 Z, "changed encoding from cp850 to utf8", which on the day after it was added was converted as the second changeset (local revision number 1) from the svn repo. The particular hg convert command was:

hg convert svn-mirror DEST \
--config hooks.pretxncommit.checkcommitmessage=true \
--config convert.svn.startrev=1152

This created a changeset known as 1:2ec9f101bc31 from that svn revision. Lacking a local revision number 2, the -r 2 picked up this changeset because its hash started with the digit "2". Thus the NEWNODE variable received the changeset hash for this changeset. Because our hg rebase command is configured to keep empty changesets, the changeset got added atop its already existing copy in the destination repo.

Ever since, the akt.sh script would pick up the wrong revision number from the destination repo and abort its run with the message "Revisions differ!".