
Semantic Versioning 2 zero cases


After looking at the SemVer 2.0 guidelines, I have the following questions:

  • Can Major, Minor and Patch all be 0 at the same time? That is, is 0.0.0 considered valid?
  • Can you have a trailing 0 in the prerelease or build metadata labels? That is, is 1.0.0-alpha.0 considered valid?

According to their BNF grammar, both examples provided are considered valid, but does that make sense?


Solution

  • Yes, 0.0.0 is valid. Yes, -alpha.0 is valid.

    To the best of my knowledge, the entire spec, including the BNF, "makes sense", but you have to live in the space for a while to really understand why. The spec is terse for a reason: it says only what it needs to say, so that tool makers can write compliant code and tool users have as much freedom as possible. In fact, before the spec was written, there were already tools in use that used similar syntax and semantics.

    So, 0.0.0 is valid because there are no strong arguments against it. It's like the endless arguments about whether a programming language should use zero- or one-based array indexing: there are reasons for and against either choice. Specs should not be any more opinionated than they absolutely need to be, and this is one of those cases where it's better to leave the decision to end users. Tooling must accept 0.0.0, but never has to generate it. Treating 0.0.0 as invalid input would actually require extra words in the spec and extra code in the tooling, and there's no objective reason to carry out that exercise.

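    As for the zero field in the prerelease tag, it's fine, but -alpha.00 is not. The spec forbids leading zeroes in numeric identifiers because they would break sorting. A prerelease tag is made up of a hyphen '-' followed by a series of dot-separated numeric or alphanumeric identifiers, and the '.0' in your example is a valid numeric identifier as defined by the spec. (Build metadata identifiers are not subject to the leading-zero rule at all, since build metadata does not figure into precedence.)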
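The identifier rules just described are easy to check mechanically. Here's a minimal Python sketch, assuming the tag is passed without its leading hyphen; `is_valid_prerelease_identifier` and `is_valid_prerelease` are hypothetical helper names, not part of any SemVer library. It encodes the three constraints from spec item #9: identifiers are non-empty, use only `[0-9A-Za-z-]`, and numeric identifiers have no leading zero.

```python
import re

# One pre-release identifier: one or more ASCII alphanumerics or hyphens.
_IDENTIFIER = re.compile(r"[0-9A-Za-z-]+")


def is_valid_prerelease_identifier(ident: str) -> bool:
    """Check one dot-separated pre-release identifier (hypothetical helper)."""
    # Rejects empty identifiers and any character outside [0-9A-Za-z-].
    if _IDENTIFIER.fullmatch(ident) is None:
        return False
    # After the regex check only ASCII remains, so isdigit() means ASCII digits.
    # Purely numeric identifiers must not have a leading zero; "0" itself is fine.
    if ident.isdigit() and len(ident) > 1 and ident.startswith("0"):
        return False
    return True


def is_valid_prerelease(tag: str) -> bool:
    """Check a whole tag such as "alpha.0", given without the leading hyphen."""
    return all(is_valid_prerelease_identifier(part) for part in tag.split("."))


for tag in ["alpha.0", "alpha.00", "0.3.7", "x-y-z.--", "alpha..1"]:
    print(tag, is_valid_prerelease(tag))
```

Here "alpha.0" passes and "alpha.00" fails, as described above; "alpha..1" fails because of its empty identifier. An identifier such as "0a" is still allowed, because it is alphanumeric rather than numeric.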

    Consider the initial state of a project. Maybe your DevOps team has a project template with all the build configuration ready for you to clone before you start working on your project. There are two options at this point: either there is no version associated with the initial project state, or it is initialized to some starting value. In the latter case, what version number should you start with? 0.0.0 seems like a good candidate. Perhaps build automation reads that version, reads the commit messages, and chooses an appropriate next version. That tooling is more complicated if it has to handle the missing-version case. Why pay the run-time cost of carrying that code when a valid static value can be stored somewhere instead?


    The SemVer spec achieves two out of three primary goals:

    1. It defines the syntax & semantics of the version triple + tags.
    2. It defines how version strings should be sorted.
    3. It attempts to alleviate dependency hell (with only partial success).

    Some elements of #1 are there to support #2. The "semantic" bits of the spec are there to alleviate some of the pain around dependency hell, but the syntax and other rules are about sorting as much as anything else. Let's start with item #2 of the spec itself:

    #2 A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. For instance: 1.9.0 -> 1.10.0 -> 1.11.0.

    First sentence: "A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes."

    The "non-negative integers" part explicitly allows zero in any or all of the fields. See the Wikipedia article on 0 (the number), specifically: "As a digit, 0 is used as a placeholder in place value systems". SemVer falls into the category of place value systems.

    The "MUST NOT contain leading zeroes" bit is there in support of sorting. While the spec says later that numeric fields are "compared numerically", the ASCII code points for the digits 0..9 are 48..57, in the same order, so one does not have to convert each field to an integer in order to sort it. Because there are no leading zeroes, the longer of two digit strings is always the larger number, so we compare lengths first; only when the lengths are equal do we compare code points one at a time, from left to right, until one of them differs. If we allowed leading zeroes, "007" and "7" would be equal values with different lengths, so we would have to strip the zeroes or convert the field to an integer before comparing, which is extra work on every comparison.
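Here's a minimal Python sketch of comparing two numeric fields without converting them, assuming both are already valid digit strings with no leading zeroes (the name `compare_numeric` is hypothetical). The length check has to come first: a plain character-by-character comparison on its own would rank "9" above "10".

```python
def compare_numeric(a: str, b: str) -> int:
    """Compare two SemVer numeric fields given as digit strings.

    Assumes no leading zeroes. Returns -1, 0 or 1. No integer conversion is
    performed, so fields of any length are handled exactly.
    """
    # Without leading zeroes, a longer digit string is always the larger number.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Equal lengths: ASCII digits 48..57 are ordered like 0..9, so comparing
    # code points left to right gives numeric order.
    return (a > b) - (a < b)


print(compare_numeric("9", "10"))   # -1
print(compare_numeric("11", "10"))  # 1
print(compare_numeric("0", "0"))    # 0
```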

    #9 A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.

    The second sentence, "Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]", constrains prerelease identifiers to the ASCII code points 45 (the hyphen), 48..57, 65..90 and 97..122. This rule, combined with #11, renders sorting nearly trivial.

    #11 Precedence refers to how versions are compared to each other when ordered.

    1. Precedence MUST be calculated by separating the version into major, minor, patch and pre-release identifiers in that order (Build metadata does not figure into precedence).

    The reason for this rule is simple. If you applied standard ASCII collation to the entire string without regard to the fields, there would be numerous anomalies, such as 1.10.0 sorting lower than 1.9.0, or 10.0.0 sorting lower than 2.0.0.
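To see the anomaly concretely, here's a small Python comparison of a naive whole-string sort against a field-wise sort of the core triple. It's a sketch that assumes plain X.Y.Z strings with no prerelease or build suffix; `core_key` is a hypothetical helper, and it uses `int()` only for brevity (Python integers have no size limit).

```python
versions = ["1.10.0", "1.9.0", "2.0.0", "10.0.0"]

# Naive: treat each version as one string and sort by code points.
naive = sorted(versions)


def core_key(version: str) -> tuple[int, int, int]:
    # Field-wise: split into major, minor, patch and compare each numerically.
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


fieldwise = sorted(versions, key=core_key)

print(naive)      # ['1.10.0', '1.9.0', '10.0.0', '2.0.0']
print(fieldwise)  # ['1.9.0', '1.10.0', '2.0.0', '10.0.0']
```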

    2. Precedence is determined by the first difference when comparing each of these identifiers from left to right as follows: Major, minor, and patch versions are always compared numerically.

    Example: 1.0.0 < 2.0.0 < 2.1.0 < 2.1.1.

    Now here's where the "Identifiers MUST NOT be empty" bit in #9 comes into play, along with one reason why zero is allowed in every field. If you were given the string "1.1." or even "1.1", you could probably infer that the patch field is empty, but it's more convenient if you can count on a value being present and don't have to implement the special case where a field is NULL (C/C++) or in some other uninitialized state. So the spec requires the placeholder "0" in that field, and in some of the languages you might write your parse and compare code in, that is simply easier to process.

    Note that some tools have always supported the short forms "1" and "1.1", but that's an implementer's choice and is not compliant. The spec requires the full triple, both for interoperability and to reduce the cognitive load on humans reading the string: the fixed triple format is less likely to be misinterpreted. The full triple also provides a modicum of disambiguation between SemVer strings and other versioning schemes (perhaps a bit of opinionated specmanship).
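A strict parser for the core triple makes the "no empty fields, no short forms" requirement concrete. This is a minimal Python sketch, assuming only the X.Y.Z part is passed in (prerelease and build suffixes are out of scope); `parse_core` is a hypothetical name. It keeps the fields as strings, so no integer conversion is needed, and it accepts 0.0.0.

```python
import re

# A core field is "0" or a digit string that does not start with 0.
# [0-9] is used instead of \d, which also matches non-ASCII digits in Python.
_FIELD = r"(0|[1-9][0-9]*)"
_CORE = re.compile(rf"{_FIELD}\.{_FIELD}\.{_FIELD}")


def parse_core(version: str) -> tuple[str, str, str]:
    """Split a strict X.Y.Z string into (major, minor, patch) strings.

    Raises ValueError for short forms ("1", "1.1"), empty fields ("1.1."),
    leading zeroes ("01.0.0") and extra fields ("1.0.0.0").
    """
    match = _CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a SemVer core version: {version!r}")
    return match.group(1), match.group(2), match.group(3)


print(parse_core("0.0.0"))
for bad in ["1", "1.1", "1.1.", "01.0.0"]:
    try:
        parse_core(bad)
    except ValueError as err:
        print(err)
```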

    3. When major, minor, and patch are equal, a pre-release version has lower precedence than a normal version:

    Example: 1.0.0-alpha < 1.0.0.

    There are actually two flavors of prerelease versions, but the authors apparently didn't find it necessary to point out that 0.1.0-experimental < 0.1.0. Placing a prerelease tag on "0.1.0" may seem redundant, since the whole 0.y.z range is already meant for initial development and not considered stable, but the spec doesn't forbid it, and there may be cases where this sort of thing is practical for somebody.

    4. Precedence for two pre-release versions with the same major, minor, and patch version MUST be determined by comparing each dot separated identifier from left to right until a difference is found as follows:

    Identifiers consisting of only digits are compared numerically.

    Identifiers with letters or hyphens are compared lexically in ASCII sort order.

    Numeric identifiers always have lower precedence than non-numeric identifiers.

    A larger set of pre-release fields has a higher precedence than a smaller set, if all of the preceding identifiers are equal.

    Example: 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-alpha.beta < 1.0.0-beta < 1.0.0-beta.2 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0.
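The third and fourth rules under #11 can be written down almost directly. This Python sketch assumes the tags are already valid and are passed without the leading hyphen, with `None` standing for "no prerelease tag"; `compare_identifiers` and `compare_prerelease` are hypothetical names. Numeric identifiers are compared length first, so no conversion is needed.

```python
from functools import cmp_to_key
from typing import Optional


def _is_numeric(ident: str) -> bool:
    # Numeric identifier: one or more ASCII digits and nothing else.
    return ident != "" and all("0" <= c <= "9" for c in ident)


def compare_identifiers(a: str, b: str) -> int:
    """Return -1, 0 or 1 comparing two valid pre-release identifiers."""
    a_num, b_num = _is_numeric(a), _is_numeric(b)
    if a_num != b_num:
        # Numeric identifiers always have lower precedence than non-numeric ones.
        return -1 if a_num else 1
    if a_num and len(a) != len(b):
        # Both numeric, no leading zeroes: the longer one is the larger number.
        return -1 if len(a) < len(b) else 1
    # Equal-length numerics or two non-numerics: ASCII (code point) order.
    return (a > b) - (a < b)


def compare_prerelease(a: Optional[str], b: Optional[str]) -> int:
    """Compare two pre-release tags; None means the version has no tag."""
    if a is None or b is None:
        # A version without a tag outranks one with a tag; two untagged are equal.
        return (a is None) - (b is None)
    a_ids, b_ids = a.split("."), b.split(".")
    for x, y in zip(a_ids, b_ids):
        result = compare_identifiers(x, y)
        if result != 0:
            return result
    # All shared identifiers are equal: the larger set of fields wins.
    return (len(a_ids) > len(b_ids)) - (len(a_ids) < len(b_ids))


tags = ["rc.1", None, "beta.11", "alpha", "beta.2", "alpha.beta", "beta", "alpha.1"]
print(sorted(tags, key=cmp_to_key(compare_prerelease)))
# ['alpha', 'alpha.1', 'alpha.beta', 'beta', 'beta.2', 'beta.11', 'rc.1', None]
```

Sorting the spec's own example tags this way reproduces the order in the example above, with the untagged 1.0.0 (here `None`) last.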

    Things get interesting here. Given -alpha.1 and -alpha.a, you might think you can treat the last field as alphanumeric (one of them is, right?) and sort by ASCII code points, which gives -alpha.1 < -alpha.a. Then there's the rule that "Numeric identifiers always have lower precedence than non-numeric identifiers", which also gives -alpha.1 < -alpha.a. Even so, we can't dispose of this rule. What if we have -alpha.9999 and -alpha.a? Once 9999 is compared as a number, there is no natural way to order it against the string "a"; the rule supplies one. Moreover, the hyphen (code point 45) sorts below the digits, so pure ASCII order would put -alpha.-1 before -alpha.1, whereas the rule puts the numeric identifier 1 first.

    It's not specified anywhere, but I suspect the following tag history seemed more natural than other ways the spec could have been defined:

    -0 < -1 < -2 < ... < -alpha < -beta

    That is, digits sort before letters, just as they do in ASCII collation, which programmers have long been used to.

    Many implementations attempt to convert a field to an integer or other numeric type, treating the field as numeric on success and alphanumeric on failure. If both fields are numeric, they end up comparing the values directly, which might seem faster than comparing individual code points, but that ignores the conversion overhead. Besides the extra space and time these conversions cost, they often lead to non-compliant implementations, because a general-purpose integer parser's idea of a number is not the spec's. Many such parsers accept a leading sign, so a field such as "-20210313", which the spec treats as alphanumeric because it contains a hyphen, can be misclassified as the number -20210313.
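Here's a small Python illustration of that misclassification, contrasting a conversion-based check with one that follows the spec's definition of a numeric identifier (one or more ASCII digits). Python's built-in `int()` is just one example of a general-purpose parser, and both helper names are hypothetical.

```python
def is_numeric_by_conversion(ident: str) -> bool:
    # The conversion-based approach: "numeric if int() accepts it".
    try:
        int(ident)
    except ValueError:
        return False
    return True


def is_numeric_per_spec(ident: str) -> bool:
    # The spec's definition: one or more ASCII digits, nothing else.
    return ident != "" and all("0" <= c <= "9" for c in ident)


for ident in ["20210313", "-20210313", "-0"]:
    print(ident, is_numeric_by_conversion(ident), is_numeric_per_spec(ident))
```

Both "-20210313" and "-0" are legal alphanumeric identifiers, yet `int()` happily turns them into numbers, so a comparison built on it would apply the wrong rule to them.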

    In fact, there are legal SemVer fields that cannot be represented exactly by any integer or floating-point value that fits in a register on current systems. The spec does not place limits on the length of any field. The FAQ expresses an opinion on this point (a 255-character version string is "probably overkill"), but the spec does not. Even staying within that size, it's easy to contrive a prerelease field, such as a run of a hundred digits, that fills most of that space. So, to be fully compliant with the spec, we must:

    • Scan the fields being compared from left to right, one code point at a time,
    • note whether either contains non-numeric characters,
    • and apply the above rules as soon as we encounter a difference.
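The three steps above can be sketched in Python as a single left-to-right scan. This is a minimal illustration, assuming both identifiers have already been validated (non-empty, `[0-9A-Za-z-]` only, no leading zero on numeric identifiers); `compare_identifiers_single_pass` is a hypothetical name. A difference alone can't always decide the result, because whether a field is numeric may only become clear at its last character (compare "12" with "1a"), so the scan returns early only once both identifiers are known to be alphanumeric.

```python
def compare_identifiers_single_pass(a: str, b: str) -> int:
    """Return -1, 0 or 1 for two valid pre-release identifiers, in one scan."""
    first_diff = 0             # sign of the first differing code point, 0 if none
    a_alpha = b_alpha = False  # has a non-digit been seen in a / in b?
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else None
        cb = b[i] if i < len(b) else None
        if ca is not None and not ("0" <= ca <= "9"):
            a_alpha = True
        if cb is not None and not ("0" <= cb <= "9"):
            b_alpha = True
        if first_diff == 0 and ca is not None and cb is not None and ca != cb:
            first_diff = -1 if ca < cb else 1
        if a_alpha and b_alpha and first_diff != 0:
            # Both are known to be alphanumeric, so ASCII order decides now.
            return first_diff
    if a_alpha != b_alpha:
        # Numeric identifiers always have lower precedence than non-numeric ones.
        return 1 if a_alpha else -1
    if first_diff != 0 and (a_alpha or len(a) == len(b)):
        # Equal-length numerics: the first differing digit decides.
        return first_diff
    # Numerics of different lengths: longer is larger. Alphanumerics where one
    # is a prefix of the other: shorter sorts first. Identical strings: 0.
    return (len(a) > len(b)) - (len(a) < len(b))
```

No field is ever converted, and nothing beyond a few flags is stored, which is where the space and time savings come from.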

    Because that comparison algorithm works for every conceivable field in a SemVer string without numeric conversion or extra storage, it also happens to be efficient in both space and time.