
Can the same code and same compiler produce a different binary on different machines?


The idea of Nixos binary caches led me to considering this question.

In nix, every compiled binary is associated with a hash key which is obtained from hashing all the dependencies and build script, i.e. a 'derivation' in nix-speak. That is my understanding, anyway.

But couldn't the same derivation lead to different binaries when compiled on different machines? If machine A's processor has a slightly different instruction set than machine B's processor, and the compiler took this difference into account, wouldn't the binary produced by compiling the derivation on machine A be distinguishable from the binary produced on machine B? If so, couldn't different binaries have the same derivation and thus the same nix hash?

Does the same derivation built on machines with different instruction sets always produce the same binary?


Solution

  • This depends on the compiler implementation and the options passed to it. For example, GCC by default does not seem to pay attention to the specifics of the current processor unless you specify -march=native or -mtune=native.

    So yes, if you use flags like these, or a compiler whose default behavior matches them, you will get different output on a machine with a different CPU model.
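    To make the concern in the question concrete, here is a rough sketch of why one key can end up naming two different binaries. It is not Nix's actual hashing scheme; the dictionary layout and the input_hash helper are made up for illustration. The only point is that the key is computed from the inputs, before anything is built.

```python
import hashlib
import json


def input_hash(derivation: dict) -> str:
    """Hash a toy 'derivation' (hypothetical structure) by its inputs only."""
    # Canonical serialization so that key order does not affect the hash.
    canonical = json.dumps(derivation, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical inputs; a real derivation records much more than this.
drv = {
    "name": "hello",
    "builder": "build.sh",
    "dependencies": ["gcc", "glibc"],
    "system": "x86_64-linux",
}

# Two machines evaluating the same inputs get the same key, whatever
# their compilers would go on to emit.
key_on_machine_a = input_hash(drv)
key_on_machine_b = input_hash(dict(drv))

# Changing an input (here the target system) changes the key.
key_for_other_system = input_hash({**drv, "system": "aarch64-linux"})
```

    In this sketch two x86_64 machines with different CPU models share a key, because the CPU model is not among the inputs, while a different target system gets a different key.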

    A build can be non-reproducible for other reasons as well, such as inappropriate use of clock values or random values or even counters that are accessed in non-deterministically interleaved patterns by threads.
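    As a small illustration of the clock problem, the sketch below uses a toy build function (hypothetical, not a real build tool) that stamps its output with the current time. Pinning the stamp through SOURCE_DATE_EPOCH, the convention from reproducible-builds.org, makes the output stable again. The environment is passed in as a plain dict here to keep the example self-contained.

```python
import hashlib
import time


def build(source: str, env: dict) -> bytes:
    """Toy 'build' (hypothetical) that embeds a timestamp in its output."""
    # Prefer SOURCE_DATE_EPOCH when it is set; otherwise use the wall clock.
    stamp = env.get("SOURCE_DATE_EPOCH") or str(time.time_ns())
    return f"{source}\nbuilt-at: {stamp}\n".encode("utf-8")


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


source = "int main(void) { return 0; }"

# Unpinned: each build reads the clock, so the outputs differ.
unpinned_1 = digest(build(source, {}))
time.sleep(0.05)
unpinned_2 = digest(build(source, {}))

# Pinned: both builds embed the same agreed-upon timestamp.
pinned_env = {"SOURCE_DATE_EPOCH": "315532800"}
pinned_1 = digest(build(source, pinned_env))
pinned_2 = digest(build(source, pinned_env))
```

    This only helps when the build tool actually honors the variable; a tool that reads the clock directly is unaffected.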

    Nix does provide a sandbox that removes some sources of entropy, primarily the supposedly unrelated software that may be present on a machine. For practical reasons, it does not remove all of them.

    For these reasons, reproducibility remains a consideration even when packaging with Nix; Nix does not solve it completely.

    I'll quote the menu "Achieve deterministic builds" from https://reproducible-builds.org/docs/ and annotate it with the effect of Nix to the best of my knowledge. Don't quote me on this.

    • SOURCE_DATE_EPOCH: solved; set by Nixpkgs
    • Deterministic build systems: partially solved; Nixpkgs may include patches
    • Volatile inputs can disappear: solvable with Nix if you upload sources to the (binary) cache. Hercules CI does this.
    • Stable order for inputs: mostly solved. Nix language preserves source order and sorts attributes.
    • Value initialization: low-level problem not solved by Nix
    • Version information: not solved; clock is accessible in sandbox
    • Timestamps: same as above
    • Timezones: solved by sandbox
    • Locales: solved by sandbox
    • Archive metadata: not solved
    • Stable order for outputs: use of randomness not solvable by sandbox
    • Randomness: same
    • Build path: partially; Linux uses /build; macOS may differ depending on installation method
    • System images: broad issue taking elements from previous items
    • JVM: same
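
    For the "Archive metadata" and "Stable order" items, which the build script has to handle itself, a minimal sketch of the usual fix is below: sort the entries and normalize the metadata before writing the archive. The deterministic_tar helper and the file names are made up for this example, and it builds the archive in memory rather than from real files.

```python
import io
import tarfile


def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """Pack {name: bytes} into an uncompressed tar with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort names so the entry order does not depend on insertion order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Fix every field that would otherwise leak machine state.
            info.mtime = mtime
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Same contents supplied in a different order give identical archives.
archive_a = deterministic_tar({"b.txt": b"two", "a.txt": b"one"})
archive_b = deterministic_tar({"a.txt": b"one", "b.txt": b"two"})
```

    The archive is left uncompressed on purpose: a gzip header can carry its own timestamp, which would need to be pinned separately.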