embedded-linux yocto bitbake openembedded

Complex programs built with BitBake have different checksums depending on the build path; simple programs always yield the same checksums


Executive summary: Some binaries built for our Yocto images have different md5sums depending on the build path. The path does not affect the checksums of a minimal test project.

I am using the TI-supplied Yocto platform to build our product images. It is working fine. We have a couple of our proprietary software projects hooked up there as an additional layer to make our actual application image.

Recently we discovered that a couple of helper programs we build for our application image yield different checksums depending on where they are built.

Example when building under /tmp/Project-xxxxx_Yocto and ~/repos/Project-xxxxx_Yocto:

user@MACHINE:/tmp/Project-xxxxx_Yocto$ md5sum ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/canbus-handler/git+AUTOINC+b99a8424b6-r0/packages-split/canbus-handler/usr/bin/canbus-handler
4f1b270a374c14bcd95d093095f4354d ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/canbus-handler/git+AUTOINC+b99a8424b6-r0/packages-split/canbus-handler/usr/bin/canbus-handler

user@MACHINE:~/repos/Project-xxxxx_Yocto$ md5sum ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/canbus-handler/git+AUTOINC+b99a8424b6-r0/packages-split/canbus-handler/usr/bin/canbus-handler
db35646094dc2f05022e012666973d7f ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/canbus-handler/git+AUTOINC+b99a8424b6-r0/packages-split/canbus-handler/usr/bin/canbus-handler

Ok, so next I created a minimal compilable debug project ( https://github.com/usvi/helloyoctoworld ) to demonstrate. Guess what? The checksums from this project are always the same; the path has no effect!

user@MACHINE:/tmp/Project-xxxxx_Yocto$ md5sum ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/helloyoctoworld/git+AUTOINC+6716589062-r0/packages-split/helloyoctoworld/usr/bin/helloyoctoworld
40ae08fc09eb08ef8f519ee9312659c9  ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/helloyoctoworld/git+AUTOINC+6716589062-r0/packages-split/helloyoctoworld/usr/bin/helloyoctoworld

user@MACHINE:~/repos/Project-xxxxx_Yocto$ md5sum ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/helloyoctoworld/git+AUTOINC+6716589062-r0/packages-split/helloyoctoworld/usr/bin/helloyoctoworld
40ae08fc09eb08ef8f519ee9312659c9  ./workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/helloyoctoworld/git+AUTOINC+6716589062-r0/packages-split/helloyoctoworld/usr/bin/helloyoctoworld

So to reiterate: helloyoctoworld built in /tmp vs. built in /home/USER/repos

40ae08fc09eb08ef8f519ee9312659c9 vs. 40ae08fc09eb08ef8f519ee9312659c9

canbus-handler built in /tmp vs. built in /home/USER/repos

4f1b270a374c14bcd95d093095f4354d vs. db35646094dc2f05022e012666973d7f
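To see which files in the two trees are affected, rather than checking one binary at a time, every file under the same relative path can be hashed in both trees. This is only a minimal Python sketch: `md5_of` and `differing_files` are hypothetical helper names, not part of Yocto or BitBake, and the two root directories are whichever build trees are being compared.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(tree_a: Path, tree_b: Path) -> list:
    """List relative paths that exist in both trees but differ in content."""
    differing = []
    for file_a in sorted(tree_a.rglob("*")):
        if not file_a.is_file():
            continue
        relative = file_a.relative_to(tree_a)
        file_b = tree_b / relative
        # Files missing from tree_b are ignored; only content changes count.
        if file_b.is_file() and md5_of(file_a) != md5_of(file_b):
            differing.append(relative.as_posix())
    return differing
```

Pointing both arguments at the same `packages-split` directory in each checkout would list only the binaries whose bytes depend on the build location.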

So what is going on? How do I debug this? Ok, well, maybe I'll first take canbus-handler and peel stuff off to see when it starts to have stable checksums in both locations.

EDIT1: I ran strings + diff on the binaries:

< /tmp/Project-xxxxx_Yocto/workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/canbus-handler/git+AUTOINC+deffada7ed-r0/recipe-sysroot/usr/include/nlohmann/json.hpp
---
> /home/USER/repos/Project-xxxxx_Yocto/workdir/arago-tmp-default-glibc/work/armv7at2hf-neon-oe-linux-gnueabi/canbus-handler/git+AUTOINC+deffada7ed-r0/recipe-sysroot/usr/include/nlohmann/json.hpp
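The same strings + diff comparison can be approximated in a few lines of Python when the binutils tools are not at hand. This is a rough sketch under stated assumptions: `printable_strings` only imitates what strings does (runs of printable ASCII, minimum length 4), and both function names are hypothetical.

```python
import difflib
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (or tab) at least min_len bytes long."""
    pattern = rb"[\x20-\x7e\t]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def strings_diff(data_a: bytes, data_b: bytes) -> list:
    """Return the removed ('-') and added ('+') strings between two blobs."""
    diff = difflib.unified_diff(
        printable_strings(data_a),
        printable_strings(data_b),
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    lines = list(diff)[2:]  # drop the '---' and '+++' header lines
    return [line for line in lines if not line.startswith("@@")]
```

Reading the two binaries with `Path(...).read_bytes()` and passing them to `strings_diff` should give a list much like the diff output above, with one `-` entry and one `+` entry per differing string.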

So, nlohmann is imprinting its header location into the actual binary. When I build the binary without Yocto, natively on the host machine with "make", the location is also there:

cannot use operator[] with
/usr/include/nlohmann/json.hpp
m_object != nullptr

EDIT2: Made self-contained test case: https://github.com/usvi/nlohmannjsontest

Results are strange:

janne@shell:/tmp$  git clone git@github.com:usvi/nlohmannjsontest.git
Cloning into 'nlohmannjsontest'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 18 (delta 2), reused 18 (delta 2), pack-reused 0 (from 0)
Receiving objects: 100% (18/18), 204.74 KiB | 605.00 KiB/s, done.
Resolving deltas: 100% (2/2), done.
janne@shell:/tmp$ cd nlohmannjsontest/
janne@shell:/tmp/nlohmannjsontest$ make
g++  -I inc -Wno-deprecated  -c -o main_3.11.3.o src/main_3.11.3.cpp
g++   main_3.11.3.o   -o main_3.11.3
g++  -I inc -Wno-deprecated  -c -o main_2.1.1.o src/main_2.1.1.cpp
g++   main_2.1.1.o   -o main_2.1.1
strip main_3.11.3
strip main_2.1.1
strings main_3.11.3 | grep "json.hpp"
inc/nlohmann3.11.3/json.hpp
strings main_2.1.1 | grep "json.hpp"
inc/nlohmann2.1.1/json.hpp

So, the path is imprinted.


Solution

  • This is going to be exactly the same stuff I told you over IRC, but at least here the answer will be archived.

    This is a classic reproducible builds (https://reproducible-builds.org) problem. The generated output most likely contains build paths, but other sources of non-determinism are possible, and until you've verified that build paths are the cause you can't be sure.

    Lots of things to try:

    • Strip the binaries and see if they're now identical. If they are then you know you've isolated the problem to the debug symbols.
    • Run strings on the (unstripped) binaries and look for paths. Check for path components in case there is code generation that transforms the path, e.g. _home_ross_yocto could be a symbol name.
    • Get two binaries built (ideally on the same machine, just different build trees) and run diffoscope over them. This will tell you exactly where the differences are.
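The second check in the list above, looking for the build path and for transformed versions of it, can be sketched in Python. This is an illustration under assumptions: the only transformation it tries is replacing every non-alphanumeric character with an underscore (which turns /home/ross/yocto into _home_ross_yocto), real code generators may mangle paths differently, and `path_variants` and `find_path_leaks` are hypothetical names.

```python
import re


def path_variants(build_path: str) -> list:
    """Return the path itself plus a form with non-alphanumerics as '_'."""
    mangled = re.sub(r"[^0-9A-Za-z]", "_", build_path)
    return [build_path, mangled]


def find_path_leaks(data: bytes, build_path: str) -> dict:
    """Map each variant found in data to the byte offsets where it occurs."""
    leaks = {}
    for variant in path_variants(build_path):
        needle = re.escape(variant.encode())
        offsets = [match.start() for match in re.finditer(needle, data)]
        if offsets:
            leaks[variant] = offsets
    return leaks
```

An empty result does not prove the binary is path-independent; it only rules out these two spellings of the path.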

    When you know what exactly is causing the difference, it's normally quite simple to fix.

    Some common problems:

    • Part of the build ignores the passed-in CFLAGS and so doesn't build with -fdebug-prefix-map (set in DEBUG_PREFIX_MAP). This is common with assembler code, which can still emit debug symbols but often doesn't get passed the right flags. Easily fixed by ensuring the right flags are passed.
    • Generated source embeds the build path somehow, maybe in variable names or references to the real source.
    • Code may embed the values of CC, CFLAGS, etc. as strings for informative purposes.

    Worked example: packages using Cython are non-reproducible. diffoscope showed that there are two causes:

    1. the source package contains generated code which embeds build paths
    2. the binaries have strings and symbols containing the build path

    For (1) the references are in comments that are not needed after the build, so we can just strip them out. For (2) the string is the path to the original source file, and luckily the variable name is generated from the string value. So I implemented path remapping à la -fdebug-prefix-map and remapped S and B (the recipe's source and build directories).
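The idea behind prefix remapping can be shown on plain strings: each build-specific prefix is replaced with a fixed one, so the same file ends up with the same recorded path whatever the build location. The sketch below is not the actual Cython fix and not how the compiler implements -fdebug-prefix-map; `remap_prefix` and the example directories are hypothetical.

```python
def remap_prefix(path: str, prefix_map: list) -> str:
    """Rewrite path using the first (old, new) pair whose old prefix matches.

    Matching is done on whole directory components, so '/tmp/work' does
    not match '/tmp/workdir/file.c'.
    """
    for old, new in prefix_map:
        old = old.rstrip("/")
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path


# Hypothetical S directories from two different build locations,
# both mapped to the same stable prefix.
map_tmp = [("/tmp/Project/work/canbus-handler/git", "/usr/src/debug/canbus-handler")]
map_home = [("/home/USER/repos/Project/work/canbus-handler/git", "/usr/src/debug/canbus-handler")]

recorded_tmp = remap_prefix("/tmp/Project/work/canbus-handler/git/src/main.cpp", map_tmp)
recorded_home = remap_prefix(
    "/home/USER/repos/Project/work/canbus-handler/git/src/main.cpp", map_home
)
```

Both `recorded_tmp` and `recorded_home` come out as the same string, which is the property that makes the output independent of the build path.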