Tags: python, docker, pip, facebook-prophet

Discrepancy between two hosts running the same docker commands


A colleague and I have a big Docker puzzle.

When we run the following commands we get different results.

docker run -it python:3.8.6 /bin/bash
pip install fbprophet

For me, it installs perfectly, while for him it produces an error and fails to install. I thought the whole point of Docker was to prevent this kind of issue, so I'm really puzzled.

I'm giving more details below, but my main question is:

  • How is it possible that we get different results?

More details:

We are both running Docker on new MacBook Pros with similar specs, on macOS Catalina. His Docker Engine version (20.x.x) is slightly newer than mine (19.x.x). Also:

  • He tried every command he could think of to clean things up in Docker.
  • We verified that the image IDs (hashes) were the same.
  • Our resource settings were also the same.
  • He tried reinstalling Docker and changing to other versions of python (3.7).
  • We tried simultaneously on multiple occasions during the last three days.

The result was always the same: He gets the error and I don't.

The error he gets is the following.

Error:
Installing collected packages: six, pytz, python-dateutil, pymeeus, numpy, pyparsing, pillow, pandas, korean-lunar-calendar, kiwisolver, ephem, Cython, cycler, convertdate, tqdm, setuptools-git, pystan, matplotlib, LunarCalendar, holidays, cmdstanpy, fbprophet
    Running setup.py install for fbprophet ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l516b8ts/fbprophet_80d5f400081541a2bf6ee26d2785e363/setup.py'"'"'; __file__='"'"'/tmp/pip-install-l516b8ts/fbprophet_80d5f400081541a2bf6ee26d2785e363/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7n8tvfkb/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/fbprophet
         cwd: /tmp/pip-install-l516b8ts/fbprophet_80d5f400081541a2bf6ee26d2785e363/
    Complete output (10 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib
    creating build/lib/fbprophet
    creating build/lib/fbprophet/stan_model
    Importing plotly failed. Interactive plots will not work.
    INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_dfdaf2b8ece8a02eb11f050ec701c0ec NOW.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l516b8ts/fbprophet_80d5f400081541a2bf6ee26d2785e363/setup.py'"'"'; __file__='"'"'/tmp/pip-install-l516b8ts/fbprophet_80d5f400081541a2bf6ee26d2785e363/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7n8tvfkb/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/fbprophet Check the logs for full command output.

Note that running the two commands I provided always produces some errors, but they are not critical. Upgrading setuptools and installing the dependencies before fbprophet resolves those minor errors. The error shown above is different: it is related to gcc, and it only happens to some people.

Optional additional questions:

  • How do we fix it?
  • How do we prevent non-reproducible results like this one?
  • Can upgrading the docker engine version break a container?

Solution

  • How do we fix it?

    Your error reports a GCC / compilation problem.
    A quick search turns up mostly problems related to Python / GCC versions (one, two, three).
    But you are right: with an identical image, a version mismatch can't explain why it fails in only one of the two containers.

    What it does look like is an out-of-memory (OOM) problem.

    Also, is this a VM? Stan requires a significant amount of memory to compile the models, and this error can occur if you run out of RAM while it is compiling.

    I did a bit of testing.
    On my machine, the compilation process consumed up to 2.4 GB of RAM.

    cat /etc/redhat-release
    CentOS Linux release 7.9.2009 (Core)
    
    uname -r
    3.10.0-1160.6.1.el7.x86_64
    
    docker --version
    Docker version 20.10.1, build 831ebea
    
    # pip install fbprophet inside this container works fine
    docker run --rm -it -m 3G python:3.8.6 /bin/bash
    
    # pip install fbprophet inside this one fails with: error: command 'gcc' failed with exit status 1
    # the compiler was actually killed by the OOM killer
    docker run --rm -it -m 2G python:3.8.6 /bin/bash
    
    # on the host: yes, here it is
    tail -f /var/log/messages | grep -i 'killed process'
    Dec 22 08:34:09 cent7-1 kernel: Killed process 5631 (cc1plus), UID 0, total-vm:2073600kB, anon-rss:1962404kB, file-rss:15332kB, shmem-rss:0kB
    Dec 22 08:35:56 cent7-1 kernel: Killed process 5640 (cc1plus), UID 0, total-vm:2056816kB, anon-rss:1947392kB, file-rss:15308kB, shmem-rss:0kB
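If the kernel log is long, a small filter can extract just the OOM kills and their memory usage. The sketch below uses hypothetical helper names (`parse_oom_line`, `oom_kills`, `report`) invented for this example. Its regex assumes the `Killed process ...` format shown in the log above, and other kernel versions may word these lines slightly differently.

```python
import re

# Matches OOM-killer lines in the format shown above, e.g.
# "Killed process 5631 (cc1plus), UID 0, total-vm:2073600kB, anon-rss:1962404kB, ..."
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

def parse_oom_line(line):
    """Return a dict describing one OOM kill, or None if the line doesn't match."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_kb": int(m["total_vm"]),
        "anon_rss_kb": int(m["anon_rss"]),
    }

def oom_kills(lines):
    """Yield a parsed event for every OOM-kill line in an iterable of log lines."""
    for line in lines:
        event = parse_oom_line(line)
        if event is not None:
            yield event

def report(lines):
    """Format each OOM kill as a one-line summary (resident memory in GiB)."""
    return [
        f"{e['name']} (pid {e['pid']}) killed at {e['anon_rss_kb'] / 1024**2:.2f} GiB anon RSS"
        for e in oom_kills(lines)
    ]
```

For example, `print("\n".join(report(open("/var/log/messages"))))` on a CentOS host. With Docker Desktop on a Mac, the kernel belongs to the Linux VM that Docker runs in, so the log lines have to come from that VM rather than from macOS.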
    

    Check the OOM killer log on the problematic machine.
    Is there enough RAM available to Docker?
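One way to answer that from inside the container is to read the memory limit the kernel enforces on it. This is a minimal sketch that assumes a Linux container exposing the limit at the usual cgroup v2 (`memory.max`) or cgroup v1 (`memory/memory.limit_in_bytes`) path. The 2.5 GiB threshold is an assumption derived from the ~2.4 GB measured above, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed threshold: the Stan compile above peaked at ~2.4 GB, so ask for a bit more.
REQUIRED_BYTES = int(2.5 * 1024**3)

def parse_cgroup_limit(text):
    """Parse a cgroup memory limit file; return bytes, or None if unlimited."""
    value = text.strip()
    if value == "max":  # cgroup v2 writes "max" for "no limit"
        return None
    return int(value)  # cgroup v1 uses a very large number instead

def parse_memtotal(meminfo_text):
    """Return MemTotal from /proc/meminfo contents, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # the file reports kB
    raise ValueError("MemTotal not found")

def effective_memory_limit(root=Path("/")):
    """Bytes this container can use: its cgroup limit, capped by total RAM.

    /proc/meminfo shows the RAM of the whole host (or Docker VM), not the
    container's limit, so the cgroup file is consulted as well.
    """
    total = parse_memtotal((root / "proc/meminfo").read_text())
    for rel in ("sys/fs/cgroup/memory.max",                     # cgroup v2
                "sys/fs/cgroup/memory/memory.limit_in_bytes"):  # cgroup v1
        path = root / rel
        if path.exists():
            limit = parse_cgroup_limit(path.read_text())
            return total if limit is None else min(limit, total)
    return total

def enough_memory(root=Path("/"), required=REQUIRED_BYTES):
    """True if the container probably has enough memory to compile the model."""
    return effective_memory_limit(root) >= required
```

Calling `effective_memory_limit()` in a `python:3.8.6` container on each machine should show whether one of them gets noticeably less memory than the other.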


  • Can upgrading the Docker Engine version break a container?

    Generally, it shouldn't.
    But v20.10.0 introduced a large set of changes related to memory and cgroups, among them support for cgroup v2.

    After you rule out the obvious causes (such as your friend's machine simply not having enough RAM), you might need to dig into your Docker daemon settings related to memory, cgroups, etc.
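One way to make that comparison systematic is to dump each daemon's `docker info` as JSON and diff the fields related to memory and cgroups. This sketch rests on some assumptions. The field list is only an example selection, and the helper names are hypothetical. I believe `CgroupVersion` is only reported by 20.10+ daemons, so on 19.x it should simply appear as missing.

```python
import json
import subprocess

# Example selection of `docker info` fields related to memory and cgroups.
FIELDS = ["ServerVersion", "OperatingSystem", "KernelVersion",
          "MemTotal", "NCPU", "CgroupDriver", "CgroupVersion"]

def docker_info():
    """Return this host's `docker info` as a dict (needs a running daemon)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

def diff_info(mine, theirs, fields=FIELDS):
    """Return {field: (mine, theirs)} for every selected field that differs.

    A field missing on one side (e.g. CgroupVersion on an older daemon) shows up as None.
    """
    return {f: (mine.get(f), theirs.get(f))
            for f in fields if mine.get(f) != theirs.get(f)}
```

For example, run `json.dump(docker_info(), open("his.json", "w"))` on one machine and copy the file over. Then run `diff_info(docker_info(), json.load(open("his.json")))` on the other.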


  • How can the same container produce different results on two computers?

    Well, technically it's quite possible.
    Containerized programs still use the host OS kernel.
    Not all kernel settings are "namespaced", i.e., can be set independently for one particular container.
    Many of them (in fact, most) are still global and can affect your program's behavior.

    I don't think that is related to your problem, though.
    But for complex programs that rely on specific kernel settings, it must be taken into account.
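To check whether such global settings differ, you can print a few of them from inside a container on each machine and compare the output. The sketch assumes a Linux `/proc/sys`. The `vm.*` entries are just examples of settings that are not namespaced; they are not a claim about what caused this particular failure. `read_sysctls` and `sysctl_path` are hypothetical helper names.

```python
from pathlib import Path

# Examples of vm.* sysctls. They are not namespaced, so a container reads
# the values of the kernel it actually runs on.
GLOBAL_SYSCTLS = ["vm.overcommit_memory", "vm.overcommit_ratio", "vm.swappiness"]

def sysctl_path(name, root=Path("/proc/sys")):
    """Map a dotted sysctl name to its file, e.g. vm.swappiness -> /proc/sys/vm/swappiness."""
    return root.joinpath(*name.split("."))

def read_sysctls(names=GLOBAL_SYSCTLS, root=Path("/proc/sys")):
    """Return {name: value} for each listed sysctl that exists on this kernel."""
    values = {}
    for name in names:
        path = sysctl_path(name, root)
        if path.exists():
            values[name] = path.read_text().strip()
    return values

if __name__ == "__main__":
    # Run inside a container on each machine and compare the printed output.
    for name, value in read_sysctls().items():
        print(f"{name} = {value}")
```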