
reticulate segfaults with call to plt.plot()


I am encountering a segfault when I make a reticulated call to matplotlib.pyplot.plot().


Steps to reproduce the error:

  1. Create a Dockerfile with the contents:

    FROM rocker/r-ver:latest
    
    RUN apt update && apt install -y python3.8-venv python3.8-dev
    
    RUN install2.r --error reticulate
    
    COPY test.R /root/
    
  2. Create a file test.R (in the same location) with the contents:

    reticulate::virtualenv_create(
      envname = "./venv",
      packages = c("matplotlib")
    )
    
    reticulate::use_virtualenv("./venv")
    
    reticulate::py_run_string("import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])")
    
  3. Build an image from the Dockerfile: docker build . --tag="segfault-reprex"

  4. Run test.R in a container started from that image: docker run segfault-reprex Rscript /root/test.R. This gives the full traceback listed below.


Full traceback

Using Python: /usr/bin/python3.8
Creating virtual environment './venv' ... Done!
Installing packages: 'pip', 'wheel', 'setuptools', 'matplotlib'
Collecting pip
  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
Collecting wheel
  Downloading wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Collecting setuptools
  Downloading setuptools-60.5.0-py3-none-any.whl (958 kB)
Collecting matplotlib
  Downloading matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
Collecting fonttools>=4.22.0
  Downloading fonttools-4.28.5-py3-none-any.whl (890 kB)
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting numpy>=1.17
  Downloading numpy-1.22.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Collecting pillow>=6.2.0
  Downloading Pillow-9.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting python-dateutil>=2.7
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pyparsing>=2.2.1
  Downloading pyparsing-3.0.6-py3-none-any.whl (97 kB)
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pip, wheel, setuptools, kiwisolver, fonttools, pyparsing, packaging, cycler, numpy, pillow, six, python-dateutil, matplotlib
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
  Attempting uninstall: setuptools
    Found existing installation: setuptools 44.0.0
    Uninstalling setuptools-44.0.0:
      Successfully uninstalled setuptools-44.0.0
Successfully installed cycler-0.11.0 fonttools-4.28.5 kiwisolver-1.3.2 matplotlib-3.5.1 numpy-1.22.0 packaging-21.3 pillow-9.0.0 pip-21.3.1 pyparsing-3.0.6 python-dateutil-2.8.2 setuptools-60.5.0 six-1.16.0 wheel-0.37.1
Virtual environment './venv' successfully created.

 *** caught segfault ***
address 0x7ffaeabe1100, cause 'memory not mapped'

Traceback:
 1: py_run_string_impl(code, local, convert)
 2: reticulate::py_run_string("import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])")
An irrecoverable exception occurred. R is aborting now ...

Things I have noted:

  1. A minimal example involving, e.g., the pandas package rather than matplotlib runs successfully, i.e. if test.R contains:

    reticulate::virtualenv_create(
      envname = "./venv",
      packages = c("pandas")
    )
    
    reticulate::use_virtualenv("./venv")
    
    reticulate::py_run_string("import pandas as pd; df = pd.DataFrame()")
    
  2. If you enter the container interactively (docker run -it segfault-reprex /bin/bash), run test.R (Rscript /root/test.R), and activate the resulting virtualenv (source /root/venv/bin/activate), matplotlib works fine from plain Python (python -c "import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])").

  3. The reticulate documentation states that:

    for reticulate to bind to a version of Python it must be compiled with shared library support (i.e. with the --enable-shared flag)

    Running docker run -it segfault-reprex /usr/bin/python3 -c "import sysconfig; print(sysconfig.get_config_vars('Py_ENABLE_SHARED'))" shows that the container's Python was compiled with shared library support, so this is not the cause.
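
The shared-library check in point 3 can also be written as a small script that reports a little more context. This is only a sketch: shared_library_info is a hypothetical helper name, and LDLIBRARY and LIBDIR are additional sysconfig variables that the question itself did not inspect.

```python
import sysconfig


def shared_library_info():
    """Collect the build settings that relate to shared-library support."""
    return {
        # 1 if the interpreter was configured with --enable-shared, otherwise 0
        # (may be None on platforms that do not define it).
        "Py_ENABLE_SHARED": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        # File name of the Python library recorded at build time.
        "LDLIBRARY": sysconfig.get_config_var("LDLIBRARY"),
        # Directory recorded at build time for installed libraries.
        "LIBDIR": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    for name, value in shared_library_info().items():
        print(f"{name} = {value}")
```

Saved to a file, it could be run with the same interpreter that reticulate reports using (/usr/bin/python3.8 in the traceback above).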


Solution

  • The problem is that the R binary in rocker/r-ver:latest is compiled against a different BLAS library from the one that the numpy wheel on PyPI is compiled against.

    This was explained to me by Tomasz Kalinowski here.

    The solution is to ensure numpy uses the same BLAS libraries as rocker/r-ver's R binary does. An easy way to ensure this is to compile numpy from source. This compilation could be performed at either image build-time or container runtime.
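
To see the mismatch for yourself, one option is to list the BLAS-like shared libraries mapped into the running process. The sketch below is not part of the original answer: the helper names are hypothetical, the file-name pattern is a guess at common BLAS/LAPACK library names, and reading /proc/self/maps only works on Linux.

```python
import re

# Assumption: substrings that commonly appear in BLAS/LAPACK library file names.
BLAS_PATTERN = re.compile(r"(blas|lapack|mkl|atlas)", re.IGNORECASE)


def blas_libraries(maps_text):
    """Return sorted, de-duplicated paths of BLAS-like libraries found in
    text formatted like /proc/<pid>/maps."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        file_name = path.rsplit("/", 1)[-1]
        if path.startswith("/") and BLAS_PATTERN.search(file_name):
            paths.add(path)
    return sorted(paths)


def blas_libraries_in_this_process():
    """Linux only: inspect the memory map of the current process."""
    with open("/proc/self/maps") as handle:
        return blas_libraries(handle.read())
```

Calling blas_libraries_in_this_process() from the reticulate session after importing numpy (for example via reticulate::py_run_string()) should show whether both R's BLAS and a separate copy bundled with numpy are loaded, which would be consistent with the explanation above.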

    Compiling numpy at runtime

    To compile numpy at container runtime, we can leave our Dockerfile as is and add a call to system2() after our initial call to reticulate::virtualenv_create(). test.R then becomes:

    reticulate::virtualenv_create(
      envname = "./venv",
      packages = c("matplotlib")
    )
    
    system2("./venv/bin/pip3", c("install",
                                 "--no-binary='numpy'",
                                 "numpy",
                                 "--ignore-installed"))
    
    reticulate::use_virtualenv("./venv")
    
    reticulate::py_run_string("import matplotlib.pyplot as plt;plt.plot([1, 2, 3], [1, 2, 3])")
    

    After rebuilding our image, we can run test.R in a container without a segfault!
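
If you want to confirm that the source build really replaced the PyPI wheel, a rough heuristic is to look for the directory of bundled native libraries that prebuilt Linux wheels ship alongside numpy. This is a sketch under assumptions: the function names are hypothetical, and the directory names numpy.libs and numpy/.libs are taken to be where such wheels place their bundled libraries, which may differ between numpy versions.

```python
from pathlib import Path


def vendored_library_dirs(site_packages):
    """Return directories under site_packages that look like libraries
    bundled with a prebuilt numpy wheel."""
    root = Path(site_packages)
    # Assumed locations of the bundled libraries in prebuilt wheels.
    candidates = [root / "numpy.libs", root / "numpy" / ".libs"]
    return [str(path) for path in candidates if path.is_dir()]


def looks_like_binary_wheel(site_packages):
    """Heuristic: a bundled-library directory suggests numpy came from a
    prebuilt wheel rather than a local source build."""
    return bool(vendored_library_dirs(site_packages))
```

The site-packages path can be obtained from the venv's own interpreter with sysconfig.get_paths()["platlib"]. An empty result does not prove that numpy links against R's BLAS, only that no bundled copy was found in the assumed locations.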

    Compiling numpy at build-time

    Compiling numpy at runtime adds ~3 minutes to every run of our R script!

    A better solution could be to perform this compilation at image build-time, so that we only have to wait those ~3 minutes once rather than every time we run our script.

    A Dockerfile to do so could look like:

    FROM rocker/r-ver:latest
    
    RUN apt update && apt install -y python3 python3-dev python3-venv
    
    RUN install2.r --error reticulate
    
    # Create a venv
    RUN python3 -m venv /root/venv
    
    # Compile numpy from source into venv
    RUN /root/venv/bin/pip3 install --no-binary="numpy" numpy --ignore-installed
    
    COPY test.R /root/
    

    The accompanying test.R file would then use reticulate::virtualenv_install() to add matplotlib to the venv created in the Dockerfile:

    reticulate::virtualenv_install(
      envname = "/root/venv",
      packages = c("matplotlib")
    )
    
    reticulate::use_virtualenv("/root/venv")
    
    reticulate::py_run_string("import matplotlib.pyplot as plt;plt.plot([1, 2, 3], [1, 2, 3])")
    

    NB: when running a container from the image with numpy already compiled, you'll need to either run as root (-u="root") or change the permissions on the compiled numpy in the Dockerfile; otherwise you will encounter a permissions error.