R arrow does not find compression codecs even when they are installed


I am trying to generate zstd-compressed Apache Parquet files in R inside a Docker container.

Even though I install all dependencies and arrow itself works fine, it does not find the zstd (or, for that matter, brotli) compression codec. This is the MWE version of my Dockerfile:

FROM r-base

RUN apt-get update
RUN apt-get -y install --no-install-recommends \
     libcurl4-openssl-dev \
     libssl-dev \
     libxml2-dev \
     libgit2-dev \
     libgsl0-dev \
     libfontconfig1-dev \
     libharfbuzz-dev \
     libfribidi-dev \
     libpng-dev \
     libtiff5-dev \
     git \
     curl \
     build-essential \
     libboost-system-dev \
     libboost-thread-dev \
     libboost-program-options-dev \
     libboost-test-dev \
     libboost-filesystem-dev \
     libsnappy-dev \
     libthrift-dev \
     libutf8proc-dev \
     rapidjson-dev \
     libxsimd-dev \
     liblz4-dev \
     libre2-dev \
     cmake \
     zstd \
     brotli

ARG ARROW_R_DEV=true
RUN R -e 'install.packages(Ncpus = 64, pkgs = c("arrow"))'

When I start the container and test codec availability, I see:

> arrow::codec_is_available("snappy")
[1] TRUE
> arrow::codec_is_available("zstd")
[1] FALSE
> arrow::codec_is_available("brotli")
[1] FALSE

which shows me that arrow itself is working fine (I tested it too), but it can't find zstd or brotli.

How can I write a zstd-compressed Parquet file in R inside a Docker container?


Solution

  • I had a deeper look at the documentation and found that it's not only about the system packages being installed, but also about how arrow itself is built. Setting the NOT_CRAN=true environment variable sets (among other things) LIBARROW_MINIMAL to false, which builds arrow with zstd and brotli support.

    This MWE Dockerfile works for me:

    FROM r-base
    
    RUN apt-get update
    RUN apt-get -y install --no-install-recommends \
         libcurl4-openssl-dev \
         libssl-dev \
         libxml2-dev \
         libgit2-dev \
         libgsl0-dev \
         libfontconfig1-dev \
         libharfbuzz-dev \
         libfribidi-dev \
         libpng-dev \
         libtiff5-dev \
         git \
         curl \
         build-essential \
         libboost-system-dev \
         libboost-thread-dev \
         libboost-program-options-dev \
         libboost-test-dev \
         libboost-filesystem-dev \
         libsnappy-dev \
         libthrift-dev \
         libutf8proc-dev \
         rapidjson-dev \
         libxsimd-dev \
         liblz4-dev \
         libre2-dev \
         cmake \
         zstd \
         brotli
    
    ARG ARROW_R_DEV=true
    ARG NOT_CRAN=true
    RUN R -e 'install.packages(Ncpus = 64, pkgs = c("arrow"))'
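
To check that the rebuilt image really has the codecs, something like the following can be run in R inside the container. This is only a minimal sketch: the `mtcars` dataset and the temporary file path are placeholders chosen for illustration, not part of the original setup.

```r
library(arrow)

# Check which compression codecs were compiled into this arrow build.
codecs <- c("snappy", "zstd", "brotli")
available <- vapply(codecs, arrow::codec_is_available, logical(1))
print(available)

# Write and read back a zstd-compressed Parquet file.
# `mtcars` and the temp path are illustrative placeholders.
if (available[["zstd"]]) {
  path <- tempfile(fileext = ".parquet")
  arrow::write_parquet(mtcars, path, compression = "zstd")
  roundtrip <- arrow::read_parquet(path)
  stopifnot(nrow(roundtrip) == nrow(mtcars))
}
```

If `zstd` still shows up as unavailable, `arrow::arrow_info()` prints the capabilities of the installed build, which can help tell whether the package was compiled with the minimal feature set.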