I am currently trying to generate zstd-compressed Apache Parquet files in R inside a Docker container.
Even though I install all dependencies and arrow itself works fine, it does not find the zstd (or, for that matter, brotli) compression codec. This is the MWE version of my Dockerfile:
FROM r-base
RUN apt-get update
RUN apt-get -y install --no-install-recommends \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    libgit2-dev \
    libgsl0-dev \
    libfontconfig1-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libpng-dev \
    libtiff5-dev \
    git \
    curl \
    build-essential \
    libboost-system-dev \
    libboost-thread-dev \
    libboost-program-options-dev \
    libboost-test-dev \
    libboost-filesystem-dev \
    libsnappy-dev \
    libthrift-dev \
    libutf8proc-dev \
    rapidjson-dev \
    libxsimd-dev \
    liblz4-dev \
    libre2-dev \
    cmake \
    zstd \
    brotli
ARG ARROW_R_DEV=true
RUN R -e 'install.packages(Ncpus = 64, pkgs = c("arrow"))'
When I start the container and test the availability of the codecs, I see:
> arrow::codec_is_available("snappy")
[1] TRUE
> arrow::codec_is_available("zstd")
[1] FALSE
> arrow::codec_is_available("brotli")
[1] FALSE
which shows me that arrow itself is working fine (I tested it too), but it can't find zstd or brotli.
How can I write a zstd-compressed Parquet file in R inside a Docker container?
I had a deeper look at the documentation and found that it is not only about the system packages being installed, but also about how arrow itself is built. Setting the NOT_CRAN=true environment variable sets (among other things) LIBARROW_MINIMAL to false, which builds arrow with zstd and brotli support.
This MWE Dockerfile works for me:
FROM r-base
RUN apt-get update
RUN apt-get -y install --no-install-recommends \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    libgit2-dev \
    libgsl0-dev \
    libfontconfig1-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libpng-dev \
    libtiff5-dev \
    git \
    curl \
    build-essential \
    libboost-system-dev \
    libboost-thread-dev \
    libboost-program-options-dev \
    libboost-test-dev \
    libboost-filesystem-dev \
    libsnappy-dev \
    libthrift-dev \
    libutf8proc-dev \
    rapidjson-dev \
    libxsimd-dev \
    liblz4-dev \
    libre2-dev \
    cmake \
    zstd \
    brotli
ARG ARROW_R_DEV=true
ARG NOT_CRAN=true
RUN R -e 'install.packages(Ncpus = 64, pkgs = c("arrow"))'
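To confirm that the rebuilt image actually answers the original question, something like the following R sketch can be run inside the container. It assumes arrow was installed with NOT_CRAN=true as in the Dockerfile above; the data frame and the temporary output path are hypothetical placeholders for illustration. It uses arrow's codec_is_available(), write_parquet() with compression = "zstd", and read_parquet() for a round-trip check.

```r
library(arrow)

# Report which codecs this arrow build supports (TRUE/FALSE per codec name).
codecs <- c("snappy", "gzip", "zstd", "brotli", "lz4")
print(vapply(codecs, codec_is_available, logical(1)))

# Stop early with a clear message if arrow was built without zstd
# (e.g. installed without NOT_CRAN=true, so LIBARROW_MINIMAL stayed true).
if (!codec_is_available("zstd")) {
  stop("zstd codec not available; reinstall arrow with NOT_CRAN=true")
}

# Hypothetical example data.
df <- data.frame(id = 1:1000, value = rnorm(1000))

# Write a zstd-compressed Parquet file to a temporary location.
path <- tempfile(fileext = ".parquet")
write_parquet(df, path, compression = "zstd")

# Read it back to check that the round trip preserved the data.
roundtrip <- read_parquet(path)
stopifnot(nrow(roundtrip) == nrow(df))
```

If a specific trade-off between file size and speed is needed, write_parquet() also accepts a compression_level argument.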