We have Python code which does the following inside a Docker container:
import boto3
import tarfile
s3 = boto3.client('s3')
s3.download_file("dev-bucket", "test/model.tar.gz", "/opt/ml/model/model.tar.gz")
tar = tarfile.open("/opt/ml/model/model.tar.gz", 'r:gz')
tar.extractall(path="/opt/ml/model")
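Note: the archive does not strictly have to be written to disk before extraction. As a side note for anyone hitting disk-space limits, here is a minimal sketch of streaming the tarball straight out of S3 (same bucket and key as above; tarfile's streaming 'r|gz' mode reads the boto3 response body sequentially, so only the extracted files ever touch disk):

import boto3
import tarfile

s3 = boto3.client('s3')

# Stream the archive from S3 instead of saving model.tar.gz first;
# the StreamingBody is file-like, and mode 'r|gz' reads it sequentially.
obj = s3.get_object(Bucket="dev-bucket", Key="test/model.tar.gz")
with tarfile.open(fileobj=obj["Body"], mode="r|gz") as tar:
    tar.extractall(path="/opt/ml/model")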
However, the job fails while extracting with "OSError: [Errno 30] Read-only file system". The complete stack trace is:
Traceback (most recent call last):
  File "inference.py", line 6
    tar.extractall(path="/opt/ml/model")
  File "/opt/conda/lib/python3.7/tarfile.py", line 2002, in extractall
    numeric_owner=numeric_owner)
  File "/opt/conda/lib/python3.7/tarfile.py", line 2044, in extract
    numeric_owner=numeric_owner)
  File "/opt/conda/lib/python3.7/tarfile.py", line 2114, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/opt/conda/lib/python3.7/tarfile.py", line 2163, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/opt/conda/lib/python3.7/tarfile.py", line 250, in copyfileobj
    dst.write(buf)
OSError: [Errno 30] Read-only file system
The Dockerfile is as follows:
FROM continuumio/miniconda3
# use python3.7
RUN /opt/conda/bin/conda install python=3.7
# Update conda
RUN /opt/conda/bin/conda update -n base conda
# Install build-essential
RUN apt-get update && apt-get install -y build-essential \
wget \
nginx \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
RUN conda install -y pandas==0.25.1 scikit-learn==0.21.2 s3fs==0.4.2
RUN pip install pyarrow==1.0.0 mxnet joblib==0.13.2 boto3
CMD [ "/bin/bash" ]
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
RUN mkdir -p /opt/ml/model
RUN chmod -R +w /opt/ml/model
RUN mkdir -p /opt/ml/input/data
# Set up the program in the image
COPY helloworld /opt/program
WORKDIR /opt/program
The data size was huge, and the default 10 GB volume was getting filled, causing it to become read-only. The solution was to use a launch template and attach an additional volume. This solved the issue for me.
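A quick way to confirm this failure mode from inside the job (a sketch; /opt/ml/model is the extraction path from the question):

import shutil

# Report how full the filesystem backing the extraction path is; a volume
# at or near 100% usage is consistent with it being flipped to read-only.
total, used, free = shutil.disk_usage("/opt/ml/model")
print(f"total={total / 2**30:.1f} GiB, "
      f"used={used / 2**30:.1f} GiB, "
      f"free={free / 2**30:.1f} GiB")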
Detailed explanation:
Please note that, by default, Docker allocates 10 gibibytes (GiB) of storage for each volume it creates on an Amazon ECS container instance. If a volume reaches the 10-GiB limit, you can't write any more data to it; further writes can crash the container instance or flip the filesystem into read-only mode. This applies only if you're using Amazon Linux 1 AMIs to launch the container instances in your ECS cluster. Amazon Linux 2 AMIs use the Docker overlay2 storage driver, which gives you a base storage size equal to the space left on the disk. (Batch launches Amazon Linux 1 AMIs by default.)
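If you are unsure which storage driver your container instances use, you can check from the instance itself (requires the Docker CLI on the host; a sketch using Python's subprocess):

import subprocess

# Prints "devicemapper" on Amazon Linux 1 ECS AMIs and "overlay2" on
# Amazon Linux 2 ECS-optimized AMIs.
driver = subprocess.check_output(
    ["docker", "info", "--format", "{{.Driver}}"], text=True).strip()
print(driver)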
To increase the default storage allocation for Docker volumes, set the dm.basesize storage option to a value higher than 10 GiB in the Docker daemon configuration file /etc/sysconfig/docker on the container instance. dm.basesize can be raised up to the size of your EBS volume, which lets the container/Batch job use the full space. After you change dm.basesize, any new images pulled by Docker use the new value; any containers/Batch jobs created or running before the change still use the previous value.
To apply the dm.basesize option to all your containers, set the value of the option before the Docker service starts. You can use a launch template to build a configuration template that applies to all your Amazon Elastic Compute Cloud (Amazon EC2) instances launched by AWS Batch. The following example MIME multi-part file overrides the default Docker image settings for a compute resource:
==============================
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0
--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"
#cloud-boothook
#!/bin/bash
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=20G"' >> /etc/sysconfig/docker
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
# Set any ECS agent configuration options
echo ECS_CLUSTER=default >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=60m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=60m >> /etc/ecs/ecs.config
--==BOUNDARY==--
==============================
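Once the MIME file is saved, you can register it as an EC2 launch template and point your Batch compute environment at it. A minimal boto3 sketch (the template name and the local file name userdata.mime are placeholders):

import base64
import boto3

ec2 = boto3.client("ec2")

# EC2 expects launch template user data to be base64-encoded.
with open("userdata.mime", "rb") as f:
    user_data = base64.b64encode(f.read()).decode("ascii")

ec2.create_launch_template(
    LaunchTemplateName="batch-dm-basesize-20g",  # placeholder name
    LaunchTemplateData={"UserData": user_data},
)

Reference the template via the launchTemplate field when you create the Batch compute environment; only instances launched after that will pick up the new dm.basesize.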