Tags: amazon-s3, aws-fargate, snakemake, aws-batch

Snakemake running as an AWS Batch or AWS Fargate task raises MissingInputException on inputs stored in an S3 bucket


We have a Dockerized Snakemake pipeline with the input data stored in an S3 bucket, snakemake-bucket:

Snakefile:

rule bwa_map:
    input:
        "data/genome.fa"
    output:
        "results/mapped/A.bam"
    shell:
        "cat {input} > {output}"

Dockerfile:

FROM snakemake/snakemake:v8.15.2
RUN mamba install -c conda-forge -c bioconda snakemake-storage-plugin-s3
WORKDIR /app
COPY ./workflow ./workflow
ENV PYTHONWARNINGS="ignore:Unverified HTTPS request"
CMD ["snakemake","--default-storage-provider","s3","--default-storage-prefix","s3://snakemake-bucket","results/mapped/A.bam","--cores","1","--verbose","--printshellcmds"]

When we run the container with the following command, it downloads the input file, runs the pipeline, and stores the output in the bucket successfully:

docker run -it -e SNAKEMAKE_STORAGE_S3_ACCESS_KEY=**** -e SNAKEMAKE_STORAGE_S3_SECRET_KEY=**** our-snakemake:v0.0.10

However, when we deploy it as an AWS Batch job or AWS Fargate task, it fails immediately with the following error:

Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Full Traceback (most recent call last):
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 2103, in args_to_api
    dag_api.execute_workflow(
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 594, in execute_workflow
    workflow.execute(
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1081, in execute
    self._build_dag()
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1037, in _build_dag
    async_run(self.dag.init())
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/common/__init__.py", line 94, in async_run
    return asyncio.run(coroutine)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/snakemake/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/snakemake/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/snakemake/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 183, in init
    job = await self.update(
          ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 1013, in update
    raise exceptions[0]
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 970, in update
    await self.update_(
  File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 1137, in update_
    raise MissingInputException(job, missing_input)
snakemake.exceptions.MissingInputException: Missing input files for rule bwa_map:
    output: results/mapped/A.bam
    wildcards: sample=A
    affected files:
        s3://snakemake-bucket/data/genome.fa (storage)

MissingInputException in rule bwa_map in file /app/workflow/Snakefile, line 10:
Missing input files for rule bwa_map:
    output: results/mapped/A.bam
    wildcards: sample=A
    affected files:
        s3://snakemake-bucket/data/genome.fa (storage)

Any ideas would be appreciated. What we have checked so far:

  • The image works fine locally and also on an external VPS, but it does not work on AWS Fargate.
  • The file in the bucket is accessible and downloadable from inside the container on the AWS task, checked by:
    /opt/conda/envs/snakemake/bin/python -c "import os; import boto3; s3 = boto3.resource('s3', aws_access_key_id=os.environ.get('SNAKEMAKE_STORAGE_S3_ACCESS_KEY'), aws_secret_access_key=os.environ.get('SNAKEMAKE_STORAGE_S3_SECRET_KEY')); my_bucket = s3.Bucket('snakemake-bucket'); [my_bucket.download_file(d.key, d.key) for d in my_bucket.objects.all()]; print(os.listdir())"
  • We added --use-conda and --software-deployment-method conda; it made no difference.
  • The environment variables passed to the containers are the same; only some AWS_* and ECS_* related variables are added in the AWS environment (a quick way to list them inside the task is sketched after this list).
  • Mounting a volume at .snakemake did not change the outcome.
  • Kernel versions: AWS 5.10.219-208.866.amzn2.x86_64, local 5.15.0-97-generic.
  • Changing the Snakefile to use Storage Support Within Workflow has no effect.
  • The Job/task runs the pipeline successfully with local input/output files.
  • Snakemake Docker tag: snakemake/snakemake:v8.15.2
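
A sketch of the environment-variable check referenced above, assuming grep and sed are available inside the container (only the variable names matter here, so the values are masked):

# List the AWS/ECS/Snakemake-related variables inside the running task,
# masking their values so no credentials end up in the task logs.
env | grep -E '^(AWS_|ECS_|SNAKEMAKE_STORAGE_S3_)' | sed 's/=.*/=****/' | sort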

Solution

  • It seems AWS Fargate sets some environment variables, including AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, and when that variable is present boto3 decides it needs AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID in addition to SNAKEMAKE_STORAGE_S3_SECRET_KEY and SNAKEMAKE_STORAGE_S3_ACCESS_KEY. If you want to run Snakemake on AWS Fargate, you either have to set all four variables, or unset AWS_CONTAINER_CREDENTIALS_RELATIVE_URI in your Docker entrypoint.sh.
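
A minimal entrypoint.sh sketch illustrating the unset approach, with the all-four-variables approach shown as a commented-out alternative. This is an assumption, not the original setup: the file name, the variable mapping, and wiring it in via COPY entrypoint.sh plus ENTRYPOINT in place of the Dockerfile's CMD would all need to be adapted to your image.

#!/usr/bin/env bash
# entrypoint.sh -- hypothetical sketch, not the author's original file.
set -euo pipefail

# Approach A: drop the variable Fargate injects, so boto3 falls back to the
# explicitly provided access keys.
unset AWS_CONTAINER_CREDENTIALS_RELATIVE_URI

# Approach B (alternative): keep the variable but also set the standard AWS
# credential variables, so that all four are defined as described above.
# export AWS_ACCESS_KEY_ID="${SNAKEMAKE_STORAGE_S3_ACCESS_KEY}"
# export AWS_SECRET_ACCESS_KEY="${SNAKEMAKE_STORAGE_S3_SECRET_KEY}"

# Same invocation as the Dockerfile CMD.
exec snakemake \
    --default-storage-provider s3 \
    --default-storage-prefix s3://snakemake-bucket \
    results/mapped/A.bam \
    --cores 1 --verbose --printshellcmds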