We have a Dockerized Snakemake pipeline with the input data stored in an S3 bucket, snakemake-bucket:
Snakefile:

rule bwa_map:
    input:
        "data/genome.fa"
    output:
        "results/mapped/A.bam"
    shell:
        "cat {input} > {output}"
Dockerfile:
FROM snakemake/snakemake:v8.15.2
RUN mamba install -c conda-forge -c bioconda snakemake-storage-plugin-s3
WORKDIR /app
COPY ./workflow ./workflow
ENV PYTHONWARNINGS="ignore:Unverified HTTPS request"
CMD ["snakemake","--default-storage-provider","s3","--default-storage-prefix","s3://snakemake-bucket","results/mapped/A.bam","--cores","1","--verbose","--printshellcmds"]
When we run the container with the following command, it downloads the input file, runs the pipeline, and stores the output in the bucket successfully:
docker run -it -e SNAKEMAKE_STORAGE_S3_ACCESS_KEY=**** -e SNAKEMAKE_STORAGE_S3_SECRET_KEY=**** our-snakemake:v0.0.10
However, when we deploy it as an AWS Batch job or AWS Fargate task, it fails immediately with the following error:
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Full Traceback (most recent call last):
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 2103, in args_to_api
dag_api.execute_workflow(
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 594, in execute_workflow
workflow.execute(
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1081, in execute
self._build_dag()
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1037, in _build_dag
async_run(self.dag.init())
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/common/__init__.py", line 94, in async_run
return asyncio.run(coroutine)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/snakemake/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/opt/conda/envs/snakemake/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/snakemake/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 183, in init
job = await self.update(
^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 1013, in update
raise exceptions[0]
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 970, in update
await self.update_(
File "/opt/conda/envs/snakemake/lib/python3.12/site-packages/snakemake/dag.py", line 1137, in update_
raise MissingInputException(job, missing_input)
snakemake.exceptions.MissingInputException: Missing input files for rule bwa_map:
output: results/mapped/A.bam
wildcards: sample=A
affected files:
s3://snakemake-bucket/data/genome.fa (storage)
MissingInputException in rule bwa_map in file /app/workflow/Snakefile, line 10:
Missing input files for rule bwa_map:
output: results/mapped/A.bam
wildcards: sample=A
affected files:
s3://snakemake-bucket/data/genome.fa (storage)
Any ideas would be appreciated.
To verify that the credentials work inside the container, we ran the following boto3 check, which downloads every object in the bucket successfully:

/opt/conda/envs/snakemake/bin/python -c "
import os
import boto3

s3 = boto3.resource(
    's3',
    aws_access_key_id=os.environ.get('SNAKEMAKE_STORAGE_S3_ACCESS_KEY'),
    aws_secret_access_key=os.environ.get('SNAKEMAKE_STORAGE_S3_SECRET_KEY'),
)
my_bucket = s3.Bucket('snakemake-bucket')
for obj in my_bucket.objects.all():
    my_bucket.download_file(obj.key, obj.key)
print(os.listdir())
"

Adding the AWS_*- and ECS_*-related environment variables hasn't changed the outcome. The base image is snakemake/snakemake:v8.15.2.
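For reference, a quick way to see which credential-related variables the container runtime injects (a debugging sketch, not from the original post):

```shell
# List every AWS_- or ECS_-prefixed variable the container runtime injected;
# on Fargate this includes AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.
# `|| true` keeps the pipeline from failing when no such variable is set.
env | { grep -E '^(AWS|ECS)_' || true; } | sort
```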
It seems AWS Fargate sets some environment variables, including AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, based on which boto3 decides that it needs AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID in addition to SNAKEMAKE_STORAGE_S3_SECRET_KEY and SNAKEMAKE_STORAGE_S3_ACCESS_KEY. If you want to run Snakemake on AWS Fargate, you have to set all four variables, or unset AWS_CONTAINER_CREDENTIALS_RELATIVE_URI in your Docker entrypoint.sh.
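A minimal entrypoint sketch for the second option (the file name and wiring are illustrative, not from the original image):

```shell
#!/bin/sh
# entrypoint.sh (hypothetical) -- wrapper for the Snakemake image.
# Fargate injects AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, which switches
# boto3 to the container credential provider; dropping it restores the
# behavior observed with a plain `docker run`.
unset AWS_CONTAINER_CREDENTIALS_RELATIVE_URI

# Alternative (first option): keep the variable and set all four keys by
# mirroring the Snakemake credentials into the standard AWS variables:
# export AWS_ACCESS_KEY_ID="$SNAKEMAKE_STORAGE_S3_ACCESS_KEY"
# export AWS_SECRET_ACCESS_KEY="$SNAKEMAKE_STORAGE_S3_SECRET_KEY"

# Hand off to the image's CMD (the snakemake invocation).
exec "$@"
```

Wire it in with `COPY entrypoint.sh /entrypoint.sh` and `ENTRYPOINT ["/entrypoint.sh"]` so the existing CMD is passed through unchanged.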