Tags: docker, google-cloud-platform, google-cloud-run

Running different ETL scripts with the same container on GCP Cloud Run


I have a set of ETL tasks that I'd like to run within Google Cloud Run jobs. There are five Python jobs I'd like to submit, namely:

  • all_dividends_history.py
  • all_ticker_types.py
  • all_tickers.py
  • all_tickers_detail.py
  • all_tickers_history.py

Using this Dockerfile:

FROM python:3.11
RUN apt-get update -y
#RUN apt-get install -y python-pip python-dev build-essential
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
#ENTRYPOINT ["python"]
CMD ["/bin/bash"]

I am able to run this successfully on my local machine using the command below. The critical thing to note is that my Dockerfile only issues a CMD for bash, and I invoke docker run followed by python {name of job}. This lets me use the same Dockerfile and container to execute each of these tasks as an independent job that can run in parallel with the others. If possible, I'd like to avoid building five separate containers.

docker run -v \
$GOOGLE_APPLICATION_CREDENTIALS:/tmp/keys/new-life-400922-cd595a9f5804.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/new-life-400922-cd595a9f5804.json \
-e POLYGON_API_KEY=$POLYGON_API_KEY \
test python scrapers/polygon/all_ticker_types.py
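
For example, the same image runs a different job just by changing the trailing script path (everything else stays identical; the script name here is taken from the list above):

docker run -v \
$GOOGLE_APPLICATION_CREDENTIALS:/tmp/keys/new-life-400922-cd595a9f5804.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/new-life-400922-cd595a9f5804.json \
-e POLYGON_API_KEY=$POLYGON_API_KEY \
test python scrapers/polygon/all_dividends_history.py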

I'm trying to port this over to Google Cloud Run, and from the GCP console I thought I'd be able to issue a command like python scrapers/polygon/all_ticker_types.py. This is not the case, however; instead it complains that it has no idea what to do with /app/python scrapers/polygon/all_ticker_types.py

I noticed here https://cloud.google.com/run/docs/reference/rest/v1/Container that commands are not issued inside a shell, which makes me wonder whether what I'm trying to do is possible within Cloud Run. Is it possible for me to share the same Dockerfile / container for multiple scripts and call them using python {name of job}? If so, can you help me understand what I'm doing wrong here, or what additional information would be needed to answer that? If it's not possible or advisable to do what I'm trying to do, would you please correct me and advise a better approach to this problem?


Solution

  • In your question you say you run jobs, yet the API reference you link to is for Cloud Run services, which handle HTTP requests with an HTTP server; that seems inaccurate for your use case.

    I assume you are actually talking about Cloud Run jobs, which let you run your ETL tasks on a serverless platform outside of an HTTP request context.

    In that context, there is a brand new feature on Cloud Run jobs named "parameter overrides". It makes it possible to change the arguments and the environment variable values for a specific execution.


    Based on Nilden's recommendation to update your Dockerfile, you can do something even simpler, like this:

    FROM python:3.11
    RUN apt-get update -y
    COPY . /app
    WORKDIR /app
    RUN pip install -r requirements.txt
    ENTRYPOINT ["python"]
    

    i.e., use python itself as the entrypoint.
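
    As a minimal sketch of the deployment side (the job name etl-scrapers, the image path, the region, and the env var value below are placeholders, not taken from the question), you could build the image once and create a single job from it:

    # Build and push the image with Cloud Build
    gcloud builds submit --tag gcr.io/PROJECT_ID/etl-scrapers

    # Create one job that all five scripts will share
    gcloud run jobs create etl-scrapers \
         --image gcr.io/PROJECT_ID/etl-scrapers \
         --region us-central1 \
         --set-env-vars POLYGON_API_KEY=YOUR_KEY

    That single job then serves all five scripts, so you avoid building five separate containers.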


    Then, when you execute your Cloud Run job with a parameter override, add the argument you want to pass to the python executable (the entrypoint), with this command:

    gcloud beta run jobs execute <JOB_NAME> \
         --args all_ticker_types.py
    

    The resulting execution runs this complete command:

    (The entrypoint)   (the argument(s))
    python             all_ticker_types.py
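
    Note that the script path is resolved relative to the WORKDIR (/app), so with the layout from the question you would pass the full relative path. Assuming the job was created as etl-scrapers (a placeholder name, as in the sketch above), the same job can dispatch all five ETL scripts:

    gcloud beta run jobs execute etl-scrapers --args scrapers/polygon/all_dividends_history.py
    gcloud beta run jobs execute etl-scrapers --args scrapers/polygon/all_ticker_types.py
    gcloud beta run jobs execute etl-scrapers --args scrapers/polygon/all_tickers.py
    gcloud beta run jobs execute etl-scrapers --args scrapers/polygon/all_tickers_detail.py
    gcloud beta run jobs execute etl-scrapers --args scrapers/polygon/all_tickers_history.py

    --args accepts a comma-separated list if a script needs more than one argument, and the override applies only to that single execution; later executions fall back to the job's configured defaults.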