I'm trying to submit my PySpark code through a cron job. When I run it manually, it works fine. Through cron, it does not work.
Here is the project structure I have:
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
The main code lives in execute_metrics.py under src/jobs. It uses get_spark_session.py through the import from src.utils import get_spark_session.
I created a shell script, execute_metric.sh, with the content below for executing the cron job:
#!/bin/bash
PATH=<included entire path here>
spark-submit <included required options> src/jobs/execute_metrics.py
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
|--execute_metric.sh
When I run this shell script using ./execute_metric.sh, I'm able to see the results.
Now I need the job to run every minute, so I created a crontab file with the content below and placed it in the same directory:
* * * * * ./execute_metric.sh > execute_metric_log.log
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
|--execute_metric.sh
|--execute_cron.crontab
The cron job runs every minute, but it gives me this error:
ModuleNotFoundError: No module named 'src'
Can someone please tell me what went wrong here?
Thanks in advance
I got it fixed by adding a main.py file in the project directory and changing my cron job to execute main.py. The project structure now looks like:
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
|--execute_metric.sh
|--execute_cron.crontab
|--main.py
In main.py, I'm invoking the functions of execute_metrics.py.
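A likely reason this helps: when Python runs a script, it puts the directory containing that script at the front of sys.path. With src/jobs/execute_metrics.py as the entry point that directory is src/jobs, where there is no src package. With main.py in the project root, it is the project root. Below is a small reproduction of that behaviour, offered as a sketch rather than a diagnosis of the exact setup. The assumptions: plain Python stands in for spark-submit, the file contents are hypothetical stubs, run() is a made-up function name, and the empty __init__.py files are added by the demo and do not appear in the tree above.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in files for the project; contents are hypothetical stubs, no Spark needed.
DEMO_FILES = {
    "src/__init__.py": "",
    "src/jobs/__init__.py": "",
    "src/utils/__init__.py": "",
    "src/utils/get_spark_session.py": "NAME = 'stub session'\n",
    "src/jobs/execute_metrics.py": (
        "from src.utils import get_spark_session\n"
        "\n"
        "def run():\n"
        "    return get_spark_session.NAME\n"
    ),
    "main.py": (
        "from src.jobs import execute_metrics\n"
        "\n"
        "print(execute_metrics.run())\n"
    ),
}


def build_demo_project(root):
    """Write the stub files under root, creating directories as needed."""
    for relative_path, content in DEMO_FILES.items():
        target = os.path.join(root, *relative_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as handle:
            handle.write(content)


def run_script(root, relative_path):
    """Run one script with plain Python from an unrelated working directory."""
    # Drop PYTHONPATH so only the script location decides what is importable.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    return subprocess.run(
        [sys.executable, os.path.join(root, *relative_path.split("/"))],
        cwd=tempfile.gettempdir(),
        env=env,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as project:
    build_demo_project(project)
    # sys.path[0] becomes <project>/src/jobs, so "src" cannot be found.
    direct = run_script(project, "src/jobs/execute_metrics.py")
    # sys.path[0] becomes <project>, so "src" resolves.
    via_main = run_script(project, "main.py")

print("direct run exit code:", direct.returncode)
print("main.py output:", via_main.stdout.strip())
```

If a main.py entry point is not wanted, putting the project root on PYTHONPATH inside execute_metric.sh should have a similar effect, though that was not tried here. Since cron usually starts jobs from the user's home directory and not from the project directory, absolute paths in the crontab entry and in the script are also worth considering.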