When developing Python code, I make use of the package ipdb. It halts execution of the code at the point where I have inserted ipdb.set_trace() and presents me with a Python interpreter command line.
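For context, this is the workflow I mean in plain Python; the script name and function below are purely illustrative:

# debug_me.py -- hypothetical plain-Python example
import ipdb

def compute(x):
    y = x * 2
    ipdb.set_trace()  # execution pauses here and an ipdb> prompt opens
    return y + 1

if __name__ == "__main__":
    print(compute(3))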
However, in the Python code that I develop for PySpark, and which I send off using spark-submit, the ipdb package does not work.
So my question is: is there a way to debug my PySpark code in a manner similar to using the ipdb package?
Note: Obviously, for Python code executed on remote nodes this would not be possible. But when using spark-submit with the option --master local[1], I have hopes that it might be possible; a minimal example of the setup follows.
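To make that concrete, here is a sketch of the kind of job I am talking about (the file name and contents are illustrative only); the ipdb.set_trace() call behaves as usual when the file is run as a plain Python script, but does not give a usable debugger when the file is launched via spark-submit:

# job.py -- hypothetical example, submitted with:
#   spark-submit --master local[1] job.py
from pyspark.sql import SparkSession
import ipdb

spark = SparkSession.builder.appName("debug-demo").getOrCreate()

df = spark.range(10)  # small DataFrame, just to have something to inspect
ipdb.set_trace()      # this is where ipdb fails to work under spark-submit
print(df.count())

spark.stop()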
PS. There is a related question, but with a narrower scope, here: How to PySpark Codes in Debug Jupyter Notebook
One way to get an ipdb-style debugger for the driver code: start the PySpark shell with IPython as the driver Python, stop the SparkContext that the shell creates (so your script is free to create its own), and then run your script under the debugger with IPython's run -d:

PYSPARK_DRIVER_PYTHON=ipython pyspark
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
Using Python version 3.7.1 (default, Jun 16 2019 23:56:28)
SparkSession available as 'spark'.
In [1]: sc.stop()
In [2]: run -d main.py
Breakpoint 1 at /Users/andrii/work/demo/main.py:1
NOTE: Enter 'c' at the ipdb> prompt to continue execution.
> /Users/andrii/work/demo/main.py(1)<module>()
1---> 1 print(123)
2 import ipdb;ipdb.set_trace()
3 a = 2
4 b = 3
or
In [3]: run main.py
123
> /Users/andrii/work/demo/main.py(3)<module>()
2 import ipdb;ipdb.set_trace()
----> 3 a = 2
4 b = 3
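For reference, the main.py being debugged above, reconstructed from the debugger listing (only the four lines shown there are known), is simply:

# main.py -- reconstructed from the listing above
print(123)
import ipdb;ipdb.set_trace()
a = 2
b = 3

Since the script runs in the driver (the IPython process started by pyspark), the same ipdb> prompt should be available for a script that also uses Spark, as long as the code being debugged runs on the driver rather than on the executors.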