Can I run normal Python code using regular ML libraries (e.g., TensorFlow or scikit-learn) on a Spark cluster? If yes, can Spark distribute my data and computation across the cluster? If no, why not?
Spark distributes work across its worker nodes using RDDs (Resilient Distributed Datasets) and DataFrames. Plain Python code written against scikit-learn or TensorFlow will run, but only on the driver as a single-machine program: Spark will not distribute your data or computation for you unless you adapt the code to Spark's model, for example by using Spark MLlib or by expressing your logic as RDD/DataFrame operations. For TensorFlow specifically, the library has its own, Spark-independent options for distributing computation across multiple GPUs.
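To make the "adapt the code" point concrete, here is a minimal, hypothetical sketch of one common pattern: train a scikit-learn model on the driver, broadcast it, and distribute only the *inference* over an RDD with `mapPartitions`. The model, data, and partition count are illustrative assumptions, not part of the original question.

```python
# Hypothetical sketch: distributing scikit-learn INFERENCE (not training)
# over a Spark cluster. The model, data, and numSlices are made-up examples.
import numpy as np
from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression

spark = SparkSession.builder.appName("sklearn-on-spark").getOrCreate()

# Training still happens on the driver, on a single machine.
X_train = np.random.rand(100, 3)
y_train = (X_train.sum(axis=1) > 1.5).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# Broadcast the fitted model so each executor receives a read-only copy.
bc_model = spark.sparkContext.broadcast(model)

def score_partition(rows):
    # Each executor scores its own partition independently.
    feats = np.array(list(rows))
    if len(feats) == 0:
        return iter([])
    return iter(bc_model.value.predict(feats).tolist())

# Spark distributes the scoring across 8 partitions on the workers.
data = spark.sparkContext.parallelize(
    np.random.rand(10_000, 3).tolist(), numSlices=8
)
preds = data.mapPartitions(score_partition).collect()
```

Note what this does *not* do: the `fit` call is not parallelized. Distributing scikit-learn training itself would require a different approach (e.g., Spark MLlib's own estimators), which is exactly the rewrite the answer is warning about.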
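And as a contrast, this is what TensorFlow's own multi-GPU distribution looks like, using `tf.distribute.MirroredStrategy`. Note this parallelizes across the GPUs of one machine, not across a Spark cluster; the toy model and data here are assumptions for illustration.

```python
# Hedged sketch: TensorFlow's built-in data parallelism across local GPUs.
# Layer sizes and training data are illustrative, not from the question.
import numpy as np
import tensorflow as tf

# Replicates the model on every visible GPU and averages gradients.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored on each replica.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(3,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(1024, 3).astype("float32")
y = (X.sum(axis=1) > 1.5).astype("float32")

# Each batch is split across the replicas automatically.
model.fit(X, y, epochs=2, batch_size=64)
```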