python · apache-spark · distributed-computing

Using regular Python code on a Spark cluster


Can I run normal Python code using regular ML libraries (e.g., TensorFlow or scikit-learn) on a Spark cluster? If yes, can Spark distribute my data and computation across the cluster? If no, why not?


Solution

  • Spark uses RDDs (Resilient Distributed Datasets) to distribute work among the worker nodes. I don't think you can use your existing Python code without dramatically adapting it to Spark's API; plain TensorFlow or scikit-learn calls won't be distributed by Spark on their own (see the sketches below). For TensorFlow specifically, there are many built-in options to distribute computation over multiple GPUs.
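
    As a minimal sketch of the kind of adaptation involved (assuming PySpark; `square` is just an illustrative function), an ordinary Python function only runs on the driver until you hand the data to Spark as an RDD:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
    sc = spark.sparkContext

    def square(x):
        # An ordinary Python function; by itself it runs only on the driver.
        return x * x

    # A plain list comprehension executes on a single machine:
    local_result = [square(x) for x in range(10)]

    # To distribute the work, the data must become an RDD; Spark then
    # ships `square` to the executors and applies it per partition.
    rdd = sc.parallelize(range(10), numSlices=4)
    distributed_result = rdd.map(square).collect()

    spark.stop()
    ```

    On the TensorFlow side, one of the stock options for multi-GPU training on a single machine is `tf.distribute.MirroredStrategy`; a hedged sketch, not the only approach:

    ```python
    import tensorflow as tf

    # MirroredStrategy replicates the model on each visible GPU and
    # aggregates gradients across the replicas automatically.
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
        model.compile(optimizer="sgd", loss="mse")
    # model.fit(...) then trains across all visible GPUs.
    ```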