Tags: pyspark, pip, pypi

How can we find all extra dependencies for PySpark when deploying via pip?


I am trying to install PySpark locally, following the instructions at

https://spark.apache.org/docs/latest/api/python/getting_started/install.html#using-pypi

I can see that extra dependencies such as sql and pandas_on_spark are available, and can be installed with

pip install pyspark[sql,pandas_on_spark]

But how can we find all available extras?

Looking at the JSON metadata of the pyspark package (per https://wiki.python.org/moin/PyPIJSON),

https://pypi.org/pypi/pyspark/json

I could not find the possible extras (as described in What is 'extra' in pypi dependency?); the value of requires_dist is null.
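For illustration, here is a minimal sketch of the check I performed, using only the standard library and PyPI's public JSON endpoint:

```python
import json
import urllib.request

# Fetch the package metadata from PyPI's JSON API.
with urllib.request.urlopen("https://pypi.org/pypi/pyspark/json") as resp:
    info = json.load(resp)["info"]

# For packages that publish wheels, extras normally show up as markers
# like 'extra == "sql"' inside requires_dist. For pyspark this prints
# None, since no built metadata is exposed to PyPI.
print(info["requires_dist"])
```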

Many thanks for your help.


Solution

  • As far as I know, you cannot easily get the list of extras. If the list is not clearly documented, you have to look at the packaging code/config. In this case that is setup.py in the Apache Spark repository, whose extras_require section gives the following list: ml, mllib, sql, and pandas_on_spark.
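As a complement, once pyspark has been installed, the declared extras can also be read back from the local package metadata. A small sketch using only the standard library (Python 3.8+), assuming pip wrote full metadata when it built the package from the sdist:

```python
from importlib.metadata import metadata

# Read the metadata of the locally installed pyspark distribution.
md = metadata("pyspark")

# Each declared extra appears as a Provides-Extra entry,
# e.g. ['ml', 'mllib', 'sql', 'pandas_on_spark'].
print(md.get_all("Provides-Extra"))
```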