The following ran successfully on a Cloudera CDSW cluster gateway.
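import pyspark
from pyspark.sql import SparkSession
spark = (SparkSession
    .builder
    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.2.3")  # coordinates as resolved in the log below
    .getOrCreate())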
It produces this output:
Ivy Default Cache set to: /home/cdsw/.ivy2/cache
The jars for the packages stored in: /home/cdsw/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
JohnSnowLabs#spark-nlp added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found JohnSnowLabs#spark-nlp;1.2.3 in spark-packages
found com.typesafe#config;1.3.0 in central
found org.fusesource.leveldbjni#leveldbjni-all;1.8 in central
downloading ...
[SUCCESSFUL ] JohnSnowLabs#spark-nlp;1.2.3!spark-nlp.jar (3357ms)
downloading ...
[SUCCESSFUL ] com.typesafe#config;1.3.0!config.jar(bundle) (348ms)
downloading ...
[SUCCESSFUL ] org.fusesource.leveldbjni#leveldbjni-all;1.8!leveldbjni-all.jar(bundle) (382ms)
:: resolution report :: resolve 3836ms :: artifacts dl 4095ms
:: modules in use:
JohnSnowLabs#spark-nlp;1.2.3 from spark-packages in [default]
com.typesafe#config;1.3.0 from central in [default]
org.fusesource.leveldbjni#leveldbjni-all;1.8 from central in [default]
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
| default | 3 | 3 | 3 | 0 || 3 | 3 |
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
3 artifacts copied, 0 already retrieved (5740kB/37ms)
Setting default log level to "ERROR".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
However, when I try to import sparknlp in PySpark as described by John Snow Labs:
import sparknlp
# or
from sparknlp.annotator import *
I get these errors, respectively:
ImportError: No module named sparknlp
ImportError: No module named sparknlp.annotator
What do I need to do to use sparknlp? More generally, how do I use the Python API of any Spark package loaded this way?
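To check whether an ImportError like this is a Python-side problem rather than a missing jar, a small diagnostic along these lines can help. It is only a sketch: the helper names are hypothetical, and the one assumption about the environment is that the downloaded jars sit in /home/cdsw/.ivy2/jars, as the log shows. The exact jar file name is not printed in the log, so list that directory to find it.

```python
import importlib.util
import zipfile


def module_is_importable(name):
    """Return True if Python can locate the top-level module `name`."""
    # find_spec consults sys.path through the normal import machinery
    # and returns None when no finder can locate the module.
    return importlib.util.find_spec(name) is not None


def python_files_in_jar(jar_path):
    """List the .py entries inside a jar (a jar is a zip archive).

    An empty list means the jar ships only JVM classes, so the Python
    side of the package must be supplied some other way.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(n for n in jar.namelist() if n.endswith(".py"))
```

If module_is_importable("sparknlp") returns False and python_files_in_jar(...) on the Spark NLP jar returns an empty list, the jar provided only the Scala side, and the Python wrapper has to be made importable separately.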
I figured it out. The jars that loaded correctly contained only the compiled Scala code; they did not provide the Python wrapper code that import sparknlp needs. I still had to put the Python wrapper files in a location on Python's import path (sys.path). Once I did that, everything worked.
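A minimal sketch of that step, assuming the wrapper code (a directory containing the sparknlp/ package, or a .zip of it) has already been copied to the gateway. The function name and any paths are hypothetical. The optional SparkContext.addPyFile call is only relevant if executors also need to import the module, for example inside Python UDFs.

```python
import importlib
import os
import sys


def make_importable(path, spark_context=None):
    """Add a directory or .zip of Python modules to this process's import path.

    `path` is a hypothetical location of the Spark NLP Python wrapper code,
    e.g. a directory that contains the `sparknlp/` package, or a .zip of it.
    """
    path = os.path.abspath(path)
    if not os.path.exists(path):
        raise FileNotFoundError(path)

    # Driver side: `import sparknlp` succeeds once its parent directory
    # (or a zip archive containing it) is listed on sys.path.
    if path not in sys.path:
        sys.path.insert(0, path)
    # Clear cached finder state so the new entry is picked up.
    importlib.invalidate_caches()

    # Executor side (optional): SparkContext.addPyFile ships a .py/.zip/.egg
    # file to the workers; it cannot ship a plain directory.
    if spark_context is not None and os.path.isfile(path):
        spark_context.addPyFile(path)
    return path
```

For example, calling make_importable("/home/cdsw/spark-nlp/python") (a hypothetical directory holding sparknlp/) before import sparknlp. Setting the PYTHONPATH environment variable to the same directory before the Python session starts has the same effect on the driver.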