I have a Java class that builds a fairly sophisticated Spark DataFrame:
package companyX;

import org.apache.spark.sql.DataFrame;

public class DFBuilder {
    public DataFrame build() {
        ...
        return dataframe;
    }
}
I add this class to the PySpark/Jupyter classpath so it's callable via py4j. But when I call it, I get a strange type:
b = sc._jvm.companyX.DFBuilder()
print(type(b.build()))
#prints: py4j.java_gateway.JavaObject
vs.
print(type(sc.parallelize([]).toDF()))
#prints: pyspark.sql.dataframe.DataFrame
Is there a way to convert this JavaObject into a proper PySpark DataFrame? One problem this causes: when I call df.show() on a DataFrame built in Java, the output is printed to the Spark logs rather than in the notebook cell.
You can use the DataFrame initializer, passing the Java object and an active session:
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()
df = DataFrame(b.build(), spark)
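Wrapping also fixes the df.show() problem: PySpark's DataFrame.show() calls showString on the JVM side and prints the returned string from Python, so the output lands in the notebook cell instead of the JVM's stdout (i.e. the Spark logs). A quick check on the wrapped df from above:

df.show()         # now renders in the notebook cell
df.printSchema()  # other DataFrame methods work as usual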
If you are on an outdated Spark version, replace the SparkSession instance with a SQLContext.
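For older releases (roughly pre-2.0), a sketch of that variant might look like this, assuming sc is your SparkContext:

from pyspark.sql import DataFrame, SQLContext

sqlContext = SQLContext(sc)
# Wrap the py4j JavaObject with the legacy SQLContext instead of a SparkSession
df = DataFrame(b.build(), sqlContext)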
Reference: Zeppelin: Scala Dataframe to python