Tags: scala, apache-spark, rdd

spark.debug.maxToStringFields doesn't work


I tried setting "spark.debug.maxToStringFields" as described in the warning message: WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf. Please find the code below:

val sparkConf = new SparkConf()
// attempt 1: set on SparkConf before the session is created
sparkConf.set("spark.debug.maxToStringFields", "100000")
sparkConf.set("spark.sql.debug.maxToStringFields", "100000")
val spark = SparkSession.builder.config(sparkConf).getOrCreate()
// attempt 2: set on the runtime conf after the session is created
spark.conf.set("spark.debug.maxToStringFields", "100000")
spark.conf.set("spark.sql.debug.maxToStringFields", "100000")

val data = spark.read
  .option("header", "true")
  .option("delimiter", "|")
  .csv(path_to_csv_file)
  .repartition(col("country"))

data.rdd.toDebugString

I only get partial output from toDebugString, along with the above warning message. As you can see, I have tried both options. Why is it not printing the full RDD lineage?
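As a side note, one way to confirm whether a setting actually reached the session is to read it back from the runtime conf after the session is built. This is a minimal sketch; it assumes `spark` is the SparkSession created above, and uses the key names from the question:

```scala
// Sketch: read the effective value back after creating the session.
// Assumes `spark` is the active SparkSession built above.
val effective = spark.conf.getOption("spark.sql.debug.maxToStringFields")
println(s"spark.sql.debug.maxToStringFields = $effective")
```

If this prints `None` (or a value other than the one you set), the setting never took effect at the session level, which narrows down where the truncation limit is coming from.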


Solution

  • Can you check here:

    https://www.programcreek.com/scala/org.apache.spark.SparkEnv

    I think you have to set the value through SparkEnv, like:

    val sparkenv = SparkEnv.get
    sparkenv.conf.set("spark.oap.cache.strategy", "not_support_cache")
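    Applying that same SparkEnv pattern to the key from the question would look roughly like the sketch below. This is not verified against a specific Spark release; note that SparkEnv.get only returns a non-null value once the SparkContext is running:

    ```scala
    import org.apache.spark.SparkEnv

    // Sketch: the linked SparkEnv pattern, applied to the question's key.
    // SparkEnv.get is only available after the SparkContext has started.
    val env = SparkEnv.get
    env.conf.set("spark.debug.maxToStringFields", "100000")
    ```

    Whether this helps depends on the Spark version: newer releases read the SQL-side limit from "spark.sql.debug.maxToStringFields", so check which key your version actually consults.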