Search code examples
scalaapache-sparkspark-streaming

Broadcasting TypeSafe Config throws exception User class threw exception: java.io.UTFDataFormatException: encoded string too long: 70601 bytes?


Just as the question title says I am trying to broadcast a TypeSafe config to the executors so my code there can access the config. Unfortunately, I'm getting an exception

object AppConfigUtility {

  var config: Config = ConfigFactory.empty()

  var brConfig: Broadcast[Config] = _

  /**
    * Broadcast the config so it can be available for executors to use
    * @param sc
    */
  def broadCastConfig(sc: SparkContext): Unit = {

    brConfig = sc.broadcast(config)

  }

  def loadConfig(): Unit = {
    //some actual implementation of loading my application.conf file
  }
}

When I call broadCastConfig in my main method it throws the below exception

User class threw an exception: java.io.UTFDataFormatException: encoded string too long: 70601 bytes

The final size of my application.conf is only 3KB or 3000 Bytes which is not anywhere near the 64KB limit so I'm not why I'm hitting this error.


Solution

  • seems like you have long strings in application.conf which is causing the issue..

    with this :64KB String limit in Java data streams example you could prove that.

    public static void main(String[] args) throws Exception {
        // generate string longer than 64KB
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++)
            sb.append("1234567890");
        String s = sb.toString();
    
        // write the string into the stream
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeUTF(s);
        dos.close();
    }
    

    similar but not same issue faced was fixed here

    I'd suggest parse the application.conf and set it to an object and send it to all executors using broadcast.

    Means.. you can send any serializable object to broadcast... but is in the form of application.conf