Tags: mongodb, apache-spark

Spark Mongo Connector Writer OBJECTORARRAYONLY


I am using the latest version of the Spark Mongo Connector (10.2.1) and am not sure why I am getting the following error.

    Caused by: com.mongodb.spark.sql.connector.exceptions.ConfigException: 'objectOrArrayOnly' is not a valid Convert Json Type
    at com.mongodb.spark.sql.connector.config.WriteConfig$ConvertJson.fromString(WriteConfig.java:96)
    at com.mongodb.spark.sql.connector.config.WriteConfig.convertJson(WriteConfig.java:298)
    at com.mongodb.spark.sql.connector.write.MongoDataWriter.<init>(MongoDataWriter.java:74)
    at com.mongodb.spark.sql.connector.write.MongoDataWriterFactory.createWriter(MongoDataWriterFactory.java:53)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:408)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: No enum constant com.mongodb.spark.sql.connector.config.WriteConfig.ConvertJson.OBJECTORARRAYONLY
    at java.lang.Enum.valueOf(Enum.java:238)
    at com.mongodb.spark.sql.connector.config.WriteConfig$ConvertJson.valueOf(WriteConfig.java:73)
    at com.mongodb.spark.sql.connector.config.WriteConfig$ConvertJson.fromString(WriteConfig.java:94)

I am passing the jar with the --jars option in my spark-submit.
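As an aside: if an older connector build on the classpath might be shadowing the jar passed with --jars, one way to pin the exact version is to let spark-submit resolve it from Maven Central instead. This is a hypothetical sketch (the application jar name is a placeholder; the Maven coordinates are for the 10.2.1 Scala 2.12 build):

```shell
# Resolve the connector (and its transitive dependencies) from Maven Central
# rather than shipping a local jar that may be shadowed by an older copy.
spark-submit \
  --packages org.mongodb.spark:mongo-spark-connector_2.12:10.2.1 \
  your-app.jar
```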

It looks like it is not picking up this version, because in 10.2.1 I can see the enum does have "objectOrArrayOnly":

public static enum ConvertJson {
        FALSE("false"),
        ANY("any"),
        OBJECT_OR_ARRAY_ONLY("objectOrArrayOnly");

        private final String value;
        private static final String TRUE = "true";

        private ConvertJson(String operationType) {
            this.value = operationType;
        }

        static ConvertJson fromString(String jsonType) {
            if (jsonType.equalsIgnoreCase("true")) {
                WriteConfig.LOGGER.warn("{}: '{}' is deprecated. Use: '{}' instead.", new Object[]{"convertJson", "true", ANY});
                return ANY;
            } else {
                try {
                    return valueOf(jsonType.toUpperCase(Locale.ROOT));
                } catch (IllegalArgumentException var2) {
                    throw new ConfigException(String.format("'%s' is not a valid Convert Json Type", jsonType), var2);
                }
            }
        }

        public String toString() {
            return this.value;
        }
    }
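The failure can be reproduced outside the connector with plain Java. A minimal sketch, where the enum below merely mirrors the constant names of WriteConfig.ConvertJson (it is a stand-in, not the connector's class):

```java
import java.util.Locale;

public class ConvertJsonDemo {
    // Stand-in mirroring the constant names of WriteConfig.ConvertJson
    enum ConvertJson { FALSE, ANY, OBJECT_OR_ARRAY_ONLY }

    public static void main(String[] args) {
        // "objectOrArrayOnly" upper-cases to "OBJECTORARRAYONLY" -- no underscores --
        // so Enum.valueOf finds no matching constant and throws IllegalArgumentException,
        // which the connector wraps in the ConfigException seen in the stack trace.
        try {
            ConvertJson.valueOf("objectOrArrayOnly".toUpperCase(Locale.ROOT));
            System.out.println("matched");
        } catch (IllegalArgumentException e) {
            System.out.println("no enum constant for OBJECTORARRAYONLY");
        }

        // With underscores the lookup succeeds, in any letter case.
        System.out.println(ConvertJson.valueOf("object_or_array_only".toUpperCase(Locale.ROOT)));
    }
}
```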

It looks like this value is not supported yet.

Thanks, Sateesh


Solution

  • You need to set the option value to "OBJECT_OR_ARRAY_ONLY". The stack trace shows the failure while parsing the write-side "convertJson" option: fromString upper-cases the configured value before calling Enum.valueOf, so "objectOrArrayOnly" becomes "OBJECTORARRAYONLY" and no longer matches the constant name OBJECT_OR_ARRAY_ONLY. Keep the underscores; the match is otherwise case-insensitive.

    dataset.writeStream()
           .format("mongodb")
           .option("convertJson", "OBJECT_OR_ARRAY_ONLY")
           .start();