java apache-spark salesforce aws-glue

SpringML-Salesforce, cannot create xmlstreamreader from org.codehaus.stax2.io.Stax2


I'm using https://github.com/springml/spark-salesforce to query a Salesforce API. It works fine for standard queries, but when I add the bulk options listed in its documentation, it fails with the error shown below. Based on that documentation I believe this is the correct approach, but let me know if I'm making a basic mistake.

I'm trying to run a bulk query against our API, using the SOQL statement below:

val account_soql = "select industry from account" 

I get the following error when the bulk flag is set and sfObject is set to account:

Exception in User Class: java.lang.UnsupportedOperationException : Cannot create XMLStreamReader or XMLEventReader from a org.codehaus.stax2.io.Stax2ByteArraySource

I've tried both of the reads below and see the same error with each:

val account_data = sparkSession.read
  .format("com.springml.spark.salesforce")
  .option("soql", account_soql)
  .option("username", "username")
  .option("password", "password")
  .option("sfObject", "account")
  .option("bulk", "true")
  .load()


val account_data = sparkSession.read
  .format("com.springml.spark.salesforce")
  .option("soql", account_soql)
  .option("username", "username")
  .option("password", "password")
  .option("multiLine", "true")
  .option("sfObject", "account")
  .option("inferSchema", "true")
  .option("bulk", "true")
  .option("version", "latest-version")
  .load()

I am using the following JAR versions:

force-partner-api-40.0.0.jar
force-wsc-40.0.0.jar
salesforce-wave-api-1.0.9.jar
spark-salesforce_2.11-1.1.1.jar

These are taken from this article:

https://aws.amazon.com/blogs/big-data/extracting-salesforce-com-data-using-aws-glue-and-analyzing-with-amazon-athena/

I also tried updating to the latest version of spark-salesforce (February 2021) and got the following error instead:

Command failed with exit code 1 - INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V)

Let me know if I can provide any other details to help.


Solution

  • This is a problem with the Stax2 library. Add the woodstox-core-asl-4.4.1.jar file to the dependent JARs in the Glue job configuration and it will resolve this error.
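
To confirm the extra JAR is actually being picked up, one option is to print which StAX implementation the JVM resolves from inside the job. The snippet below is only a diagnostic sketch, not part of spark-salesforce: the object name StaxCheck and its helper methods are made up for illustration. It assumes that Woodstox 4.x exposes its factory as com.ctc.wstx.stax.WstxInputFactory, and that the error above appears when the JDK's built-in factory is selected instead.

```scala
import javax.xml.stream.XMLInputFactory

// Hypothetical helper for illustration; not part of spark-salesforce or AWS Glue.
object StaxCheck {

  // Class name of the StAX input factory the JVM resolves at runtime.
  def xmlInputFactoryClassName(): String =
    XMLInputFactory.newInstance().getClass.getName

  // True when the Woodstox factory class (and the stax2-api classes it
  // depends on) can be loaded from the current classpath.
  def woodstoxOnClasspath(): Boolean =
    try {
      Class.forName("com.ctc.wstx.stax.WstxInputFactory")
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  def main(args: Array[String]): Unit = {
    println(s"XMLInputFactory implementation: ${xmlInputFactoryClassName()}")
    println(s"Woodstox on classpath: ${woodstoxOnClasspath()}")
  }
}
```

Calling StaxCheck.main(Array.empty) at the top of the Glue script, before the Salesforce read, writes both values to the job log. If the implementation name does not start with com.ctc.wstx, the Woodstox JAR is probably not reaching the job's classpath.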