Tags: python, soap, aws-glue

SOAP Response XML Error in AWS Glue job (Python)


I am very new to AWS Glue. I have written the following Glue script, which sends a SOAP request to a website and stores its response in S3. Although the job runs successfully, the XML response that is received (and saved as an S3 object) contains an error. The same program runs correctly from PyCharm. The Glue script is given below.

XML response (Error):

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<soap:Fault>
<soap:Code>
<soap:Value>soap:Receiver</soap:Value>
</soap:Code>
<soap:Reason>
<soap:Text xml:lang="en">Server was unable to process request. ---> Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it. Line 2, position 10.</soap:Text>
</soap:Reason>
<soap:Detail/>
</soap:Fault>
</soap:Body>
</soap:Envelope>

The glue job is as follows:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import requests
import boto3

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

print("Imported Libraries")

url = "https://www.w3schools.com/xml/tempconvert.asmx"

data ="""
       <?xml version="1.0" encoding="utf-8"?>
        <soap12:Envelope
                xmlns:xsi="http://w3.org/2002/XMLSchema-instance"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:soap12="http://schemas.xmlsoap.org/soap/envelope/">
        <soap12:Body>
          <CelsiusToFahrenheit xmlns="https://www.w3schools.com/xml/">
          <Celsius>20</Celsius>
          </CelsiusToFahrenheit>
        </soap12:Body>
       </soap12:Envelope>"""
headers = {
    'Content-Type': 'text/xml; charset=utf-8'
}
response = requests.request("POST", url, headers=headers, data=data)

var = response.text
print(f"Response: {var}")

client = boto3.client('s3')
client.put_object(Body=var, Bucket='my-bucket', Key='data/soap_inbound.xml')

print("S3 object created")

job.commit()

Can anyone please help me fix the error?


Solution

  • The error message in the XML response points to the XML declaration in the SOAP request. The message states:

    Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it. Line 2, position 10.

    The triple-quoted string assigned to data begins with a newline followed by indentation, so the XML declaration lands on line 2 of the request body, preceded by whitespace, which the server rejects. Update your data variable so that the declaration immediately follows the opening quotes:

    data ="""<?xml version="1.0" encoding="utf-8"?>
        <soap12:Envelope
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
        <soap12:Body>
          <CelsiusToFahrenheit xmlns="https://www.w3schools.com/xml/">
          <Celsius>20</Celsius>
          </CelsiusToFahrenheit>
        </soap12:Body>
       </soap12:Envelope>"""
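
One way to keep the envelope readable in source without reintroducing the leading whitespace is to strip the string before sending it. The following is a minimal sketch, not part of the original job: the helper names `build_body` and `parses_cleanly` are hypothetical, and the check uses Python's standard-library XML parser, which may not behave identically to the server's parser. It shows that the unstripped string is rejected while the stripped one is accepted.

```python
import textwrap
import xml.etree.ElementTree as ET

# Same envelope as above, written with a leading newline and indentation
# for readability in source.
RAW_ENVELOPE = """
    <?xml version="1.0" encoding="utf-8"?>
    <soap12:Envelope
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
      <soap12:Body>
        <CelsiusToFahrenheit xmlns="https://www.w3schools.com/xml/">
          <Celsius>20</Celsius>
        </CelsiusToFahrenheit>
      </soap12:Body>
    </soap12:Envelope>
"""


def build_body(raw: str) -> bytes:
    """Hypothetical helper: remove common indentation and surrounding
    whitespace, then encode as UTF-8 to match the declared encoding."""
    return textwrap.dedent(raw).strip().encode("utf-8")


def parses_cleanly(payload: bytes) -> bool:
    """Return True if the standard-library parser accepts the payload."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


body = build_body(RAW_ENVELOPE)
print(parses_cleanly(RAW_ENVELOPE.encode("utf-8")))  # unstripped payload
print(parses_cleanly(body))                          # stripped payload
```

The resulting `body` can be passed as `data=body` in the `requests.request` call, since `requests` accepts bytes for the request body.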
    

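    Additionally, the namespace URI bound to the xsi prefix in the soap12:Envelope element is wrong. Replace http://w3.org/2002/XMLSchema-instance with http://www.w3.org/2001/XMLSchema-instance. The corrected envelope also binds the soap12 prefix to the SOAP 1.2 namespace http://www.w3.org/2003/05/soap-envelope instead of the SOAP 1.1 namespace http://schemas.xmlsoap.org/soap/envelope/ used in the question.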
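
Namespace typos like this are easy to miss by eye, so a small check can help. The sketch below is an illustration under stated assumptions: the function names `declared_namespaces` and `mismatched_prefixes` are hypothetical, and the `EXPECTED` table lists only the two W3C schema namespaces discussed here. It collects every xmlns declaration from a payload with the standard library and reports prefixes whose URI differs from the expected one.

```python
import io
import xml.etree.ElementTree as ET

# Well-known W3C namespace URIs for the prefixes used in the envelope.
EXPECTED = {
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}


def declared_namespaces(payload: bytes) -> dict:
    """Collect prefix -> URI for every xmlns declaration in the payload."""
    found = {}
    stream = io.BytesIO(payload)
    for _event, (prefix, uri) in ET.iterparse(stream, events=("start-ns",)):
        found[prefix] = uri
    return found


def mismatched_prefixes(payload: bytes) -> dict:
    """Return {prefix: declared_uri} where the URI differs from EXPECTED."""
    declared = declared_namespaces(payload)
    return {
        prefix: declared[prefix]
        for prefix in EXPECTED
        if prefix in declared and declared[prefix] != EXPECTED[prefix]
    }


# Shortened envelope carrying the xsi URI from the question.
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<soap12:Envelope
    xmlns:xsi="http://w3.org/2002/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
  <soap12:Body/>
</soap12:Envelope>"""

print(mismatched_prefixes(sample))
```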

    After making these changes, your Glue job should work as expected, and the error in the XML response should be resolved.
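
Because the Glue job reported success even though the saved response was a SOAP fault, it may also be worth checking the response before writing it to S3. The following is a minimal sketch, assuming the response body is well-formed XML: the function name `soap_fault_text` is hypothetical, and the element names in the non-fault sample are assumed for illustration rather than taken from the service.

```python
import xml.etree.ElementTree as ET

# Envelope namespaces for SOAP 1.1 and SOAP 1.2.
SOAP_NAMESPACES = (
    "http://schemas.xmlsoap.org/soap/envelope/",
    "http://www.w3.org/2003/05/soap-envelope",
)


def soap_fault_text(response_text: str):
    """Return the text inside the Fault element if the response is a
    SOAP fault, otherwise None. Raises ET.ParseError on malformed XML."""
    root = ET.fromstring(response_text.encode("utf-8"))
    for ns in SOAP_NAMESPACES:
        fault = root.find(f"{{{ns}}}Body/{{{ns}}}Fault")
        if fault is not None:
            # Join every non-empty text fragment found inside the fault.
            return " ".join(t.strip() for t in fault.itertext() if t.strip())
    return None


# Shortened version of the fault shown in the question.
fault_xml = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body><soap:Fault>
<soap:Code><soap:Value>soap:Receiver</soap:Value></soap:Code>
<soap:Reason><soap:Text xml:lang="en">Unexpected XML declaration.</soap:Text></soap:Reason>
</soap:Fault></soap:Body></soap:Envelope>"""

# Illustrative non-fault response; the element names are assumed.
ok_xml = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<CelsiusToFahrenheitResponse xmlns="https://www.w3schools.com/xml/">
<CelsiusToFahrenheitResult>68</CelsiusToFahrenheitResult>
</CelsiusToFahrenheitResponse>
</soap:Body></soap:Envelope>"""

print(soap_fault_text(fault_xml))
print(soap_fault_text(ok_xml))
```

In the job, calling `soap_fault_text(response.text)` before `client.put_object(...)` and raising an exception when it returns a non-None value would make the job fail visibly instead of storing the fault in S3.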