Search code examples
xmlazureazure-data-factoryazure-synapseazure-synapse-analytics

How to parse an XML embedded in a nvarchar column?


I have a problem which i'm currently working on and i can't wrap my head around it after reading the official documentation and seeing topics online, i'll try ot be as clear as possible:

I'm using Azure Synapse Analytics and i created a source which retrieves data from a SQL Server like so:

enter image description here

The mapping for my source to the sink is:

enter image description here

Now in the "request" column there's a string embeded in a XML format like so:

enter image description here

The problem that i'm trying to solve is that i need to create new columns for each node in the XML string, for example i need to create a new column named "MSP" with the value "U678202" taken from the "codiceMSP" node in the XML string, and so on. So i created a new Data Flow and configured the source and the Parse transformation like so:

enter image description here

I created a source block taking data from the one copied in my azure storage and from then i created a Parse transformation block to parse the embedded XML data in a string and create the new columns from it. The problem is that the XML is not in a conventional format and i just can't make the parse work. I tried to set up the namespaces but the value of "codiceMSP" will not parse into the new MSP column. What am i doing wrong? How can i configure the XML Parse trasnformation to read through the nodes inside the request string? There's not so much information online about this specific case, any help will be really appreciated.

Thank you in advance for any help

Best regards


Solution

  • As per your requirement you can use below approach:

    • First add source dataset where xml is stored in ADLS.
    • Add derived column transformation where you will remove namespaces from the XML string as below:
    replace(replace(coilumnname,'ns1:',''),'SOAP-ENV:','')
    

    enter image description here

    • Then add parse transformation to extract columns from the XML with below expression:
    (Envelope as (Body as (ConfermaPagamentoFattura as (codiceMSP as string,
    numerodocumento as string,
    tipodocumento as string,
    datadocumento as string
    societa as string,
    datapagamento as string,
    numerotransazione as string,
    importodocumento as double))))
    

    enter image description here

    • Then add derive column to create separate column for each entity using above column: enter image description here

    Final output :

    enter image description here