I want to extract a XML value for a matching tag stored as varchar(max) in MS Azure Synapse

I am new Azure was using SAS before now we are moving to azure synapse In current environment I want to extract a XML tag value stored in column C (varcharmax) as variable. [Dataset][1] [1]: https://i.sstatic.net/tbSIF.png Below XML is saved in the column C (PKDATA)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:DataSet xmlns:ns2="http://www.test.com/t/cn/el">
    <EnumObject>
        <name>Inpatient</name>
        <value>262784067</value>
        <radiobutton>false</radiobutton>
    </EnumObject>
    <StringObject>
        <name>xxx</name>
        <prompt></prompt>
        <value>/widget.jsp</value>
        <width>99</width>
    </StringObject>
</ns2:DataSet>

If name is Inpatient then 262784067 as Inpatient type

output

| A | B | Inpatient type | | 11212 | 2587140 | 262784067 |

I used the following code select a,b, pkdata.value('/EnumObject/name') as Inpatient type from dbo.extdata

i get the following error Cannot find either column "pkkddata" or the user-defined function or aggregate "pkdata.value", or the name is ambiguous.

I tried using following query but gives me the error Msg 104220, Level 16, State 1, Line 26 Cannot find data type 'xml'. SELECT a,b,(pkdata).value('(/EnumObject/name/text())[1]', 'varchar(100)') FROM [dbo].extdata CROSS APPLY (SELECT CAST(pkdata AS xml)) AS x(pkdata)

i get the following error when I use the below code The XMLDT method 'nodes' can only be invoked on columns of type xml. i tried to use the following but get incorrect syntax near passing select x.* from [dbo].[EXTDATA] rt cross join xmltable( '/EnumObject/name' passing xmltype(rt.pkdata) columns name number path 'name/@value' ) x

Not sure how to proceed

Azure SQL version Microsoft Azure SQL Data Warehouse - 10.0.16003.0 Apr 28 2021 04:55:16 Copyright (c) Microsoft Corporation

Solution

Azure Synapse Analytics, specifically dedicated SQL pools does not support the XML datatype or any of the functions accompanying it including FOR XML, .nodes, .value, .query, .modify etc.

If you need this type of processing, you can either use traditional SQL Server, eg SQL Server 2019 or Azure SQL DB. One option would be to use Synapse Pipelines to move data there. As an alternative you could look at using Synapse Notebooks and some custom Python / Scala / c# code, but I have only done a simple test on this.

Simple example in Scala:

Cell 1

// Get the table with the XML column from the database and expose as temp view
val df = spark.read.synapsesql("yourPool.dbo.someXMLTable")

df.createOrReplaceTempView("someXMLTable")

Cell 2

%%sql
-- Use SparkSQL to interrogate the XML
-- https://spark.apache.org/docs/2.3.0/api/sql/index.html#xpath
SELECT
    colA,
    colB,
    xpath_string(pkData,'/DataSet/EnumObject[name="Inpatient"]/value') xvalue
FROM someXMLTable

Cell 3

val df2 = spark.sql(""" 
SELECT
    colA,
    colB,
    xpath_string(pkData,'/DataSet/EnumObject[name="Inpatient"]/value') xvalue
FROM someXMLTable
""")

df2.show

Cell 4

// Write that dataframe back to the dedicated SQL pool
df2.write.synapsesql("yourPool.dbo.someXMLTable_processed", Constants.INTERNAL)

Screenprint from sample notebook:

XML is kind of old-fashioned these days - have you thought about switching to JSON? Also if your data volumes aren't that big it would be a lot cheaper to just Azure SQL DB rather than Synapse.