I am trying to use distinct in a query for a table that has XML values. Here is my query
select distinct pID,docXml from docTable where docXml is not null and pID not like '%xyz%'
but i get
pymssql.OperationalError: (421, 'The xml data type cannot be selected as DISTINCT because it is not comparable.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')
how to solve this? what does it mean?
the data in the docXml column looks like
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
note that if i dont use distinct, then i am easily able to get all the data. but since there are duplicates, i am only interested in unique rows.
you could try
CONVERT(NVARCHAR(MAX), docXml)
in the select statement (and perhaps put the whole thing in a CTE then convert it back to XML from the CTE) e.g.
; WITH T AS (select distinct pID,CONVERT(NVARCHAR(MAX), docXml) docXml from docTable where docXml is not null and pID not like '%xyz%')
SELECT pID, CONVERT(XML, docXml) FROM T
alternatively, if your pID is distinct in itself (such that each pID relates to a single docXml), you could use a CTE and ROW_NUMBER() to return docXml in its XML format. e.g.
; WITH T AS (select pID
,docXml
,ROW_NUMBER() OVER (PARTITION BY pID ORDER BY pID) RN
from docTable
where docXml is not null
and pID not like '%xyz%')
SELECT pID, docXml FROM T WHERE RN = 1
EDIT: as for why it's doing it, SQL doesn't like sorting CLOBs