I've got a bunch of XML messages in a PostgreSQL 9.1.3 table, with a column content
of type XML). They're not all the same "type", so I'm trying to extract the root type using a query like this:
SELECT xpath('name(/*)', content) FROM message;
as recommended by this answer to a similar SO question.
A sample message is:
<?xml version="1.0" encoding="UTF-8"?>
<WML version="6" xmlns="http://example.com/schemas/WML">...</WML>
For which case I'd hope to get the result '{WML}'. Unfortunately it just returns an empty array. Adding the namespaces parameter to xpath
, or removing the namespace from the message, does not help.
A discussion on the PostgreSQL mailing lists seems to explain it as a bug in XPath handling in PostgreSQL. However that was in 2008, and a look at the PostgreSQL source shows that piece of code was changed in 2009. I'm not a PostgreSQL developer so I'm not confident that the bug is or is not a factor in my case.
But I'm wondering if there's a workaround, such as an alternative XPath expression that might work? I'd prefer not to have to resort to regular expressions to parse XML, though if you can suggest a short, punchy, robust RE then it would be better than nothing.
I don't know of a workaround for older versions, but this is fixed in PostgreSQL 9.2, so that's great.
(The likeliest workaround would likely to have been to write a function to parse the XML manually, but I'm glad I don't have to resort to that now!)