I have only been able to get result nodes with XPath from my page's HTML DOM, not from the XML document I parsed, which feels incorrect.
I am attempting to show a fragment of an XML document (TEI/XML) on my HTML page. I have the URL of the XML document and an XPath selector for the fragment. I thought I could fetch() the document and extract the piece I wanted like so:
// Real values, for one case,
// t.source = "https://centerfordigitalhumanities.github.io/Dunbar-books/The-Complete-Poems-TEI.xml"
// t.selector.value = "//div[@type='poem'][8]"
const sampleSource = await fetch(t.source)
.then(res => res.text())
.then(docStr => (new DOMParser()).parseFromString(docStr, "application/xml"))
const poemText = sampleSource.evaluate(t.selector?.value, sampleSource, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)
textSample.innerHTML = poemText.snapshotItem(0).innerHTML
I tried several variations (changing the contextNode, using XPathEvaluator.evaluate() instead of XMLDoc.evaluate(), and changing the XPathResult type), but the result was always empty.
In frustration, I tried simpler and simpler selectors and discovered that evaluate() was only traversing my current HTML document, even though my code makes no reference to it.
It "works" if I dump the XML document into a hidden element on the page:
const sampleSource = await fetch(t.source)
.then(res => res.text())
.then(docStr => hiddenElem.innerHTML = docStr)
const poemText = document.evaluate(t.selector?.value, hiddenElem, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)
textSample.innerHTML = poemText.snapshotItem(0).innerHTML
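evaluate() only traverses document?
Well, it is a TEI document, so its elements are in the namespace http://www.tei-c.org/ns/1.0. Don't expect XPath 1.0, used against an XML DOM document, to let a selector like div match elements in any namespace: div selects exactly those div elements that are in no namespace. That is also why the hidden-element approach "works": innerHTML parses the markup as HTML, which puts the elements in the HTML namespace, and in an HTML document an unprefixed name test like div matches elements in the HTML namespace. To select elements in a namespace with XPath 1.0, you need to use the third argument of evaluate (the namespace resolver) to bind a prefix of your choice (like tei) to that namespace, and then use e.g. //tei:div[@type='poem'][8]: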
<script type=module>
const sampleSource = await fetch('https://centerfordigitalhumanities.github.io/Dunbar-books/The-Complete-Poems-TEI.xml')
.then(res => res.text())
.then(docStr => (new DOMParser()).parseFromString(docStr, "application/xml"));
const poemText = sampleSource.evaluate(`//tei:div[@type='poem'][8]`, sampleSource, prefix => prefix === 'tei' ? 'http://www.tei-c.org/ns/1.0' : null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
console.log(poemText.snapshotItem(0).textContent);
</script>
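The inline arrow function passed as the third argument to evaluate is the namespace resolver: it receives each prefix used in the expression and returns the namespace URI bound to it, or null. If you need several prefixes or want to reuse the lookup, a minimal sketch along these lines may help. The helper names makeNsResolver and evaluateNodes are hypothetical, and the result type is written as the numeric value 7, which is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: turn a plain { prefix: namespaceURI } map into a
// resolver function usable as the third argument of Document.evaluate().
function makeNsResolver(prefixMap) {
  return prefix =>
    Object.prototype.hasOwnProperty.call(prefixMap, prefix)
      ? prefixMap[prefix]
      : null; // unknown prefix: let evaluate() report the error
}

// Hypothetical helper: evaluate an XPath 1.0 expression against a parsed
// XML document and return all matching nodes as a plain array.
function evaluateNodes(doc, xpath, prefixMap) {
  const result = doc.evaluate(
    xpath,
    doc,                       // context node: the XML document itself
    makeNsResolver(prefixMap), // binds prefixes such as "tei"
    ORDERED_NODE_SNAPSHOT_TYPE,
    null                       // no existing result object to reuse
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In the page above it could be called as evaluateNodes(sampleSource, "//tei:div[@type='poem'][8]", { tei: 'http://www.tei-c.org/ns/1.0' })[0]?.textContent. Instead of hard-coding the URI, you could also pass sampleSource.documentElement.namespaceURI, which for a TEI file whose root element declares the TEI default namespace should be http://www.tei-c.org/ns/1.0.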
With XPath 2.0 or later (Saxon-JS 2, for instance, supports XPath 3.1), you can bind a default element namespace and use an unqualified name like div to select elements in that namespace:
<script src=https://www.saxonica.com/saxon-js/documentation/SaxonJS/SaxonJS2.rt.js></script>
<script type=module>
const sampleSource = await SaxonJS.getResource({ location : 'https://centerfordigitalhumanities.github.io/Dunbar-books/The-Complete-Poems-TEI.xml', type : 'xml' });
const poemText = SaxonJS.XPath.evaluate(`//div[@type='poem'][8]`, sampleSource, { xpathDefaultNamespace : 'http://www.tei-c.org/ns/1.0' });
console.log(poemText.textContent);
</script>
In XPath 1.0 there is no way to do that, unless the DOM environment allows you to build a namespace-less DOM (as Java does with a non-namespace-aware DocumentBuilder). Inside a browser, as far as I know, that is not the case.
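While a bare name test like div cannot match namespaced elements in XPath 1.0, a predicate on local-name(), optionally combined with namespace-uri(), can, at the cost of longer expressions. A minimal sketch, using a hypothetical string-building helper nsStep that does not escape quotes in its arguments:

```javascript
// Hypothetical helper: build an XPath 1.0 step that matches elements by
// local name and, optionally, namespace URI, without binding any prefix.
// Note: quotes inside the arguments are not escaped in this sketch.
function nsStep(localName, namespaceUri) {
  let step = `*[local-name()='${localName}'`;
  if (namespaceUri) {
    step += ` and namespace-uri()='${namespaceUri}'`;
  }
  return step + ']';
}

const TEI_NS = 'http://www.tei-c.org/ns/1.0';

// Same selection as //tei:div[@type='poem'][8], but with no prefix:
// the positional [8] still counts only children that passed the
// earlier predicates, just as in the prefixed version.
const poemXPath = `//${nsStep('div', TEI_NS)}[@type='poem'][8]`;
```

The resulting string can be passed to sampleSource.evaluate(poemXPath, sampleSource, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null) with null as the resolver, since the expression contains no prefixes. Dropping the namespace-uri() check makes the match namespace-agnostic, which is convenient but may also match same-named elements from other vocabularies.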