I want to get all text contents from an XML file matching some selector.
I chose XPath because I already have xmllint installed on my Mac. However, it is older than version 20909, which apparently has the behaviour I want by default:
$ xmllint --version
xmllint: using libxml version 20904
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude ICU ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib
Here is my XML:
<?xml version="1.0" encoding="utf-8"?>
<xml>
<foo bar="baz">Lorem</foo>
<foo bar="baz">Ipsum</foo>
<foo bar="baz">Dolor</foo>
<foo bar="qux">Sit</foo>
<foo bar="baz">Amet</foo>
</xml>
I want to get the text content of each foo element that has a certain attribute value:
$ xmllint --xpath '//foo[@bar="baz"]/text()' my.xml
LoremIpsumDolorAmet
The output is not newline-delimited, nor does it seem to be NUL-delimited:
$ xmllint --xpath '//foo[@bar="baz"]//text()' my.xml | od -A n -t x1
4c 6f 72 65 6d 49 70 73 75 6d 44 6f 6c 6f 72 41
6d 65 74
How can I print the output on macOS so that each match is on its own line?
It can be done with xmllint --shell, as follows.
If the XML file is not too big, it can be optimized by loading it into memory.
cnt=$(xmllint --xpath 'count(//foo[@bar="baz"])' test.xml)
(for i in $(seq 1 $cnt); do echo "cat //foo[@bar='baz'][$i]/text()"; done) | xmllint --shell test.xml | grep -Ev '\/ [<>]( cat)?| -------'
Result:
Lorem
Ipsum
Dolor
Amet
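The two steps above (count the matches, then emit one cat command per index) can be wrapped in a small function. This is only a sketch: gen_cat_cmds and xpath_text_lines are hypothetical names, not part of xmllint, and the grep pattern assumes the prompt and separator lines look like the ones shown in this answer. It also writes the index as (expr)[i] rather than expr[i], because in XPath a predicate after //foo[...] is applied per parent element, while the parenthesized form indexes the whole node-set. On the flat sample file both forms select the same nodes.

```shell
#!/bin/sh
# Hypothetical helpers; neither function ships with xmllint.

# Print one xmllint-shell "cat" command per match index, from 1 to $2.
gen_cat_cmds() {
    xp=$1
    n=$2
    i=1
    while [ "$i" -le "$n" ]; do
        # Parentheses make [i] index the whole node-set, not each parent's children.
        printf 'cat (%s)[%d]/text()\n' "$xp" "$i"
        i=$((i + 1))
    done
}

# Print the text of every node matching XPath $1 in file $2, one match per line.
xpath_text_lines() {
    total=$(xmllint --xpath "count($1)" "$2") || return 1
    gen_cat_cmds "$1" "$total" |
        xmllint --shell "$2" |
        # Assumed noise format: "/ > ..." prompt lines and " -------" separators.
        grep -Ev '^/ [<>]|^ -------'
}
```

It would be called as xpath_text_lines "//foo[@bar='baz']" test.xml. Text that itself starts with "/ >" or " -------" would be filtered out too, which is the same limitation as the one-liner above.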
Without the grep at the end, it produces:
/ > cat //foo[@bar='baz'][1]/text()
-------
Lorem
/ > cat //foo[@bar='baz'][2]/text()
-------
Ipsum
/ > cat //foo[@bar='baz'][3]/text()
-------
Dolor
/ > cat //foo[@bar='baz'][4]/text()
-------
Amet
/ >
A different version worth adding to the answer (cnt is hard-coded to 4 here; it can be computed with count() as above):
cnt=4; (for i in $(seq 1 $cnt); do echo "cd //foo[@bar='baz'][$i]/text()"; echo "cat"; done) | xmllint --shell test.xml | grep -Ev ' > (cat|cd)?'
Without the grep, it produces:
/ > cd //foo[@bar='baz'][1]/text()
text > cat
Lorem
text > cd //foo[@bar='baz'][2]/text()
text > cat
Ipsum
text > cd //foo[@bar='baz'][3]/text()
text > cat
Dolor
text > cd //foo[@bar='baz'][4]/text()
text > cat
Amet
text >
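To see what the filter has to remove in this variant, it can be tried on a captured transcript without running xmllint at all. The sketch below replays the first two matches from the output above. sample_transcript and strip_prompts are hypothetical names, and anchoring the pattern at the start of the line is my own tightening of the original ' > (cat|cd)?' pattern, on the assumption that the only prompts seen here are "/ >" and "text >".

```shell
#!/bin/sh
# Transcript lines copied from the unfiltered output above (first two matches only).
sample_transcript() {
    cat <<'EOF'
/ > cd //foo[@bar='baz'][1]/text()
text > cat
Lorem
text > cd //foo[@bar='baz'][2]/text()
text > cat
Ipsum
text >
EOF
}

# Drop prompt lines: the current node name ("/" or "text" here) followed by " >".
strip_prompts() {
    grep -Ev '^(/|text) >'
}

# Prints the two lines "Lorem" and "Ipsum".
sample_transcript | strip_prompts
```

If the cd target were something other than a text node, the prompt would carry a different node name and the pattern would need to be widened accordingly.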