I've got an XML file with a lot of media fields. A piece of example XML is:
<root>
<item>
<name>Item 1</name>
<mediaList>
<media>
<name>Name 1</name>
<URL><![CDATA[http://example.com/image1.jpg]]></URL>
</media>
<media>
<name>Name 2</name>
<URL><![CDATA[http://example.com/image2.jpg]]></URL>
</media>
</mediaList>
</item>
<item>
<name>Item 2</name>
<mediaList>
<media>
<name>Name 3</name>
<URL><![CDATA[http://example.com/image3.jpg]]></URL>
</media>
<media>
<name>Name 4</name>
<URL><![CDATA[http://example.com/image4.jpg]]></URL>
</media>
</mediaList>
</item>
</root>
All items are built in the same way. Using xmllint with XPath, I'm trying to get a list of all URLs, but so far I haven't found a good way to do it. Some of the things I've tried:
xmllint --xpath "string(/root/item/mediaList/media/URL)" file.xml >> log.txt
This one returns a clean URL, but stops after the first match (giving me only 1 image).
xmllint --xpath "/root/item/mediaList/media/URL" file.xml >> log.txt
This gives me all items, but everything is on the same line, and each one is shown as <URL><![CDATA[http://example.com/image.jpg]]></URL>.
xmllint --xpath "/root/item/mediaList/media/URL/text()" file.xml >> log.txt
This comes closest, but still returns the <![CDATA[]]> tags around each URL, and again all on one line.
I've also tried looping through the items, but this was very slow, and didn't work as it should.
The result I'm aiming for is a txt file with all image URLs below each other, like so:
http://example.com/image1.jpg
http://example.com/image2.jpg
http://example.com/image3.jpg
http://example.com/image4.jpg
xmllint implements XPath 1.0, where string(...) applied to a node-set returns the string value of the first node only. That is why it shows only the first result.
You can use xmlstarlet instead:
xmlstarlet sel -T -t -m /root/item/mediaList/media/URL -v . -n file.xml
and it produces
http://example.com/image1.jpg
http://example.com/image2.jpg
http://example.com/image3.jpg
http://example.com/image4.jpg
Alternatively, use Perl (with the XML::LibXML module installed):
perl -MXML::LibXML -E 'say $_->to_literal for XML::LibXML->load_xml(location=>q{file.xml})->findnodes(q{/root/item/mediaList/media/URL})'
This produces the same result as above.
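If neither xmlstarlet nor XML::LibXML is available, the same extraction can be sketched with Python's standard library. This is only an illustration, not something from the tools above: the function name extract_urls and the inline SAMPLE string are made up for the example, and the sample is a shortened copy of the XML in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: the real file has the same layout).
SAMPLE = """<root>
<item>
<name>Item 1</name>
<mediaList>
<media><name>Name 1</name><URL><![CDATA[http://example.com/image1.jpg]]></URL></media>
<media><name>Name 2</name><URL><![CDATA[http://example.com/image2.jpg]]></URL></media>
</mediaList>
</item>
<item>
<name>Item 2</name>
<mediaList>
<media><name>Name 3</name><URL><![CDATA[http://example.com/image3.jpg]]></URL></media>
<media><name>Name 4</name><URL><![CDATA[http://example.com/image4.jpg]]></URL></media>
</mediaList>
</item>
</root>"""


def extract_urls(xml_text):
    """Return the text of every item/mediaList/media/URL element, in document order."""
    root = ET.fromstring(xml_text)
    # `root` is already the <root> element, so the path starts at its children.
    # The parser delivers CDATA content as ordinary text, so no wrapper is left.
    return [
        url.text.strip()
        for url in root.findall("./item/mediaList/media/URL")
        if url.text
    ]


if __name__ == "__main__":
    # One URL per line, matching the desired log.txt layout.
    print("\n".join(extract_urls(SAMPLE)))
```

To run it against the real file, read file.xml and pass its contents to extract_urls (or use ET.parse("file.xml").getroot() with the same findall path), then redirect the output to log.txt.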