I'm trying to extract data from an XML file (which I named output.xml) on the command line (and then, if I manage to do it, put it in a script).
I've read that the best tool for this is XMLStarlet. However, the following command doesn't work:
xmlstarlet sel -t -m "/entry/content" output.xml
Note: I ran xmlstarlet el output.xml to check the XPath structure of the file, and it works, so the tool does see the elements.
I read that there are two conditions for XMLStarlet to work:
1- The XML file must be well-formed (related Stack Overflow link).
So I ran this command to create a well-formed file:
xmlstarlet fo -R output.xml > good-output.xml
2- XPath in XMLStarlet is very picky about the default namespace. If the document has one, either declare it before selecting the elements or delete all occurrences of "xmlns" from the document (related Stack Overflow link).
So I did:
cat good-output.xml | sed -e 's/ xmlns.*=".*"//g' > very-good-output.xml
However, even after these two steps I get another error, and I don't know how to fix it. The terminal points to the places where I removed the namespaces and says "Namespace prefix app on collection is not defined". What should I do? With the namespaces it doesn't work, and without them it tells me to put them back...
Any help?
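So this is the final solution for retrieving the content of an XML file with multiple namespaces: keep the namespaces, prefix the element names with _: (XMLStarlet's shorthand for the document's default namespace), and add an output action such as -c ., because -m on its own only selects nodes without printing anything: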
xmlstarlet sel -t -m "//_:content" -c . good-output.xml
Thanks to npostavs for guiding me.
I thought it was a problem that my first attempt printed the tag along with the desired content, but in my case it actually wasn't. If it is a problem for you, here is how to get only the text:
xmlstarlet sel -t -m "/_:entry/_:content/text()" -c . output.xml
OR
xmlstarlet sel -t -m "/_:entry/_:content" -v . output.xml
Simplified:
xmlstarlet sel -t -v "/_:entry/_:content" output.xml