
Extract tag contents based on value of another tag qualifier using xmllint


I'm trying to use xmllint to extract data from a tag if a condition exists on a previous tag. I know there are probably better tools but I'm limited to xmllint and/or system standard commands like sed, awk, etc.

xml file:

<?xml version="1.0" encoding="UTF-8"?>
<MainGroup>
<MainGroupEntry name="aaa" function="xxx">
    <EntryType type="AAA"/>
    <EntryDescription>Capture This A</EntryDescription>
    <EntryRandomList>Just,a,random,list,of,things,to,discard</EntryRandomList>
</MainGroupEntry>
<MainGroupEntry name="aaa" function="xxx">
    <EntryType type="AAA"/>
    <EntryDescription>Capture This A</EntryDescription>
    <EntryRandomList>Just,a,random,list,of,things,to,discard</EntryRandomList>
</MainGroupEntry>
<MainGroupEntry name="bbb" function="yyy">
    <EntryType type="BBB"/>
    <EntryDescription>Capture This B</EntryDescription>
    <EntryRandomList>Just,a,random,list,of,things,to,discard</EntryRandomList>
</MainGroupEntry>
<MainGroupEntry name="bbb" function="yyy">
    <EntryType type="BBB"/>
    <EntryDescription>Capture This B</EntryDescription>
    <EntryRandomList>Just,a,random,list,of,things,to,discard</EntryRandomList>
</MainGroupEntry>
</MainGroup>

What I'm trying to do is: for every EntryType with type="AAA", print the accompanying EntryDescription. I've tried different variations of:

xmllint --xpath '//MainGroupEntry/EntryType[@type="AAA"]/EntryDescription/text()' my_file.xml

but I always get an empty XPath set. If I drop the part that tries to get the Description text, I can see the entries that match my 'type' condition:

xmllint --xpath '//MainGroupEntry/EntryType[@type="AAA"]' my_file.xml
<EntryType type="AAA"/><EntryType type="AAA"/>

I just can't seem to figure out how to only grab the text from the Description field. Thoughts?


Solution

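  • Your XPath returns an empty set because EntryDescription is a sibling of EntryType, not a child of it. You can use the following-sibling axis to step from the matching EntryType to its EntryDescription, and the text() node test to extract only the text from the description: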

    xmllint --xpath '/MainGroup/MainGroupEntry/EntryType[@type="AAA"]/following-sibling::EntryDescription/text()' file.xml
    

    Some xmllint versions print the matched text nodes run together with no separator. To separate the texts, you can use the --shell option with cat:

    echo 'cat /MainGroup/MainGroupEntry/EntryType[@type="AAA"]/following-sibling::EntryDescription/text()' \
    | xmllint --shell file.xml
    

    You might need to pipe the output through grep -v ' -----\|/ >' to remove the separator lines and the prompt.
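
As a variation on the commands above, the condition can also go in a predicate on MainGroupEntry itself, so the expression no longer depends on EntryDescription coming after EntryType. The script below is only a sketch under a few assumptions: sample.xml and the descriptions function are hypothetical names, the data is a trimmed copy of the file from the question, and the exact separator and prompt text printed by --shell may differ between xmllint versions. It passes the two patterns with separate -e options because \| in a basic regular expression is a GNU extension.

```shell
#!/bin/sh
# Hypothetical sample file: a trimmed copy of the XML from the question.
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<MainGroup>
<MainGroupEntry name="aaa" function="xxx">
    <EntryType type="AAA"/>
    <EntryDescription>Capture This A</EntryDescription>
</MainGroupEntry>
<MainGroupEntry name="aaa" function="xxx">
    <EntryType type="AAA"/>
    <EntryDescription>Capture This A</EntryDescription>
</MainGroupEntry>
<MainGroupEntry name="bbb" function="yyy">
    <EntryType type="BBB"/>
    <EntryDescription>Capture This B</EntryDescription>
</MainGroupEntry>
</MainGroup>
EOF

# Select every MainGroupEntry that has an EntryType child with type="AAA",
# then step down to the text of its EntryDescription child.
xpath='/MainGroup/MainGroupEntry[EntryType/@type="AAA"]/EntryDescription/text()'

# descriptions FILE: run the XPath through "xmllint --shell" and drop the
# lines that contain the node separator or the shell prompt.
descriptions() {
    printf 'cat %s\n' "$xpath" \
        | xmllint --shell "$1" \
        | grep -v -e ' -----' -e '/ >'
}

if command -v xmllint >/dev/null 2>&1; then
    descriptions sample.xml
else
    echo "xmllint not found" >&2
fi
```

With the sample data this should leave only the descriptions of the AAA entries, one per line. If your xmllint prints different separators, adjust the grep patterns to match.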