Tags: xml, bash, xquery, xmlstarlet, basex

XML: how can I search the whole XML document, find nodes by id and delete them?


A simple example of an XML file:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <speklap name="gj">
    <book>
      <title lang="en" id="1">Harry Potter</title>
      <price>29.99</price>
    </book>
    <book>
      <title lang="en" id="2">Learning XML</title>
      <price>39.95</price>
    </book>
    <photostore>
      <photo>
        <title lang="en" id="3">Learning XPATH</title>
        <price>1.000</price>
      </photo>
    </photostore>
  </speklap>
</bookstore>

What I want to achieve is to search for the nodes with the attributes id="2" and id="3" and remove only these two nodes. The problem is that I can find plenty of examples that target a node by its path, but not how to search the whole XML document, find a node by its id, and remove only the node with that id.

So the desired output is:

<bookstore>
  <speklap name="gj">
    <book>
      <title lang="en" id="1">Harry Potter</title>
      <price>29.99</price>
    </book>
    <book>
      <price>39.95</price>
    </book>
    <photostore>
      <photo>
        <price>1.000</price>
      </photo>
    </photostore>
  </speklap>
</bookstore>

A simple script would be great, but I'm a beginner. I tried XQuery, but I'm also interested in a bash script. I hope somebody can point me in the right direction.


Solution

  • With BaseX, the following command can be used to delete nodes in a document:

    basex -u -i test.xml "delete node //*[@id = (2, 3)]"
    

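    With -u, updates are written back to the original file. With -i, the input document is specified. The subsequent string is an XQuery Update expression: //*[@id = (2, 3)] selects every element anywhere in the document whose id attribute equals 2 or 3, and delete node removes all of the selected nodes.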

    An alternative is to specify the input document directly in the query (I have also slightly modified the predicate; it is equivalent to the first version):

    basex -u "delete node doc('test.xml')//*[@id = 2 or @id = 3]"
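If installing BaseX is not an option, roughly the same deletion can be done with Python's standard-library xml.etree.ElementTree module. This is a swapped-in technique, not part of the BaseX answer, and only a minimal sketch. The function name delete_by_id and the temporary test.xml used in the demo are illustrative assumptions. ElementTree elements have no parent pointer, so the sketch builds a child-to-parent map. It then collects every element anywhere in the tree whose id is in the given set (the counterpart of //*[@id = (2, 3)]) and removes each one from its parent.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def delete_by_id(path, ids):
    """Delete every element in the file at `path` whose id attribute is in `ids`.

    Hypothetical helper that mirrors `basex -u`: the modified tree is written
    back to the same file. Returns the number of deleted elements.
    The root element itself is never deleted.
    """
    tree = ET.parse(path)
    root = tree.getroot()
    # ElementTree elements have no parent pointer, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Search the whole tree first (like //*[@id = ...]) and only then remove,
    # so the tree is not modified while it is being iterated.
    targets = [el for el in root.iter() if el is not root and el.get("id") in ids]
    for el in targets:
        parent_of[el].remove(el)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
    return len(targets)


# Demo on a throwaway copy of the question's document (test.xml in a temp dir).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <speklap name="gj">
    <book>
      <title lang="en" id="1">Harry Potter</title>
      <price>29.99</price>
    </book>
    <book>
      <title lang="en" id="2">Learning XML</title>
      <price>39.95</price>
    </book>
    <photostore>
      <photo>
        <title lang="en" id="3">Learning XPATH</title>
        <price>1.000</price>
      </photo>
    </photostore>
  </speklap>
</bookstore>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xml")
    with open(path, "w", encoding="UTF-8") as f:
        f.write(SAMPLE)
    removed = delete_by_id(path, {"2", "3"})
    result = ET.parse(path).getroot()

print(removed)  # 2
print(ET.tostring(result, encoding="unicode"))
```

Called as delete_by_id("test.xml", {"2", "3"}), it overwrites test.xml just like the -u call. Two differences from the BaseX version are worth knowing. First, ElementTree compares the id values as strings, whereas the XQuery general comparison @id = (2, 3) converts the attribute to a number, so id="02" would match in BaseX but not here. Second, ElementTree discards comments and processing instructions when parsing by default, so those would be lost on write-back.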