merge xml element attribute


I have an input XML file:

<IndexCatalogueRecord SeriesNumber="1" SeriesVolume="3" SeriesPage="594">
<IndexCatalogueID>10305941390</IndexCatalogueID>
<GeneralNote>[Shelved in: B.58]</GeneralNote>
<GeneralNote>[Shelved in: B.458]</GeneralNote>
<GeneralNote>[Shelved in: B.20]</GeneralNote>
<Language>fr</Language>
</IndexCatalogueRecord>

and I need to combine the text of the GeneralNote elements into a single element, separated by commas, so that it becomes:

<IndexCatalogueRecord SeriesNumber="1" SeriesVolume="3" SeriesPage="594">
<IndexCatalogueID>10305941390</IndexCatalogueID>
<GeneralNote>[Shelved in: B.58, B.458, B.20]</GeneralNote>
<Language>fr</Language>
</IndexCatalogueRecord>

My approach was to have xmlstarlet extract the text of each element and then pipe that to grep or awk for processing. I can easily grab the text of every GeneralNote element using xmlstarlet:

 xmlstarlet sel -t -m "//GeneralNote" -v . -n test.xml

But when I pipe the console output to grep to strip the strings "[Shelved in:" and "]", I run into trouble. Please let me know if there is a more elegant solution. Thanks in advance.
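
For reference, the pipe-based idea can be made to work with awk alone. The following is only a minimal sketch: it assumes every note arrives on its own line in exactly the form `[Shelved in: X]`, and the helper name `join_notes` is hypothetical. It produces the merged text only and does not rewrite the XML.

```shell
# join_notes (hypothetical helper): reads one "[Shelved in: X]" note per line
# on stdin and prints a single merged note.
join_notes() {
  awk '
    {
      sub(/^\[Shelved in: /, "")   # drop the leading label
      sub(/\]$/, "")               # drop the closing bracket
      out = (NR == 1) ? $0 : out ", " $0   # join values with ", "
    }
    END { if (NR > 0) print "[Shelved in: " out "]" }
  '
}

# Sample lines standing in for the xmlstarlet sel output
printf '%s\n' '[Shelved in: B.58]' '[Shelved in: B.458]' '[Shelved in: B.20]' | join_notes
# prints: [Shelved in: B.58, B.458, B.20]
```

With the real file, the input would come from the command above: xmlstarlet sel -t -m "//GeneralNote" -v . -n test.xml | join_notes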


Solution

  • One possible approach is to nest two xmlstarlet commands
    (pay attention to the two occurrences of the filename in the expression):

    xmlstarlet ed -u "/IndexCatalogueRecord/GeneralNote[1]" \
      -v "$(xmlstarlet sel -t -o "[Shelved in: " -m "/IndexCatalogueRecord/GeneralNote" \
      -v "substring-after(substring-before(.,']'),'[Shelved in: ')" \
      --if 'position() != last()' -o ', ' -b -b -o "]" input.xml)" \
      -d "/IndexCatalogueRecord/GeneralNote[position() > 1]" input.xml
    

    The inner xmlstarlet command (sel, the select mode) builds the final value from all GeneralNote elements. The outer command (ed, the edit mode) writes that value into the first GeneralNote element and deletes the others. The options used are:

    • -u - (ed) updates the value of the nodes matching an XPath
    • -v - in sel, prints the value of an XPath expression; in ed, supplies the new value for -u
    • -m - iterates over all nodes matching an XPath expression
    • -o - outputs a static string
    • -b - ends an iteration or an if-clause
    • -d - deletes all nodes matching the XPath

    To modify the XML file in place, add the -L option right after xmlstarlet ed.
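
Because the filename has to appear in both commands, it may be easier to wrap the two steps in a small shell function. This is only a sketch under the same assumptions as the command above (a single IndexCatalogueRecord root, notes of the form "[Shelved in: X]"). The function name `merge_notes` and the variable names are hypothetical.

```shell
# merge_notes (hypothetical wrapper): takes the XML file path as $1 and
# prints the merged document on stdout.
merge_notes() {
  file=$1

  # Inner step: build the merged text from every GeneralNote element
  merged=$(xmlstarlet sel -t -o "[Shelved in: " \
    -m "/IndexCatalogueRecord/GeneralNote" \
    -v "substring-after(substring-before(.,']'),'[Shelved in: ')" \
    --if 'position() != last()' -o ', ' -b -b -o "]" "$file") || return 1

  # Outer step: write the merged text into the first GeneralNote
  # and delete the remaining ones
  xmlstarlet ed -u "/IndexCatalogueRecord/GeneralNote[1]" -v "$merged" \
    -d "/IndexCatalogueRecord/GeneralNote[position() > 1]" "$file"
}
```

It could then be called as merge_notes input.xml > merged.xml, so the filename is given only once.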