
XMLStarlet: Query for MARCXML


A MARCXML file, foo.xml, has the following structure:

<record>
  <header>
    <identifier>myID001</identifier>
    <datestamp>2020-10-12</datestamp>
  </header>
  <metadata>
    <marcxml:collection xmlns:marcxml="http://www.loc.gov/MARC21/slim">
      <marcxml:record>
        <marcxml:datafield ind1=" " ind2=" " tag="084">
          <marcxml:subfield code="2">rvk</marcxml:subfield>
          <marcxml:subfield code="a">MG 98092</marcxml:subfield>
        </marcxml:datafield>
        <marcxml:datafield ind1=" " ind2=" " tag="084">
          <marcxml:subfield code="2">bk</marcxml:subfield>
          <marcxml:subfield code="a">89.52</marcxml:subfield>
        </marcxml:datafield>
        <marcxml:datafield ind1=" " ind2=" " tag="084">
          <marcxml:subfield code="2">ddc</marcxml:subfield>
          <marcxml:subfield code="a">320.9439</marcxml:subfield>
        </marcxml:datafield>
      </marcxml:record>
    </marcxml:collection>
  </metadata>
</record>

I would like to extract only the content of <marcxml:subfield code="a"> where the preceding <marcxml:subfield code="2"> in the same datafield contains the string 'bk'.

In this example, the desired output is therefore 89.52.

So far, I have tried

xmlstarlet sel -N marcxml="http://www.loc.gov/MARC21/slim" -t -m "//marcxml:collection/marcxml:record/marcxml:datafield/marcxml:subfield[text()='bk']" -v '//marcxml:collection/marcxml:record/marcxml:datafield/marcxml:subfield[text()]' -nl foo.xml

which results in

rvk
MG 98092
bk
89.52
ddc
320.9439

How can this be done with XMLStarlet?


Solution

  • Try something along these lines:

    xmlstarlet sel -N marcxml="http://www.loc.gov/MARC21/slim" -t -v '//marcxml:subfield[@code="2"][text()="bk"]/following-sibling::marcxml:subfield[@code="a"]' -nl foo.xml