XMLStarlet: Printing one line per item, while using datum from parent element


I have XML data formatted in this fashion:

<XML>
    <Waveforms Time="01/01/2009 3:00:02 AM">
        <WaveformData Channel="I">1, 2, 3, 4, 5, 6 </WaveformData>
        <WaveformData Channel="II">9, 8, 7, 6, 5, 4 </WaveformData>
    </Waveforms>
    <Waveforms Time="01/01/2009 3:00:04 AM">
        <WaveformData Channel="I">1, 2, 3, 4, 5, 6 </WaveformData>
        <WaveformData Channel="II">9, 8, 7, 6, 5, 4 </WaveformData>
    </Waveforms>
</XML>

I am trying to use xmlstarlet to parse this data to a text file (comma delimited). The desired output would look like this:

Time Attribute, Channel Attribute, Data
01/01/2009 3:00:02 AM, I, 1, 2, 3, 4, 5, 6
01/01/2009 3:00:02 AM, II, 9, 8, 7, 6, 5, 4
01/01/2009 3:00:04 AM, I, 1, 2, 3, 4, 5, 6
01/01/2009 3:00:04 AM, II, 9, 8, 7, 6, 5, 4

The best I can come up with is:

 xmlstarlet sel -T -t -m //XML/Waveforms -v @Time -o "," -m Waves -v WaveformData/@Channel -o "," -v WaveformData -o "," -b -n testwave2.xml > testwave.txt

Which gives a result like this:

 01/01/2009 3:00:02 AM, I, 1, 2, 3, 4, 5, 6, II, 9, 8, 7, 6, 5, 4
 01/01/2009 3:00:04 AM, I, 1, 2, 3, 4, 5, 6, II, 9, 8, 7, 6, 5, 4

It's clear how to print one line per Waveforms, but not how to print one line per WaveformData if I want to include the time attribute from its parent. Can this be done? Alternately, should I work around and do some slicing and pasting to fix it on the back end afterwards?


Solution

  • Match on WaveformData, since that is the element you want one output line for, then step up to its parent with ../ to read the Time attribute.

    $ xmlstarlet sel -T -t -m /XML/Waveforms/WaveformData \
         -v ../@Time -o "," \
         -v @Channel -o "," \
         -v . -n <in.xml
    01/01/2009 3:00:02 AM,I,1, 2, 3, 4, 5, 6 
    01/01/2009 3:00:02 AM,II,9, 8, 7, 6, 5, 4 
    01/01/2009 3:00:04 AM,I,1, 2, 3, 4, 5, 6 
    01/01/2009 3:00:04 AM,II,9, 8, 7, 6, 5, 4 
    

    Alternately, if you know that each Waveforms element will have exactly two WaveformData children, one for channel I and one for channel II, you could match on Waveforms and print both lines explicitly:

    $ xmlstarlet sel -T -t -m /XML/Waveforms \
        -v ./@Time -o ",I,"  -v './WaveformData[@Channel="I"]' -n \
        -v ./@Time -o ",II," -v './WaveformData[@Channel="II"]' -n <in.xml
    01/01/2009 3:00:02 AM,I,1, 2, 3, 4, 5, 6
    01/01/2009 3:00:02 AM,II,9, 8, 7, 6, 5, 4
    01/01/2009 3:00:04 AM,I,1, 2, 3, 4, 5, 6
    01/01/2009 3:00:04 AM,II,9, 8, 7, 6, 5, 4