Search code examples
bashawkscriptingxmlstarletxmllint

bash script extract XML data into column format


Trying to extract xml data from multiple string outputs dynamically (the data changes) into a column format.

About 100 of these XML bits echo out when I run a query against an SQL database.

<?xml version="1.0"?>
<Connection>
  <ConnectionType>Putty</ConnectionType>
  <CreatedBy>Someone</CreatedBy>
  <CreationDateTime>2014-10-27T11:53:59.8993492-04:00</CreationDateTime>
  <Events>
    <OpenCommentPrompt>true</OpenCommentPrompt>
    <WarnIfAlreadyOpened>true</WarnIfAlreadyOpened>
  </Events>
  <Group>Cloud Services Client Delivery\Willis\Linux\Test - SJC</Group>
  <ID>77e96d52-f165-482f-8389-ffb95b9d8ccd</ID>
  <KeyboardHook>InFullScreenMode</KeyboardHook>
  <MetaInformation />
  <Name>Hostname-H-A10D</Name>
  <OpenEmbedded>true</OpenEmbedded>
  <PinEmbeddedMode>False</PinEmbeddedMode>
  <Putty>
    <PortFowardingArray />
    <Scripting />
    <SessionHost>10.0.0.100</SessionHost>
    <SessionName>10.0.0.100</SessionName>
    <TelnetEncoding>IBM437</TelnetEncoding>
  </Putty>
  <ScreenColor>C24Bits</ScreenColor>
  <SoundHook>DoNotPlay</SoundHook>
  <Stamp>771324d1-0c59-4f12-b81e-96edb5185ef7</Stamp>
</Connection>

And what I need is the and in a column format. And essentially where the hostname equal Hostname-H-A10D, I want to be able to match the D at the end and mark the First column with Dev, Q as Test and no letter at the end as Prod. So the output would look like -->

Dev Hostname-H-A10D 10.0.0.100
Dev Hostname-H-A11D 10.0.0.101
Prod Hostname-H-A12 10.0.0.201
Test Hostname-H-A13Q 10.0.0.10

I have played around with sed/awk/etc and not just cannot get the format I want without writing out temp flat files. I would prefer to get this into an array using something like xmlstarlet or xmllint. Of course better suggestions can be made and that is why I am here :) Thanks folks.


Solution

  • It would be better to use an XML parser.

    Using awk:

    $ awk -F'[<>]' 'BEGIN{a["D"]="Dev";a["Q"]="Test"} /Name/{name=$3; type=a[substr(name,length(name))]; if (length(type)==0) type="Prod";} /SessionHost/{print type, name, $3;}' s.xml
    Dev Hostname-H-A10D 10.0.0.100
    

    How it works

    • BEGIN{a["D"]="Dev";a["Q"]="Test"}

      This defines associative array a.

    • /Name/{name=$3; type=a[substr(name,length(name))]; if (length(type)==0) type="Prod";}

      On the line that has the host name, this captures the host name and, from it, determines the host type.

    • /SessionHost/{print type, name, $3;}

      On the line that contains the host IP, this prints the type, name, and IP.