xml shell awk xmlstarlet

Get Specific Tags from Multiple Similar Tags of XML


I have an XML file of the format:

<classes>

 <subject>
  <name>Operating System</name>
  <credit>3</credit>
  <type>Theory</type>
  <faculty>Prof. XYZ</faculty> 
 </subject>

 <subject>
  <name>Web Development</name>
  <credit>3</credit>
  <type>Lab</type>
 </subject>

</classes>

I want to print the values of only those subjects whose 'type' is 'Theory', using a shell script.

I tried:

awk -F'[<>]' '/<name>|<credit>|<type>|<faculty>/{print $3}' file.xml

But this command returns the value of every tag in every subject, i.e.:

Operating System
3
Theory
Prof. XYZ
Web Development
3
Lab

I am looking for a way to get the tag values of only the matching subjects when several similar blocks are present.

TIA.


Solution

  • Could you please try the following? I am not an xmlstarlet expert, but I am giving it a try here.

    xmlstarlet sel -t -m '/classes/subject[type="Theory"]' -v . -n Input_file |
    awk '
    NF{
      gsub(/^[[:space:]]+|[[:space:]]+$/,"")
      print
    }'
    

    Brief explanation: xmlstarlet processes the XML file first; awk is then used only for output formatting, trimming leading and trailing whitespace and dropping empty lines.
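
To see what the awk stage contributes independently of xmlstarlet, the sketch below feeds it a few hand-written lines that mimic indented values separated by blank lines. The `raw` sample and the `trim_lines` function name are made up for illustration; real xmlstarlet output may be spaced differently.

```shell
#!/bin/sh
# trim_lines: hypothetical wrapper around the awk stage shown above.
trim_lines() {
  awk '
  NF {                                        # skip empty or whitespace-only lines
    gsub(/^[[:space:]]+|[[:space:]]+$/, "")   # strip leading and trailing whitespace
    print
  }'
}

# Illustrative input only: indented values with blank lines in between.
raw='
  Operating System
  3

  Theory
  Prof. XYZ
'

printf '%s\n' "$raw" | trim_lines
```

Because `NF` is zero for lines that contain only whitespace, those lines never reach `print`, so the stage needs no separate blank-line filter.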



    EDIT: Since the OP said xmlstarlet cannot be installed on the system, I am adding an awk solution. Fair warning: awk is NOT a tool for parsing XML, and the following solution is written against the shown samples only.

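    awk -F"[><]" '
    /<\/subject>/{              # end of a subject block
      if(theory){               # print it only if its <type> was Theory
        print val
      }
      inside=theory=val=""      # reset state for the next block
      next
    }
    /<subject>/{                # start of a subject block
      inside=1
      next
    }
    inside{
      if($2=="type" && $3=="Theory"){
        theory=1
      }
      val=(val==""?"":val ORS)$3  # collect each tag value on its own line
    }
    '   Input_file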
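
The awk approach can be tried end to end with the sample from the question. The sketch below is a variant under assumptions, not the exact script above: it passes the wanted type in through a variable called `want`, wraps the call in a function called `filter_subjects` (both names chosen here for illustration), and writes the sample to a temporary file. It relies on every tag sitting alone on its own line, as in the sample, and would likely break on attributes, nested subjects, or several tags on one line.

```shell
#!/bin/sh
# Write the sample from the question to a temporary file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<classes>

 <subject>
  <name>Operating System</name>
  <credit>3</credit>
  <type>Theory</type>
  <faculty>Prof. XYZ</faculty>
 </subject>

 <subject>
  <name>Web Development</name>
  <credit>3</credit>
  <type>Lab</type>
 </subject>

</classes>
EOF

# filter_subjects TYPE FILE: print the tag values of every <subject> block
# whose <type> equals TYPE. Hypothetical helper, for illustration only.
filter_subjects() {
  awk -F'[><]' -v want="$1" '
  /<subject>/   { inside = 1; hit = 0; val = ""; next }   # block starts
  /<\/subject>/ { if (hit) print val; inside = 0; next }  # block ends
  inside {
    if ($2 == "type" && $3 == want) hit = 1               # type matches
    val = (val == "" ? "" : val ORS) $3                   # one value per line
  }' "$2"
}

filter_subjects Theory "$sample"
```

Calling `filter_subjects Lab "$sample"` instead selects the second subject. Buffering the values until `</subject>` is what makes this work regardless of where `<type>` appears inside the block.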