Tags: ruby, xpath, nokogiri, rexml

How do I handle control flow and nil objects better in Ruby?


I have this script that is part of a bigger one. I have three different XML files that look a little different from each other, and I need some kind of control structure to handle nil objects and XPath expressions better.

The script that I have right now outputs nil objects:

require 'open-uri'
require 'rexml/document'
include REXML

@urls = Array.new()
@urls << "http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=186956355&amp;strId=info.uh.kau.KTADY1&amp;EMILVersion=1.1"
@urls << "http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=184594606&amp;strId=info.uh.gu.GS5&amp;EMILVersion=1.1"
@urls << "http://testnavet.skolverket.se/SusaNavExport/EmilObjectExporter?id=185978100&amp;strId=info.uh.su.ARO720&amp;EMILVersion=1.1"

@urls.each do |url|
  doc = REXML::Document.new(open(url).read)
  doc.elements.each("/educationInfo/extensionInfo/nya:textualDescription/nya:textualDescriptionPhrase | /ns:educationInfo/ns:extensionInfo/gu:guInfoExtensions/gu:guSubject/gu:descriptions/gu:description | //*[name()='ct:text']"){
      |e| m = e.text 
      m.gsub!(/<.+?>/, "")
      puts "Description: " + m 
      puts ""   
    }
end

OUTPUT:

Description: bestrykning, kalandrering, tryckning, kemiteknik

Description: Vill du jobba med internationella och globala frågor med...

Description: The study of globalisation is becoming ever more important for our understanding of today´s world and the School of Global Studies is a unique environment for research.

Description:

Description:

Description: Kursen behandlar identifieringen och beskrivningen av sjukliga förändringar i mänskliga skelett. Kursen ger en ämneshistorisk bakgrund och skelettförändringars förhållanden till moderna kliniska data diskuteras.


Solution

  • See this post on how to skip over entries when using a block in Ruby. The each() method on doc.elements is called with a block (your code containing the gsub! and puts calls). The "next" keyword stops executing the block for the current element and moves on to the next one.

    
    doc.elements.each("/educationInfo/extensionInfo/nya:textualDescription/nya:textualDescriptionPhrase | /ns:educationInfo/ns:extensionInfo/gu:guInfoExtensions/gu:guSubject/gu:descriptions/gu:description | //*[name()='ct:text']"){
          |e| m = e.text 
          m.gsub!(/<.+?>/, "")
    
          next if m.empty?
    
          puts "Description: " + m 
          puts ""   
        }
    

    We know that "m" is a string (and not nil) by the time "next" runs, because gsub! was just called on it and did not raise an error; calling gsub! on nil would raise a NoMethodError. That means the blank descriptions are caused by empty strings, not nil objects.
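
If other feeds do contain elements without any text, e.text can return nil, and a whitespace-only string would still pass the empty? check. The following is only a sketch of one way to cover those cases as well. The helper name clean_description and the sample values are hypothetical stand-ins for what e.text might return; they are not part of the original script.

```ruby
# Hypothetical helper (not from the original script): strips tags from a
# description and returns nil when nothing printable is left.
def clean_description(text)
  # nil.to_s is "", so a nil input is handled the same way as an empty string.
  cleaned = text.to_s.gsub(/<.+?>/, "").strip
  cleaned.empty? ? nil : cleaned
end

# Stand-in values for what e.text might return for each matched element.
samples = ["<p>Kemiteknik</p>", "", nil, "   ", "Plain text"]

descriptions = []
samples.each do |text|
  description = clean_description(text)
  next if description.nil? # skip blank or missing descriptions

  descriptions << description
  puts "Description: " + description
  puts ""
end
```

Inside the doc.elements.each block, the same pattern would be to call the helper on e.text and use next when it returns nil. The non-mutating gsub is used here instead of gsub! so the original string is left untouched.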