Tags: ruby, regex, rubular

How to access the various occurrences of the same match group in Ruby regular expressions?


I have a regular expression that matches several times in my input. I figured out that $1, $2, etc. can be used to access the matched groups. But how do I access the multiple occurrences of the same matched group?

Please take a look at the rubular page below.

http://rubular.com/r/nqHP1qAqRY

So now $1 gives 916 and $2 gives nil. How can I access the 229885? Is there something similar to $1[1] or so?


Solution

  • To expand on my comment and respond to your question:

    If you want to store the values in an array, modify the block and use collect instead of iterating:

    > arr = xml.grep(/<DATA size="(\d+)"/).collect { |d| d.match /\d+/ }
    > arr.each { |a| puts "==> #{a}" }
    ==> 916
    ==> 229885
    

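    The |d| is normal Ruby block parameter syntax; each d is a line that matched the pattern, and the inner match pulls the first run of digits out of it. Note that grep comes from Enumerable, so xml must be an array of lines or an IO object; with a plain String on Ruby 1.9 or later, call xml.lines.grep instead. It's not the cleanest Ruby, although it's functional.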
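
    On the question as asked: $1 and $2 number the capture groups within a single match, so with only one group in the pattern $2 is always nil. A minimal sketch using String#scan, which returns every match of the pattern rather than just the first. The sample xml string here is an assumption standing in for the Rubular test data, not the original input:

```ruby
# Hypothetical sample input (assumed shape; the real test string is on the Rubular page).
xml = <<~XML
  <file><FILENAME path="~/Users/1.txt"/><DATA size="916"/></file>
  <file><FILENAME path="~/Users/2.txt"/><DATA size="229885"/></file>
XML

# With capture groups in the pattern, scan returns one array per match,
# each holding that match's groups. flatten collapses the single-group arrays.
sizes = xml.scan(/<DATA size="(\d+)"/).flatten

sizes.each { |s| puts "==> #{s}" }
```

    The values come back as strings, so sizes[1] plays the role of the hypothetical $1[1]; call to_i on an element if a number is needed.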

    I still recommend using a parser; note that the rexml version would be this (more or less):

    require 'rexml/document'
    include REXML
    doc = Document.new xml
    arr = doc.elements.collect("//DATA") { |d| d.attributes["size"] }
    arr.each { |a| puts "==> #{a}" }
    

    Once your "XML" is converted to actual XML you can get even more useful data:

    doc = Document.new xml
    arr = doc.elements.collect("//file") do |f|
      name = f.elements["FILENAME"].attributes["path"]
      size = f.elements["DATA"].attributes["size"]
      [name, size]
    end
    
    arr.each { |a| puts "#{a[0]}\t#{a[1]}" }
    
    ~/Users/1.txt   916
    ~/Users/2.txt   229885