
How to parse a string into a hash table in Ruby


I am getting data as a string from a remote device. I need to parse the data. The data usually come like this:

MO                SCGR  SC         RSITE           ALARM_SITUATION
RXOTG-59            59  0          EK0322          ABIS PATH FAULT
RXOCF-59                           EK0322          LOCAL MODE
RXOTRX-59-0         4              EK0322          LOCAL MODE
RXOTRX-59-1                        EK0322          LOCAL MODE
RXOTRX-59-4             0          EK0322          LOCAL MODE
RXOTRX-59-5         1   3          EK0322          LOCAL MODE
RXOTRX-59-8                        EK0322          LOCAL MODE
RXOTRX-59-9                        EK0322          LOCAL MODE

I would love to have the data as an array of arrays or any other programmatically sensible structure.

I am splitting the data into an array using:

str.split("\r\n")

and then removing the extra spaces in each element of the array with:

tsgs.map! {|tsg| tsg.gsub(/\s+/, " ").split(" ") }

but this has a limitation: empty cells are not taken into account. I expect each array to contain five elements, but some contain fewer than five.

Case 1: In this case, I get the expected result:

RXOTG-59            59  0          EK0322          ABIS PATH FAULT

converts to

["RXOTG-59", "59", "0", "EK0322", "ABIS PATH FAULT"]

Case 2: In this case, I get an unexpected result:

RXOTRX-59-9                        EK0322          LOCAL MODE

converts to

["RXOTRX-59-9", "EK0322", "LOCAL MODE"]

def getCommandResult(tgdatas)
  tgdatas_arr = tgdatas.split("\r\n")
  tsgs = tgdatas_arr[5..tgdatas_arr.index("END")-2]
  tsgs.map! {|tsg| tsg.gsub(/\s+/, " ").split(" ")[0] }
  return tsgs
end

Solution

  • Your string (see note 1 at the end), modified slightly:

    data = <<END
    MO                SCGR  SC         RSITE           ALARM_SITUATION
    RXOTG-59            59  0          EK0322          ABIS PATH FAULT
    RXOCF-59                           EK0322          LOCAL MODE
    RXOTRX-59-0         4              EK0322          LOCAL MODE
    RXOTRX-59-1                        EK0322          LOCAL MODE
    RXOTRX-59-4             0
    RXOTRX-59-5         1   3          EK0322          LOCAL MODE
    RXOTRX-59-8                        EK0322          LOCAL MODE
    RXOTRX-59-9                        EK0322          LOCAL MODE
    END
    

    This string looks very much like a CSV data structure, so we might be tempted to convert it to a CSV string, which lets us bring to bear the methods provided by the CSV class.

    Convert string to CSV string

    Code

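    def convert_to_csv(data)
      # Offsets of the space that precedes each column name in the header line.
      cols = data[/.+?\n/].gsub(/ \S/).map { Regexp.last_match.begin(0) }
      data.each_line.map do |s|
        # Write a comma at each offset that the line is long enough to reach.
        cols.each { |i| s[i] = ',' if s.size > i+1 }
        # Remove the padding spaces on either side of each comma.
        s.gsub(/ *, */, ',')
      end.join
    end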
    

    Convert string

    Now convert the string data to a CSV string.

    csv_data = convert_to_csv(data)
    
    puts csv_data
    MO,SCGR,SC,RSITE,ALARM_SITUATION
    RXOTG-59,59,0,EK0322,ABIS PATH FAULT
    RXOCF-59,,,EK0322,LOCAL MODE
    RXOTRX-59-0,4,,EK0322,LOCAL MODE
    RXOTRX-59-1,,,EK0322,LOCAL MODE
    RXOTRX-59-4,,0
    RXOTRX-59-5,1,3,EK0322,LOCAL MODE
    RXOTRX-59-8,,,EK0322,LOCAL MODE
    RXOTRX-59-9,,,EK0322,LOCAL MODE
    

    Explanation

    The steps are as follows.

    s = data[/.+?\n/]
      #=> "MO                SCGR  SC         RSITE           ALARM_SITUATION\n" 
    e0 = s.gsub(/ \S/)
      #=> #<Enumerator: "MO ... ALARM_SITUATION\n":gsub(/ \S/)>
    cols = e0.map { Regexp.last_match.begin(0) }
      #=> [17, 23, 34, 50] 
    e1 = data.each_line
      #=> #<Enumerator: "MO ... LOCAL MODE\n":each_line> 
    a = e1.map do |s|
      cols.each { |i| s[i] = ',' if s.size > i+1 }
      s.gsub(/ *, */,',')
    end
      #=> ["MO,SCGR,SC,RSITE,ALARM_SITUATION\n",
      #    "RXOTG-59,59,0,EK0322,ABIS PATH FAULT\n",
      #    ...
      #    "RXOTRX-59-9,,,EK0322,LOCAL MODE\n"] 
    a.join
      #=> < return value above >
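
The offset calculation relies on String#gsub returning an enumerator when it is given a pattern but no replacement and no block; while that enumerator is iterated, Regexp.last_match refers to the current match. A minimal sketch on a made-up header (the variable names and the header text are illustrative only, not part of the original data):

```ruby
# Hypothetical three-column header, used only for illustration.
header = "ID   NAME  CITY\n"

# With a pattern and nothing else, gsub returns an Enumerator over the matches.
enum = header.gsub(/ \S/)

# begin(0) is the index at which the current match starts, i.e. the index
# of the space immediately before each column name.
offsets = enum.map { Regexp.last_match.begin(0) }

p offsets
  #=> [4, 10]
```

The first column has no preceding space, so it produces no offset; only the boundaries between columns are recorded.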
    

    Let's have a closer look at the calculation of a. First, the block variable s is assigned to the first element generated by the enumerator e1:

    s = e1.next
      #=> "MO                SCGR  SC         RSITE           ALARM_SITUATION\n"
    

    The block calculation is then performed:

    cols.each { |i| s[i] = ',' }
    s #=> "MO               ,SCGR ,SC        ,RSITE          ,ALARM_SITUATION\n"
    s.gsub(/ *, */,',')
      #=> "MO,SCGR,SC,RSITE,ALARM_SITUATION\n"
    

    The regular expression used with gsub reads, "match zero or more spaces followed by a comma, followed by zero or more spaces".
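
One consequence of that pattern is worth seeing in isolation: only spaces that touch a comma are removed, so a space inside a value such as "LOCAL MODE" survives. A small sketch, assuming a row in which the commas have already been written at the column offsets (the string below is typed by hand for illustration):

```ruby
# Hypothetical row after commas were placed at the column offsets.
row = "RXOCF-59         ,     ,          ,EK0322         ,LOCAL MODE\n"

# Each comma absorbs the run of spaces on either side of it.
cleaned = row.gsub(/ *, */, ',')

p cleaned
  #=> "RXOCF-59,,,EK0322,LOCAL MODE\n"
```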

    When the short line is passed to the block, the following calculation is performed (the trailing newline is omitted here; it does not change the result).

    s = "RXOTRX-59-4             0"
    s.size
      #=> 25
    cols
      #=> [17, 23, 34, 50] 
    cols.each { |i| s[i] = ',' if s.size > i+1 }
    s #=> "RXOTRX-59-4      ,     ,0" 
    s.gsub(/ *, */,',')
      #=> "RXOTRX-59-4,,0" 
    

    The remaining elements of e1 are processed similarly.

    Convert the CSV string to an array of hashes

    We may now make use of CSV methods. For example, suppose we wish to create an array of hashes whose keys are the header elements, downcased and converted to symbols, and in which the values of "SCGR" and "SC" are converted to integers. To do that we use the class method CSV::new, specifying appropriate values for its options.

    Construct the hash

    require 'csv'
    
    CSV.new(csv_data, headers: true, header_converters: :symbol,
      converters: :all).to_a.map(&:to_h)
      #=> [{:mo=>"RXOTG-59",    :scgr=>59,  :sc=>0,   :rsite=>"EK0322",
      #     :alarm_situation=>"ABIS PATH FAULT"},
      #    {:mo=>"RXOCF-59",    :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
      #     :alarm_situation=>"LOCAL MODE"},
      #    {:mo=>"RXOTRX-59-0", :scgr=>4,   :sc=>nil, :rsite=>"EK0322",
      #     :alarm_situation=>"LOCAL MODE"},
      #    {:mo=>"RXOTRX-59-1", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
      #     :alarm_situation=>"LOCAL MODE"},
      #    {:mo=>"RXOTRX-59-4", :scgr=>nil, :sc=>0,   :rsite=>nil,
      #     :alarm_situation=>nil},
      #    {:mo=>"RXOTRX-59-5", :scgr=>1,   :sc=>3,   :rsite=>"EK0322",
      #     :alarm_situation=>"LOCAL MODE"},
      #    {:mo=>"RXOTRX-59-8", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
      #     :alarm_situation=>"LOCAL MODE"},
      #    {:mo=>"RXOTRX-59-9", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
      #     :alarm_situation=>"LOCAL MODE"}]
    

    Explanation

    The steps are as follows.

    csv = CSV.new(csv_data, headers: true, header_converters: :symbol,
      converters: :all)
      #=> <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:","
      #         row_sep:"\n" quote_char:"\"" headers:true> 
    a = csv.to_a
      #=> [#<CSV::Row mo:"RXOTG-59" scgr:59 sc:0 rsite:"EK0322" alarm_situation:"ABIS PATH FAULT">,
      #    #<CSV::Row mo:"RXOCF-59" scgr:nil sc:nil rsite:"EK0322" alarm_situation:"LOCAL MODE">,
      #    ...
      #    #<CSV::Row mo:"RXOTRX-59-9" scgr:nil sc:nil rsite:"EK0322" alarm_situation:"LOCAL MODE">] 
    a.map(&:to_h)
      #=> < array of hashes shown above >
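
The question also mentions an array of arrays as an acceptable result. A minimal sketch of that variant, assuming a CSV string shaped like csv_data (shortened here so the block runs on its own; the names rows, records and padded are illustrative): CSV.parse without the headers option returns one array per line, with empty fields as nil.

```ruby
require 'csv'

# Shortened stand-in for the csv_data string built earlier.
csv_data = <<~CSV_TEXT
  MO,SCGR,SC,RSITE,ALARM_SITUATION
  RXOTG-59,59,0,EK0322,ABIS PATH FAULT
  RXOCF-59,,,EK0322,LOCAL MODE
  RXOTRX-59-4,,0
CSV_TEXT

# Without headers: true, every line (including the header) becomes an array.
rows = CSV.parse(csv_data)
header, *records = rows

# A short line yields a short array; pad it to the header width with nil
# so that every record has five elements.
padded = records.map { |r| Array.new(header.size) { |i| r[i] } }

p padded[1]
  #=> ["RXOCF-59", nil, nil, "EK0322", "LOCAL MODE"]
p padded[2]
  #=> ["RXOTRX-59-4", nil, "0", nil, nil]
```

No converters are applied here, so "59" and "0" stay strings; passing converters: :all to CSV.parse would turn them into integers, as in the hash version.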
    

    1 To run the code you will need to un-indent this heredoc (or change the first line to data = <<-END.lines.map(&:lstrip).join).
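
On Ruby 2.3 or later, a squiggly heredoc (<<~) is another way to keep the data indented in the source while still getting an un-indented string. The sketch below is a shortened, self-contained run of convert_to_csv under that assumption; the data lines are a subset of the ones above and csv_text is an illustrative name.

```ruby
# <<~ strips the common leading indentation from every line of the heredoc.
data = <<~END
  MO                SCGR  SC         RSITE           ALARM_SITUATION
  RXOTG-59            59  0          EK0322          ABIS PATH FAULT
  RXOCF-59                           EK0322          LOCAL MODE
  RXOTRX-59-4             0
END

def convert_to_csv(data)
  # Offsets of the space that precedes each column name in the header line.
  cols = data[/.+?\n/].gsub(/ \S/).map { Regexp.last_match.begin(0) }
  data.each_line.map do |s|
    # Write a comma at each offset the line is long enough to reach,
    # then remove the padding around the commas.
    cols.each { |i| s[i] = ',' if s.size > i+1 }
    s.gsub(/ *, */, ',')
  end.join
end

csv_text = convert_to_csv(data)
puts csv_text
```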