ruby xml nokogiri file-conversion activesupport

Hyphen (-) in XML element is becoming Underscore (_) when converting it into JSON in Ruby


My XML element names are dates written with hyphens. Each name starts with an underscore to keep the XML valid, since an element name cannot begin with a digit.

<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <_2018-12-11>
    <USD>1.1379</USD>
    <JPY>128.75</JPY>
    <BGN>1.9558</BGN>
  </_2018-12-11>
  <_2018-12-10>
    <USD>1.1425</USD>
    <JPY>128.79</JPY>
    <BGN>1.9558</BGN>
  </_2018-12-10>
</root>

I converted the document into JSON using Nokogiri and Active Support. Here is the code:

require 'json'
require 'nokogiri'
require 'active_support/core_ext/hash/conversions'
s = File.read('../data/currency_source.xml')
x = Nokogiri::XML(s)
# Passing the Nokogiri document directly, as in Hash.from_xml(x).to_json,
# throws the error "does not have a valid root", so it is serialized first.
new_x = Hash.from_xml(x.to_s)

puts JSON.pretty_generate(new_x)

The above code prints the converted Hash as JSON. The date keys should be _2018-12-11, but they are displayed as _2018_12_11.

When I print only the XML, the date elements are displayed correctly, but not after converting it into JSON.

Is there a workaround to get the dates in the correct format, i.e. separated by hyphens rather than underscores?

P.S. I have also tried Crack and CobraVsMongoose. Crack produces the same undesired date format. CobraVsMongoose preserves the hyphens correctly, but it appends an unnecessary $ symbol.


Solution

  • This conversion is done explicitly in the ActiveSupport code for Hash.from_xml. In the source, the keys of the converted hash are normalized by replacing every dash with an underscore:

    def normalize_keys(params)
      case params
      when Hash
        Hash[params.map { |k, v| [k.to_s.tr("-", "_"), normalize_keys(v)] } ]
      when Array
        params.map { |v| normalize_keys(v) }
      else
        params
      end
    end
    

    There is no comment to explain why this is necessary or desirable, and there is no option/switch to skip the normalization. One possible workaround would be to use a different delimiter (e.g., _2018.12.10) in your XML source keys and then convert them to dashes once you've got them safely into a Hash.
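
    The second half of that workaround (turning the substitute delimiter back into dashes once the data is in a Hash) could look roughly like the sketch below. It is only an illustration: restore_dashes and DOTTED_DATE_KEY are hypothetical names, not part of ActiveSupport, and the parsed hash is an assumed stand-in for what Hash.from_xml would return if the XML used element names such as _2018.12.11. It relies on the normalization touching only dashes, as the source above suggests.

```ruby
require 'json'

# Hypothetical pattern: matches keys such as "_2018.12.10" (the dot-delimited
# form suggested as a workaround) and captures year, month, and day.
DOTTED_DATE_KEY = /\A_(\d{4})\.(\d{2})\.(\d{2})\z/

# Hypothetical helper: recursively rebuilds a parsed structure, turning
# "_2018.12.10" into "_2018-12-10". Keys are converted to strings; keys that
# do not look like dotted dates keep their text, and values are untouched.
def restore_dashes(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), result|
      new_key = key.to_s.sub(DOTTED_DATE_KEY, '_\1-\2-\3')
      result[new_key] = restore_dashes(child)
    end
  when Array
    value.map { |child| restore_dashes(child) }
  else
    value
  end
end

# Assumed shape of the Hash.from_xml result for a dot-delimited XML source.
parsed = {
  "root" => {
    "_2018.12.11" => { "USD" => "1.1379", "JPY" => "128.75", "BGN" => "1.9558" },
    "_2018.12.10" => { "USD" => "1.1425", "JPY" => "128.79", "BGN" => "1.9558" }
  }
}

restored = restore_dashes(parsed)
puts JSON.pretty_generate(restored)
```

    In the original script, this would presumably be applied to the result of Hash.from_xml(x.to_s) just before JSON.pretty_generate. Matching only date-shaped keys keeps the helper from rewriting unrelated keys that happen to contain a dot.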