Tags: ruby-on-rails, json, regex, ruby, string-substitution

More efficient method than string replace with ruby gsub


I have a third-party JSON feed which is huge, with lots of data. For example:

{
   "data": [{
     "name": "ABC",
     "price": "2.50"
   },
   ...
   ]
}

I need to strip the quotation marks from each price, because the consumer of the JSON feed requires prices to be unquoted.

To do this, I use a regex to find the prices, then iterate over them and do a string replace with gsub. This is how I am doing it:

price_strings = json.scan(/(?:"price":")(.*?)(?:")/).uniq
price_strings.each do |price|
  json.gsub!("\"#{price.reduce}\"", price.reduce)
end
json

The main bottleneck appears to be in the each block. Is there a better way of doing this?


Solution

  • If this JSON string is going to be parsed into a Hash at some point, either in your application or in a third-party dependency of your code (for example, code consumed by your colleagues or other modules), I suggest negotiating with the consumers to convert the price value from String to Numeric once the JSON is already a Hash. This is more efficient, and it lets them handle edge cases such as "price": "". My code below does not handle that case: it would remove the "" and leave a JSON syntax error.

    However, if you do not have control over this, or you are doing a one-off mutation of the whole JSON string, you could try the following:

    json =
    <<-eos
    {
      "data": [{
        "name": "ABC",
        "price": "2.50",
        "somethingsomething": {
          "data": [{
            "name": "DEF",
            "price": "3.25", "someprop1": "hello",
            "someprop2": "world"
          }]
        },
        "somethinggggg": {
          "price": "123.45" },
        "something2222": {
          "price": 9.876, "heeeello": "world"
        }
      }]
    }
    eos
    
    new_json = json.gsub(/("price":.*?)"(.*?)"(.*?,|})/, '\1\2\3')
    
    puts new_json
    # =>
    # {
    #   "data": [{
    #     "name": "ABC",
    #     "price": 2.50,
    #     "somethingsomething": {
    #       "data": [{
    #         "name": "DEF",
    #         "price": 3.25, "someprop1": "hello",
    #         "someprop2": "world"
    #       }]
    #     },
    #     "somethinggggg": {
    #       "price": 123.45 },
    #     "something2222": {
    #       "price": 9.876, "heeeello": "world"
    #     }
    #   }]
    # }
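
    The pattern above is fairly permissive, so it may be worth tightening. The following is a minimal sketch, not a drop-in replacement: it assumes every price is a plain decimal such as "2.50" or "-3", and it only unquotes values that match that shape, so "price": "" and already-numeric prices are left untouched. The names PRICE_PATTERN and unquote_prices are hypothetical.

```ruby
require 'json'

# Hypothetical constant: matches "price": "<plain decimal>" only.
# Assumption: prices look like 2.50, 0.99 or -3 (no exponents, no leading zeros).
PRICE_PATTERN = /("price"\s*:\s*)"(-?(?:0|[1-9]\d*)(?:\.\d+)?)"/

# Hypothetical helper: a single gsub pass that keeps the key and the digits
# and drops the surrounding quotes.
def unquote_prices(json)
  json.gsub(PRICE_PATTERN, '\1\2')
end

sample = '{"data":[{"name":"ABC","price":"2.50"},{"price":"","note":"n/a"},{"price": 9.876}]}'
result = unquote_prices(sample)
puts result
# => {"data":[{"name":"ABC","price":2.50},{"price":"","note":"n/a"},{"price": 9.876}]}
```

    Because this is one pass over the string, it avoids running a separate gsub! for each unique price as in the question.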
    

    DISCLAIMER: I am not a Regexp expert.
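
    For the first suggestion (converting once the JSON is already a Hash), a rough sketch could look like the following. It assumes the whole feed fits in memory after JSON.parse and that a Float is acceptable for prices; numeric_prices is a hypothetical helper name. Float(value, exception: false) needs Ruby 2.6 or later, and it also accepts forms such as "1_000", so a stricter check may be wanted.

```ruby
require 'json'

# Hypothetical helper: walks parsed JSON recursively and converts String
# values stored under a "price" key into Floats. Values that cannot be
# converted (for example "") are kept as they are.
def numeric_prices(node)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), out|
      out[key] =
        if key == 'price' && value.is_a?(String)
          Float(value, exception: false) || value
        else
          numeric_prices(value)
        end
    end
  when Array
    node.map { |item| numeric_prices(item) }
  else
    node
  end
end

json = '{"data":[{"name":"ABC","price":"2.50","extra":{"price":""}}]}'
converted = numeric_prices(JSON.parse(json))
puts JSON.generate(converted)
# => {"data":[{"name":"ABC","price":2.5,"extra":{"price":""}}]}
```

    Note that going through Float drops trailing zeros (2.50 becomes 2.5). If the consumer needs the exact digits preserved, the regex approach keeps them.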