ruby, byte-order-mark

Is there a way to remove the BOM from a UTF-8 encoded file?

I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited them saved them as UTF-8 with a BOM.

When my Ruby scripts parse the JSON, they fail with an error. I don't want to open 58+ JSON files by hand and re-save each one as UTF-8 without the BOM.
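Because the files only need fixing once, the conversion itself can be scripted instead of done by hand. Below is a minimal sketch, not taken from the original answer. It assumes the files sit under a directory passed on the command line, and both the default name json_data and the helper strip_bom! are hypothetical. The script rewrites files in place, so keep a backup.

```ruby
# Hypothetical one-off script: removes a leading UTF-8 BOM from every
# .json file under a directory, rewriting affected files in place.
# Assumption: "json_data" is only an illustrative default directory.

UTF8_BOM = "\xEF\xBB\xBF".b # the three BOM bytes as a binary string

# Returns true if +path+ started with a BOM (and was rewritten),
# false if the file was left untouched.
def strip_bom!(path)
  bytes = File.binread(path)             # raw bytes, no decoding
  stripped = bytes.delete_prefix(UTF8_BOM)
  return false if stripped.bytesize == bytes.bytesize

  File.binwrite(path, stripped)          # write back without the BOM
  true
end

if __FILE__ == $PROGRAM_NAME
  dir = ARGV.fetch(0, "json_data")
  Dir.glob(File.join(dir, "**", "*.json")).each do |path|
    puts "stripped BOM: #{path}" if strip_bom!(path)
  end
end
```

Working on raw bytes (binread/binwrite) avoids any encoding conversion, so the rest of each file is written back byte-for-byte unchanged.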


Solution

  • The solution was to search for the BOM and delete it with gsub!. I forced the encoding of the file contents to UTF-8 and also forced the search pattern (a plain string, not a regex) to UTF-8, so that the pattern and the string being searched share the same encoding.

    I was able to derive a solution by looking at http://self.d-struct.org/195/howto-remove-byte-order-mark-with-ruby-and-iconv and http://blog.grayproductions.net/articles/ruby_19s_string

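    require 'json'

    # Reads <file_name>/game.json, removes the UTF-8 BOM and parses the JSON.
    # (index is not used by this method.)
    def read_json_file(file_name, index)
      # File.read opens, reads and closes the file in one call
      content = File.read(File.join(file_name, "game.json")).force_encoding("UTF-8")

      # The UTF-8 BOM is the byte sequence EF BB BF (U+FEFF); delete it
      content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')

      json = JSON.parse(content)

      print json
      json
    end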
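Ruby's IO layer can also drop the BOM while reading, with no search and replace. Opening a file with the external encoding "BOM|UTF-8" skips a leading BOM if one is present and otherwise reads the file as plain UTF-8. This is a minimal sketch of that alternative. The method name read_json_without_bom is hypothetical, and it takes a full path rather than the file_name/index pair used above.

```ruby
require "json"

# Sketch: let the IO layer strip the BOM instead of calling gsub!.
# "r:BOM|UTF-8" means: read mode, skip a leading UTF-8 BOM if present,
# and treat the contents as UTF-8.
def read_json_without_bom(path)
  JSON.parse(File.read(path, mode: "r:BOM|UTF-8"))
end
```

Unlike gsub!, this only removes a BOM at the very start of the file, which is the only position where one is meaningful. Files saved without a BOM are read exactly as before.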