xml-parsing coffeescript

Cannot parse xml - Error: Non-whitespace before first tag. Line: 0 Column: 1 Char: 4


I am writing a simple Atom package. When I send a request, the server returns an XML response, which I try to parse with xml2js. However, the following error occurs:

Error: Non-whitespace before first tag. Line: 0 Column: 1 Char: 4

How can I resolve it? Thank you in advance.

Relevant code:

module.exports = class HatenaBlogPost
~~~

  @hatenaBlogPost = new HatenaBlogPost()

~~~

postEntry: (callback) ->
  draft = if @isPublic then 'no' else 'yes'

  requestBody = """
    <?xml version="1.0" encoding="UTF-8"?>
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:app="http://www.w3.org/2007/app">
    <title>#{@entryTitle}</title>
    <author><name>#{@getHatenaId()}</name></author>
    <content type="text/plain">
      #{_.escape(@entryBody)}
    </content>
    <updated>#{moment().format('YYYY-MM-DDTHH:mm:ss')}</updated>
    <app:control>
      <app:draft>#{draft}</app:draft>
    </app:control>
    </entry>
  """

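  # The request setup belongs inside postEntry, since it uses @-bound
  # helpers and the callback argument.
  options =
    hostname: 'blog.hatena.ne.jp'
    path: "/#{@getHatenaId()}/#{@getBlogId()}/atom/entry"
    auth: "#{@getHatenaId()}:#{@getApiKey()}"
    method: 'POST'

  request = https.request options, (res) ->
    res.setEncoding "utf-8"
    body = ''
    # Collect the response in chunks and hand the full body to the caller.
    res.on "data", (chunk) ->
      body += chunk
    res.on "end", ->
      callback(body)

  request.write requestBody
  request.end()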

View:

{parseString} = require 'xml2js'

~~~

@hatenaBlogPost.postEntry (response) =>
   parseString response, (err, result) =>
     if err
       atom.notifications.addError("#{err}", dismissable: true)
     else
       entryUrl = result.entry.link[1].$.href
       entry_Title = result.entry.title
       atom.notifications.addSuccess("Posted #{entry_Title} at #{entryUrl}", dismissable: true)

Solution

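  • The culprit is the so-called byte order mark (BOM): the Unicode character U+FEFF ("zero width no-break space"), which occupies three bytes in UTF-8 and which some Windows programs (Notepad, for example) prepend to UTF-8 text. In a hex editor the BOM shows up as the bytes EF BB BF at the very start of the data. Because it is not whitespace and sits before the first tag, the parser rejects the input.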

    To fix the issue, strip the BOM from the string before parsing it:

    var cleanedString = origString.replace("\ufeff", "");
    

    See this article for more.
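
The fix above can be sketched as a small helper, together with a quick check for whether a string really starts with a BOM. This is only a sketch under stated assumptions: `leadingHex` and `stripBom` are hypothetical names (they are not part of xml2js), and the `response` value is made-up sample data standing in for the server's reply.

```javascript
// Hypothetical helpers for illustration; neither is part of xml2js.
const BOM = "\uFEFF";

// Return the first `count` bytes of `text`, encoded as UTF-8, in hex.
// A leading BOM shows up as "efbbbf".
function leadingHex(text, count = 5) {
  return Buffer.from(text, "utf8").subarray(0, count).toString("hex");
}

// Remove a single leading BOM, if present; otherwise return the text unchanged.
function stripBom(text) {
  return text.startsWith(BOM) ? text.slice(1) : text;
}

// Made-up sample standing in for the server response.
const response = "\uFEFF<?xml version=\"1.0\"?><entry/>";

console.log(leadingHex(response));                    // efbbbf3c3f
console.log(stripBom(response).startsWith("<?xml"));  // true
```

Unlike a plain `replace`, `stripBom` only touches the start of the string, so a U+FEFF character that appears later in the content is left alone. In the view code from the question, the response would be passed through such a helper before being handed to `parseString`. If `leadingHex` does not print `efbbbf`, the error probably has a different cause, and it is worth logging the raw response to see what the server actually sent.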