
Removing non-printable character


Okay, so I've been bashing my head against the table over this one.

I am importing an XML file that was exported by InDesign. My code parses it and creates a new file based on the input. (I'm building a JS application with Node.js.)

The output file looks fine in my PhpStorm IDE, but when I open it in gedit, I see some unwanted newlines here and there.

I've managed to track it down to this character: -> <- (it really is there: copy it somewhere and move your cursor over it with the arrow keys. It gets stuck in the middle.)

Viewed in a hex editor, this character turns out to be the bytes 0xE2 0x80 0xA9.
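A quick way to confirm which invisible character is hiding in the text is to print the code point and UTF-8 bytes of everything outside printable ASCII. This is a minimal Node.js sketch: the helper name `describeSuspicious` and the sample string are hypothetical, and treating only tab, LF, CR, and U+0020 to U+007E as "printable" is a simplifying assumption.

```javascript
// Hypothetical helper: report every character that is not printable ASCII,
// with its UTF-16 index, its code point, and its UTF-8 byte sequence.
function describeSuspicious(text) {
  const report = [];
  let index = 0;
  // for...of iterates by code point, so astral characters are not split.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Assumption: tab, LF, CR and U+0020..U+007E count as printable.
    const printable =
      cp === 0x09 || cp === 0x0a || cp === 0x0d || (cp >= 0x20 && cp <= 0x7e);
    if (!printable) {
      report.push({
        index,
        codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
        // Buffer is a Node.js global; encode the character and show its bytes.
        utf8: Buffer.from(ch, 'utf8').toString('hex').match(/../g).join(' '),
      });
    }
    index += ch.length; // advance in UTF-16 code units to match String indices
  }
  return report;
}

// Sample input with a paragraph separator between two words.
// Reports index 5, code point U+2029, UTF-8 bytes e2 80 a9.
console.log(describeSuspicious('Hello\u2029World'));
```

Running it over the string read from the exported XML would list each offending position, which is easier than hunting for the character with the cursor.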

When I tried to replace it with a simple JavaScript replace:

    data = data.replace('', ''); // There IS a character in the first string. Trust me.

I got a parse error.

In Vim, the same spot shows up as ~@ followed by a garbled character.

How can I remove it from my output? Escaping the character in the JS code made it compile fine, but the weird character is still there. I'm out of ideas.


Solution

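  • You need to use the escape '\u2029' in the search pattern instead of the raw character. The sequence you are trying to replace is U+2029 PARAGRAPH SEPARATOR, a Unicode character that InDesign inserts. JavaScript treats U+2029 as a line terminator, and engines that predate ES2019 do not allow it raw inside a string literal, which is why pasting it into the code caused the parse error. It is also why gedit shows it as an extra line break.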

    So:

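    data = data.replace(/\u2029/g, '');
    

    instead of the character itself. With a plain string as the pattern, replace() only removes the first occurrence, so use a regular expression with the g flag to catch every one (in Node.js 15 and later, data.replaceAll('\u2029', '') works too).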
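The difference between a string pattern and a global regular expression is easy to see side by side. This is a small sketch; the helper name `stripParagraphSeparators` and the sample string are hypothetical.

```javascript
// Hypothetical helper: remove every U+2029 PARAGRAPH SEPARATOR from a string.
function stripParagraphSeparators(text) {
  // The g flag makes replace() act on all matches, not just the first one.
  return text.replace(/\u2029/g, '');
}

const sample = 'one\u2029two\u2029three';

// A string pattern only replaces the first occurrence:
// the result is 'onetwo' + U+2029 + 'three', so one separator survives.
console.log(sample.replace('\u2029', '').includes('\u2029')); // true

// The global regex removes them all: the result is 'onetwothree'.
console.log(stripParagraphSeparators(sample));
```

In the question's code this would be applied to the parsed XML text before the output file is written, e.g. data = stripParagraphSeparators(data);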