perl find export-to-csv findstr spotfire

Exported TSV file shows whitespace between characters when read by a text processor (Perl CSV parser or DOS find/findstr)


I have a TSV file exported from an application (Spotfire Web Player, using Internet Explorer). When I view the file in Notepad++ or Notepad, everything looks fine (see the attached screenshot).

But if I feed the file to a Perl-based CSV parser (TSV, actually) or simply use the MS-DOS find/findstr commands, each character appears separated by whitespace.

I am trying to exclude a few lines (based on specific dates), but because of this issue I am unable to do that.

[Screenshot: the exported file open in Notepad++]


Solution

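  • Your file is Unicode encoded, specifically as UCS-2/UTF-16 little-endian. (Notepad++ is showing it as "UCS-2 Little Endian" in the status bar.) In that encoding every character takes two bytes, so each ASCII character is followed by a NUL byte, which byte-oriented tools such as find/findstr or a Perl script reading raw bytes show as whitespace. You need to tell Perl what the encoding is and decode the data while reading from the file.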

    use Encode qw(decode);
    # read from file into $octets...
    my $chars = decode('UCS-2LE', $octets, Encode::FB_CROAK);
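
As an alternative to decoding a buffer by hand, the decoding can be attached to the filehandle with an I/O layer. The following is only a minimal sketch of the line-filtering task from the question, not a definitive implementation. The helper name `filter_tsv`, the file names, and the date strings are hypothetical, and it assumes the export is UTF-16LE (with or without a leading byte-order mark) with Windows line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a UTF-16LE TSV export to a UTF-8 file,
# dropping every line that contains one of the strings in @skip.
sub filter_tsv {
    my ($in_path, $out_path, @skip) = @_;

    # ":raw" comes first so that CRLF handling happens on decoded
    # characters (via ":crlf" on top), not on the raw 2-byte units.
    open my $in, '<:raw:encoding(UTF-16LE):crlf', $in_path
        or die "Cannot open $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot open $out_path: $!";

    while (my $line = <$in>) {
        # UTF-16LE does not consume a byte-order mark, so drop it if present.
        $line =~ s/\A\x{FEFF}// if $. == 1;

        # Skip the line if it contains any of the unwanted strings.
        next if grep { index($line, $_) >= 0 } @skip;

        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

A call such as `filter_tsv('export.tsv', 'filtered.tsv', '2015-01-01')` (hypothetical names and date) would write every line except those containing that date. Because the output is UTF-8, plain ASCII text in it has no interleaved NUL bytes, so byte-oriented tools should be able to match it as expected.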