How to use uniVocity-parsers to process non-printable characters

I would like to use Java with uniVocity-parsers to parse CSV data produced by MySQL's select into outfile.

I have run into a problem with non-printable characters. The MySQL table contains a bit(1) column, and when I use select into outfile to save its data to a file, the bit(1) values are written as non-printable characters. When I read the lines with uniVocity-parsers, I get null for the bit(1) columns. I expect to get the real data of the bit(1) column. What should I do?

Solution

The problem here is that MySQL exports the bit(1) values as the characters \u0000 and \u0001, and by default the parser trims every value, removing any leading or trailing character <= ' '. The integer representations of \u0000 and \u0001 are 0 and 1, while the space character ' ' is 32, so trimming wipes both of them out. The value is then empty, which is why you get null.

You just need to configure the parser so that it does not trim values:
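To see why those two characters disappear, here is a small sketch that uses only the JDK. It does not call uniVocity-parsers at all; it relies on String.trim(), which applies the same "anything <= ' ' is whitespace" rule described above. The class name TrimDemo and the method trimmed are made up for illustration.

```java
public class TrimDemo {

    // Stand-in for the parser's default trimming: String.trim() strips
    // leading and trailing characters whose code is <= 32 (the space).
    static String trimmed(String value) {
        return value.trim();
    }

    public static void main(String[] args) {
        String bitZero = "\u0000"; // how a bit(1) value of 0 shows up in the file
        String bitOne = "\u0001";  // how a bit(1) value of 1 shows up in the file

        // Integer representations of the characters involved.
        System.out.println((int) bitZero.charAt(0)); // 0
        System.out.println((int) bitOne.charAt(0));  // 1
        System.out.println((int) ' ');               // 32

        // Both values are "whitespace" under this rule, so nothing is left.
        System.out.println(trimmed(bitZero).isEmpty()); // true
        System.out.println(trimmed(bitOne).isEmpty());  // true
    }
}
```

Once the value is empty there is nothing left to tell a 0 from a 1, so the fix has to happen before trimming, not after.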

    settings.trimValues(false);
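With trimming disabled, each bit(1) column should come back as a one-character string. If you want a boolean out of it, a conversion along these lines may help. This is only a sketch: BitValues and bitToBoolean are hypothetical names, not part of uniVocity-parsers, and it assumes the column contains exactly one character, either U+0000 or U+0001.

```java
public class BitValues {

    // Hypothetical helper (not a uniVocity-parsers API): maps the single
    // character MySQL writes for a bit(1) column to a boolean.
    static boolean bitToBoolean(String value) {
        if (value == null || value.length() != 1) {
            throw new IllegalArgumentException("Expected a single character for a bit(1) value");
        }
        char c = value.charAt(0);
        if (c == '\u0000') {
            return false; // bit value 0
        }
        if (c == '\u0001') {
            return true;  // bit value 1
        }
        throw new IllegalArgumentException("Unexpected bit(1) character code: " + (int) c);
    }

    public static void main(String[] args) {
        System.out.println(bitToBoolean("\u0000")); // false
        System.out.println(bitToBoolean("\u0001")); // true
    }
}
```

Rejecting anything other than those two characters is a deliberate choice here, so that a column that was trimmed or mis-split fails loudly instead of being read as false.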

Also, the file you gave has lines terminated with \r\n. If you parse this on OSX or Linux you need to define the line endings explicitly:

    settings.getFormat().setLineSeparator("\r\n");

Or enable auto-detection with:

    settings.setLineSeparatorDetectionEnabled(true);

Hope this helps