
How to use uniVocity-parsers to process non-printable characters


I would like to use Java with uniVocity-parsers to parse CSV data produced by MySQL's SELECT ... INTO OUTFILE.

Now I have run into a problem processing non-printable characters. The MySQL table contains a bit(1) column, and when I use SELECT ... INTO OUTFILE to save its data to a file, the bit(1) column values become non-printable characters. When I use uniVocity-parsers to read each line, I get null for the bit(1) columns. I expect to get the real data of the bit(1) column. What should I do?


Solution

  • The problem here is that MySQL exports the bit(1) values as the characters \u0000 and \u0001, and by default the parser trims leading and trailing whitespace from every value, where "whitespace" means any character <= ' '. Trimming therefore wipes out \u0000 and \u0001, because their integer representations are 0 and 1 respectively, both below 32, the integer representation of the space character ' '. What remains is an empty value, which the parser then reports as null.
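
    To see why nothing survives, the sketch below imitates the trimming rule described above with a hypothetical helper, trimLikeParser. It is an illustration of the rule, not uniVocity-parsers' actual source code. Java's own String.trim() applies the same "<= ' '" rule, so it gives the same result.

```java
// Hypothetical helper that imitates the trimming rule described above:
// strip every leading and trailing character whose code is <= ' ' (32).
// Illustration only; this is not uniVocity-parsers' source code.
class TrimRuleDemo {

    static String trimLikeParser(String value) {
        int start = 0;
        int end = value.length();
        // Skip leading characters <= ' ' (this includes U+0000 and U+0001).
        while (start < end && value.charAt(start) <= ' ') {
            start++;
        }
        // Skip trailing characters <= ' '.
        while (end > start && value.charAt(end - 1) <= ' ') {
            end--;
        }
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        String bitFalse = "\u0000"; // bit(1) value 0 as exported by MySQL
        String bitTrue = "\u0001";  // bit(1) value 1 as exported by MySQL

        // Both characters are below ' ' (32), so trimming leaves an empty string.
        System.out.println((int) bitFalse.charAt(0) + " -> length " + trimLikeParser(bitFalse).length());
        System.out.println((int) bitTrue.charAt(0) + " -> length " + trimLikeParser(bitTrue).length());

        // String.trim() uses the same rule and agrees.
        System.out.println(bitTrue.trim().isEmpty()); // true
    }
}
```

    Running it prints "0 -> length 0" and "1 -> length 0", which matches the empty (null) values seen in the question.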

    You just need to configure the parser not to trim values:

        settings.trimValues(false);
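
    With trimming disabled, the bit(1) field arrives as the raw single character. To turn it into the column's actual value, you can read the character code. The helper toBit below is a hypothetical sketch; it assumes each bit(1) field is exactly one character, \u0000 or \u0001, and the row array is only a stand-in for what the parser would return.

```java
// Hypothetical helper: converts a raw bit(1) field, read with trimming
// disabled, into an Integer (0 or 1). Assumes the field is a single
// character U+0000 or U+0001, as described in the answer above.
class BitFieldDemo {

    static Integer toBit(String rawValue) {
        // An empty or missing field has no bit to decode.
        if (rawValue == null || rawValue.isEmpty()) {
            return null;
        }
        if (rawValue.length() != 1) {
            throw new IllegalArgumentException("Expected one character, got " + rawValue.length());
        }
        int code = rawValue.charAt(0);
        if (code != 0 && code != 1) {
            throw new IllegalArgumentException("Not a bit(1) value: code " + code);
        }
        return code;
    }

    public static void main(String[] args) {
        // Stand-in for one parsed row: an id column followed by two bit(1) columns.
        String[] row = {"42", "\u0001", "\u0000"};
        System.out.println(toBit(row[1])); // 1
        System.out.println(toBit(row[2])); // 0
    }
}
```

    If a boolean is more convenient, toBit(value) == 1 gives it; throwing on anything other than 0 or 1 makes an unexpected export format fail loudly instead of silently.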
    

    Also, the file you provided has lines terminated with \r\n. If you parse it on OSX or Linux, you need to define the line endings explicitly:

        settings.getFormat().setLineSeparator("\r\n");
    

    Or enable auto-detection with:

        settings.setLineSeparatorDetectionEnabled(true);
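
    Auto-detection essentially has to work out which line ending the input uses. The sketch below shows that decision with a hypothetical detectLineSeparator helper that inspects the first line ending in a sample. It is not uniVocity-parsers' implementation. It also shows why the separator matters once trimming is off: splitting \r\n data on "\n" alone leaves a trailing '\r' on every line, and nothing trims it away anymore.

```java
import java.util.Arrays;

// Hypothetical sketch of the decision line-separator auto-detection makes:
// find the first line ending in a sample and report which one it is.
// Illustration only; not uniVocity-parsers' implementation.
class LineSeparatorDemo {

    static String detectLineSeparator(String sample) {
        for (int i = 0; i < sample.length(); i++) {
            char c = sample.charAt(i);
            if (c == '\r') {
                // "\r\n" (as in the file from the question) or a lone "\r".
                boolean crlf = i + 1 < sample.length() && sample.charAt(i + 1) == '\n';
                return crlf ? "\r\n" : "\r";
            }
            if (c == '\n') {
                return "\n";
            }
        }
        return null; // no line ending found in the sample
    }

    public static void main(String[] args) {
        String csv = "1,\u0001\r\n2,\u0000\r\n";
        String separator = detectLineSeparator(csv);
        // Prints the separator's character codes: [13, 10] for "\r\n".
        System.out.println(Arrays.toString(separator.chars().toArray()));

        // A naive split on "\n" keeps the '\r' at the end of each line.
        String firstLine = csv.split("\n")[0];
        System.out.println(firstLine.endsWith("\r")); // true
    }
}
```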
    

    Hope this helps