I'm using Univocity-Parser's bean iterator to read each line of file and get the bean. I have observed a weird behavior in the library when I'm attempting to read the same file mutiple times.
Code when passing the File object to CsvParser instance: private static void testBeanIterator() throws Exception {
try {
File sampleFile = generateFile(0);
/*
System.out.println("Sample file content = " + FileUtils.readFileToString(sampleFile,
Charset.defaultCharset()));
*/
for (int i = 0; i < 1000; i++) {
BufferedReader reader =
new BufferedReader(new InputStreamReader(new FileInputStream(sampleFile),
StandardCharsets.UTF_8));
AtomicInteger atomicInteger = new AtomicInteger();
final BeanProcessor<CustomerSegmentMapping> rowProcessor =
new BeanProcessor<CustomerSegmentMapping>(CustomerSegmentMapping.class) {
@Override
public void beanProcessed(@Nonnull final CustomerSegmentMapping customerSegmentMapping,
@Nonnull final ParsingContext context) {
try {
System.out.println(OBJECT_MAPPER.writeValueAsString(customerSegmentMapping));
atomicInteger.getAndAdd(1);
} catch (Exception ex) {
throw new RuntimeException("error");
}
}
};
rowProcessor.setStrictHeaderValidationEnabled(true);
final CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
final CsvParser parser = new CsvParser(parserSettings);
//parser.parse(reader);
parser.parse(sampleFile);
System.out.println("Finished parser");
if (atomicInteger.get() != 10) {
throw new Exception("mismatch");
}
reader.close();
}
} catch (Exception ex) {
throw new RuntimeException("exception = " + ex.getMessage(), ex);
} finally {
}
}
On executing the code, following is the console output:
{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}
Finished parser
{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}
Finished parser
{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}
Finished parser
Exception in thread "main" java.lang.RuntimeException: exception = Could not find fields [CustomerId]' in input. Names found: [ustomerId, SegmentId]
Internal state when error was thrown: line=2, column=0, record=1, charIndex=60, headers=[ustomerId, SegmentId]
at com.poppins.cube.common.UnivocityNahiHatanaHai.testBeanIterator(UnivocityNahiHatanaHai.java:95)
at com.poppins.cube.common.UnivocityNahiHatanaHai.main(UnivocityNahiHatanaHai.java:37)
Caused by: com.univocity.parsers.common.DataProcessingException: Could not find fields [CustomerId]' in input. Names found: [ustomerId, SegmentId]
Internal state when error was thrown: line=2, column=0, record=1, charIndex=60, headers=[ustomerId, SegmentId]
at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapFieldIndexes(BeanConversionProcessor.java:414)
at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapValuesToFields(BeanConversionProcessor.java:340)
at com.univocity.parsers.common.processor.core.BeanConversionProcessor.createBean(BeanConversionProcessor.java:508)
at com.univocity.parsers.common.processor.core.AbstractBeanProcessor.rowProcessed(AbstractBeanProcessor.java:54)
at com.univocity.parsers.common.Internal.process(Internal.java:21)
at com.univocity.parsers.common.AbstractParser.rowProcessed(AbstractParser.java:596)
at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:133)
at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:605)
at com.poppins.cube.common.UnivocityNahiHatanaHai.testBeanIterator(UnivocityNahiHatanaHai.java:83)
... 1 more
Process finished with exit code 1
Following is the content of the file:
CustomerId,SegmentId
6bc12a7a-2c28-4aea-a7be-6be45e16ffb2,S1
da736310-e508-47ff-92b8-59d490e37a72,S1
9a5d4454-e6d4-49a5-bb04-8354154d0493,S1
ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5,S1
94ea24b0-0c83-4039-a391-1d2439c88be8,S1
2baef5f9-d8cd-451d-b579-a626cb58b284,S1
022a184b-1b06-49aa-b1c4-b94a6f343b04,S1
bcb3984c-0495-4da8-b146-9af3983cc158,S1
feef62de-1aaf-43d4-a83b-afe053db97cf,S1
5825c924-55d5-4fd6-8468-ca36d47a7cae,S1
From what I could understand, the issue is arising because I'm passing a File object to CsvParser. CsvParser internally creates an InputStream object which is not closed. If I'm passing a Buffered reader object instead of File object, the issue is not arising.
I'm not able to understand whether this is a known issue with the Univocity-Parsers or is there anything I'm missing in understanding.
Author of the library here. I can see your exception showing it got header ustomerId
instead of CustomerId
.
This looks like a bug introduced in version 2.5.0 that was fixed in version 2.5.6 if I'm not mistaken. This plagued me for a while as it was an internal concurrency issue that was hard to track down. Basically when you pass a File without an explicit encoding it will try to find a UTF BOM marker in the input (effectively consuming the first character) to determine the encoding automatically. This happened only for InputStreams and Files.
Anyway, this has been fixed so simply updating to the latest version should get rid of the problem for you (please let me know if you are not using version 2.5.something)
If you want to remain with the current version you have there, the error will be gone if you call
parser.parse(sampleFile, Charset.defaultCharset());
This will prevent the parser from trying to discover whether there's a BOM marker in your file, therefore avoiding that pesky bug.
Hope this helps