Pretty much what the title says. I'm writing code that needs to work with both BOM'ed and non-BOM'ed files. Different parsing options need to be implemented; for now I'm implementing support for parsing CSV files.
The code below gives a rough idea of what I'm working with. If need be, I can provide a minimal working example.
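```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import com.opencsv.CSVReaderBuilder;
import org.apache.commons.io.input.BOMInputStream;

class LocalFileAccess {
    // ...

    // Opens an input stream to the file based on the path passed in constructor.
    // Part of a larger interface, can't change the signature.
    @Override
    public InputStream getInputStream() throws FileNotFoundException {
        File file = new File(this.path);
        if (!file.isAbsolute()) {
            file = getFile(this.base, this.path);
        }
        return new FileInputStream(file);
    }

    public void foo() {
        try (BOMInputStream inputStream = new BOMInputStream(this.getInputStream())) {
            Iterator<String[]> iterator = new CSVReaderBuilder(
                    new InputStreamReader(inputStream, StandardCharsets.UTF_8)).build().iterator();
            String[] header = iterator.next(); // <- first value is prepended by BOM
        } catch (IOException e) {
            // ...
        }
    }
}
```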
Later in the codebase, when parsing the values returned by the Iterator, the first value in the header is prefixed with the BOM, which causes my tests to fail. The hacky fix is to check for this manually, but I'd rather keep my code clean.
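To illustrate why the first header value compares unequal, here is a minimal JDK-only sketch. `BomHeaderDemo` is a hypothetical class, and its naive comma split is a simplification, not how opencsv actually parses. Java's UTF-8 decoder does not strip a BOM, so the bytes EF BB BF survive decoding as the invisible character U+FEFF at the start of the first field.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what an unstripped UTF-8 BOM does to the first CSV header field.
public class BomHeaderDemo {
    // EF BB BF is the UTF-8 encoding of U+FEFF, the byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decodes raw bytes as UTF-8 and naively splits the first line on commas
    // (no quoting support; only meant to show what the decoder produces).
    public static String[] header(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        String firstLine = text.split("\n", -1)[0];
        return firstLine.split(",");
    }

    public static void main(String[] args) {
        byte[] body = "point,name\n1,goose\n2,duck\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, withBom, UTF8_BOM.length, body.length);

        String plain = header(body)[0];
        String bommed = header(withBom)[0];
        System.out.println(plain.equals("point"));        // true
        System.out.println(bommed.equals("point"));       // false
        System.out.println(bommed.equals("\uFEFFpoint")); // true: the BOM survived decoding
    }
}
```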
Wrapping the return value of `getInputStream()` in `new BOMInputStream()` fixes it. However, replacing `new BOMInputStream(this.getInputStream())` in the try-with-resources with just `this.getInputStream()` breaks it again: the BOM gets through.
I've tried different variations, such as wrapping only the return value of `getInputStream()` in a `BOMInputStream`, or wrapping only the stream in the try-with-resources in a `BOMInputStream`, to no avail. The only thing that works seems to be wrapping in both places: the return value of `getInputStream()` and the stream in the try-with-resources. I don't understand why.
Why do I need to wrap the input stream in a BOMInputStream twice?
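A BOM-skipping wrapper only looks at the very start of the stream it wraps, so a second wrapper should be a harmless no-op: it sees the first real character, not a BOM. The sketch below illustrates that with the JDK's `PushbackInputStream`. It is a swapped-in technique, not Commons IO's implementation, and `BomSkip`/`skipUtf8Bom` are hypothetical names. If one wrapper really strips the BOM, single and double wrapping give identical bytes, which suggests that when one wrapper seems insufficient, the stream reaching the parser may not be the wrapped one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only stand-in for a BOM-skipping wrapper (not Commons IO's BOMInputStream).
public class BomSkip {
    private static final int[] BOM = {0xEF, 0xBB, 0xBF};

    // Returns a stream positioned after a leading UTF-8 BOM if one is present;
    // otherwise every byte read while checking is pushed back unchanged.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, BOM.length);
        byte[] head = new byte[BOM.length];
        int n = pb.readNBytes(head, 0, head.length);
        boolean isBom = n == BOM.length
                && (head[0] & 0xFF) == BOM[0]
                && (head[1] & 0xFF) == BOM[1]
                && (head[2] & 0xFF) == BOM[2];
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: restore the bytes we peeked at
        }
        return pb;
    }

    public static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = "\uFEFFpoint".getBytes(StandardCharsets.UTF_8); // EF BB BF + "point"
        String once = readAll(skipUtf8Bom(new ByteArrayInputStream(withBom)));
        String twice = readAll(skipUtf8Bom(skipUtf8Bom(new ByteArrayInputStream(withBom))));
        System.out.println(once.equals("point")); // true
        System.out.println(twice.equals(once));   // true: the outer wrapper finds no BOM
    }
}
```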
Edit: to clarify, I'm using the Apache Commons IO `BOMInputStream`.
I didn't want my last comment to imply that there's something wrong with Commons IO's `BOMInputStream` (I couldn't believe it would fail to read a stream properly when no BOM is present), so I decided to test it. As I expected, it reads the file correctly with or without a BOM:
Source:
```java
package com.technojeeves.opencsvbeans;

import com.opencsv.bean.CsvToBeanBuilder;
import org.apache.commons.io.input.BOMInputStream;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.io.IOException;
import java.io.Reader;
import java.io.InputStreamReader;

public class App {
    public static void main(String[] args) {
        try {
            System.out.println(new App().read(Path.of(args[0])));
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    // Wraps the raw file stream in a single BOMInputStream, which skips a UTF-8 BOM if present.
    public List<Pojo> read(Path path) throws IOException {
        try (Reader reader = new InputStreamReader(new BOMInputStream(Files.newInputStream(path)),
                StandardCharsets.UTF_8)) {
            return new CsvToBeanBuilder<Pojo>(reader).withType(Pojo.class).build().parse();
        }
    }
}
```
Data file contents:
```
goose@t410:/tmp/opencsvbeans$ xxd pojo.csv
00000000: 706f 696e 742c 6e61 6d65 0a31 2c67 6f6f point,name.1,goo
00000010: 7365 0a32 2c64 7563 6b0a se.2,duck.
goose@t410:/tmp/opencsvbeans$ xxd pojo-bom.csv
00000000: efbb bf70 6f69 6e74 2c6e 616d 650a 312c ...point,name.1,
00000010: 676f 6f73 650a 322c 6475 636b 0a goose.2,duck.
```
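The only difference between the two dumps is the leading `efbb bf`, which is the UTF-8 encoding of U+FEFF. This small JDK-only check (hypothetical class `DumpCheck`) rebuilds both files' bytes from the text visible in the dumps and confirms it: 26 bytes without the BOM, 29 with it, and identical after the first three bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check that the two xxd dumps differ only by a UTF-8 BOM.
public class DumpCheck {
    // Text content of pojo.csv as shown in the dump (newline-terminated lines).
    static final String CSV = "point,name\n1,goose\n2,duck\n";

    public static byte[] plainBytes() {
        return CSV.getBytes(StandardCharsets.UTF_8);
    }

    // Prepending U+FEFF before encoding yields the BOM-prefixed variant.
    public static byte[] bomBytes() {
        return ("\uFEFF" + CSV).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02x %02x %02x%n", bom[0], bom[1], bom[2]); // ef bb bf
        System.out.println(plainBytes().length + " vs " + bomBytes().length); // 26 vs 29
        byte[] rest = Arrays.copyOfRange(bomBytes(), 3, bomBytes().length);
        System.out.println(Arrays.equals(rest, plainBytes())); // true
    }
}
```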
Run and output:
```
goose@t410:/tmp/opencsvbeans$ mvnt exec:java -Dexec.args=pojo-bom.csv
[INFO] Scanning for projects...
[INFO]
[INFO] -------------< com.technojeeves.opencsvbeans:opencsvbeans >-------------
[INFO] Building opencsvbeans 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ opencsvbeans ---
[name=goose,point=1, name=duck,point=2]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.189 s
[INFO] Finished at: 2022-12-12T11:31:02Z
[INFO] ------------------------------------------------------------------------
goose@t410:/tmp/opencsvbeans$ mvnt exec:java -Dexec.args=pojo.csv
[INFO] Scanning for projects...
[INFO]
[INFO] -------------< com.technojeeves.opencsvbeans:opencsvbeans >-------------
[INFO] Building opencsvbeans 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ opencsvbeans ---
[name=goose,point=1, name=duck,point=2]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.245 s
[INFO] Finished at: 2022-12-12T11:31:11Z
[INFO] ------------------------------------------------------------------------
goose@t410:/tmp/opencsvbeans$
```
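As a cross-check that needs neither Commons IO nor opencsv, here is a JDK-only sketch of the same with-or-without-BOM test. It uses a different technique from `BOMInputStream`: it decodes as UTF-8 first and then drops a leading U+FEFF character from the `Reader` using `mark`/`reset`. The names `ReaderBomCheck`, `openSkippingBom`, and `headerLine` are hypothetical, and it only reads the header line rather than mapping rows to beans.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical JDK-only check: skip a decoded BOM at the Reader level, then read the header.
public class ReaderBomCheck {
    // Opens the file as UTF-8 and skips a leading U+FEFF (a decoded UTF-8 BOM) if present.
    public static BufferedReader openSkippingBom(Path path) throws IOException {
        BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        reader.mark(1);
        if (reader.read() != 0xFEFF) {
            reader.reset(); // first char is real data (or end of file): rewind to it
        }
        return reader;
    }

    // Returns the first line of the file with any leading BOM removed.
    public static String headerLine(Path path) throws IOException {
        try (BufferedReader reader = openSkippingBom(path)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + headerLine(Path.of(arg)));
        }
    }
}
```

Run against the two files above (for example `java ReaderBomCheck.java pojo.csv pojo-bom.csv`), both header lines should come out as `point,name`.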