I have a (possibly long) list of binary files that I want to read lazily. There will be too many files to load into memory at once. I'm currently reading them as a MappedByteBuffer with FileChannel.map(), but that probably isn't required. I want the method readBinaryFiles(...) to return a Java 8 Stream so that each file is loaded lazily, only when I access it.
public List<FileDataMetaData> readBinaryFiles(
        List<File> files,
        int numDataPoints,
        int dataPacketSize) throws IOException {
    List<FileDataMetaData> fmdList = new ArrayList<FileDataMetaData>();
    IOException lastException = null;
    for (File f : files) {
        try {
            FileDataMetaData fmd = readRawFile(f, numDataPoints, dataPacketSize);
            fmdList.add(fmd);
        } catch (IOException e) {
            logger.error("", e);
            lastException = e;
        }
    }
    if (null != lastException)
        throw lastException;
    return fmdList;
}
// The DataPackets in the returned FileDataMetaData are in the same order as in the file.
public FileDataMetaData readRawFile(File file, int numDataPoints, int dataPacketSize) throws IOException {
    // try-with-resources closes the file and its channel even if reading fails
    try (RandomAccessFile raf = new RandomAccessFile(file, "r");
         FileChannel fileChannel = raf.getChannel()) {
        long fileSz = fileChannel.size();
        // Read straight from the mapping instead of first copying it into a heap buffer
        MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileSz);
        List<DataPacket> dataPacketList = new ArrayList<DataPacket>();
        while (buffer.hasRemaining()) {
            int channelId = buffer.getInt();
            long timestamp = buffer.getLong();
            int[] data = new int[numDataPoints];
            for (int i = 0; i < numDataPoints; i++)
                data[i] = buffer.getInt();
            dataPacketList.add(new DataPacket(channelId, timestamp, data));
        }
        return new FileDataMetaData(file.getCanonicalPath(), fileSz, dataPacketList);
    } catch (IOException e) {
        logger.error("", e);
        throw e;
    }
}
Returning fmdList.stream() from readBinaryFiles(...) won't accomplish this, because by then the contents of every file will already have been read into memory, and they won't all fit.
The other approaches I've found for reading the contents of multiple files as a Stream rely on Files.lines(), but I need to read binary files.
I'm open to doing this in Scala or Go if those languages have better support for this use case than Java.
I'd appreciate any pointers on how to read the contents of multiple binary files lazily.
This should be sufficient, once the return type of readBinaryFiles(...) is changed to Stream<FileDataMetaData>:
return files.stream().map(f -> readRawFile(f, numDataPoints, dataPacketSize));
…if, that is, you are willing to remove throws IOException from the readRawFile method’s signature. A lambda passed to Stream.map() cannot throw a checked exception, so you could have that method catch IOException internally and wrap it in an UncheckedIOException. (The problem with deferred execution is that the exceptions also need to be deferred.)
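To make the idea concrete, here is a minimal, self-contained sketch rather than a drop-in replacement for the code in the question. The class name LazyBinaryFiles and the Packet type are hypothetical stand-ins for the question's classes, the record layout (int channel id, long timestamp, then numDataPoints ints) is taken from the question's parsing loop, and Files.readAllBytes is swapped in for FileChannel.map() on the assumption that any single file fits in memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class LazyBinaryFiles {

    // Hypothetical stand-in for the question's DataPacket class.
    static final class Packet {
        final int channelId;
        final long timestamp;
        final int[] data;

        Packet(int channelId, long timestamp, int[] data) {
            this.channelId = channelId;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    // Parses one file of fixed-size records: int channelId, long timestamp,
    // then numDataPoints ints. A checked IOException is rethrown unchecked
    // so that the method can be called from a lambda.
    static List<Packet> readRawFile(Path file, int numDataPoints) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
            List<Packet> packets = new ArrayList<>();
            while (buf.hasRemaining()) {
                int channelId = buf.getInt();
                long timestamp = buf.getLong();
                int[] data = new int[numDataPoints];
                for (int i = 0; i < numDataPoints; i++) {
                    data[i] = buf.getInt();
                }
                packets.add(new Packet(channelId, timestamp, data));
            }
            return packets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Nothing is read here: map() only records the function, and each file is
    // read when a terminal operation pulls its element through the stream.
    static Stream<List<Packet>> readBinaryFiles(List<Path> files, int numDataPoints) {
        return files.stream().map(f -> readRawFile(f, numDataPoints));
    }
}
```

A caller could consume this with something like LazyBinaryFiles.readBinaryFiles(paths, numDataPoints).forEach(...). As long as the consumer does not hold on to earlier results, only the file currently being processed needs to be in memory. A read failure surfaces as an UncheckedIOException from the terminal operation, not from readBinaryFiles itself, so that is where the caller would need to catch it.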