Search code examples
javaconcatenationniobytebufferfilechannel

What method is more efficient for concatenating large files in Java using FileChannels


I want to find out what method is better of two that I have come up with for concatenating my text files in Java. If someone has some insight they can share about what goes on at the kernel level that explains the difference between these methods of writing to a FileChannel, I would greatly appreciate it.

From what I understand from documentation and other Stack Overflow conversations, the allocateDirect allocates space right on the drive, and mostly avoids using RAM. I have a concern that the ByteBuffer created with allocateDirect might have a potential to overflow or not be allocated if the File infile is large, say 1GB. I am guaranteed at this point in the development of our software that the File will be no larger than 2 GB; but there is potential in the future that it might be as big as 10 or 20GB.

I have observed that the transferFrom loop never goes through the loop more than once... so it seems to succeed in writing the entire infile at once; but I haven't tested it with files bigger than 60MB. I looped though, because the documentation specifies that there is no guarantee of how much will be written at once. With transferFrom only able to accept, on my system, an int32 as its count parameter, I won't be able to specify more than 2GB at a time be transferred... Again, kernel expertise would help me understand.

Thanks in advance for your help!!

Using a ByteBuffer:

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        ByteBuffer buff = ByteBuffer.allocateDirect((int)(infile.length() + sb.length()));
        //write the stringBuffer so it goes in the output file first:
        buff.put(sb.toString().getBytes());

        //create the FileChannels:
        inChan  = new RandomAccessFile(infile,  "r" ).getChannel();
        outChan = new RandomAccessFile(outfile, "rw").getChannel();

        //read the infile in to the buffer:
        inChan.read(buff);

        // prep the buffer:
        buff.flip();

        // write the buffer out to the file via the FileChannel:
        outChan.write(buff);
        inChan.close();
        outChan.close();
     } catch...etc

}

Using trasferTo (or transferFrom):

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        //write the stringBuffer so it goes in the output file first:    
        PrintWriter  fw = new PrintWriter(outfile);
        fw.write(sb.toString());
        fw.flush();
        fw.close();

        // create the channels appropriate for appending:
        outChan = new FileOutputStream(outfile, true).getChannel();
        inChan  = new RandomAccessFile(infile, "r").getChannel();

        long startSize = outfile.length();
        long inFileSize = infile.length();
        long bytesWritten = 0;

        //set the position where we should start appending the data:
        outChan.position(startSize);
        Byte startByte = outChan.position();

        while(bytesWritten < length){ 
            bytesWritten += outChan.transferFrom(inChan, startByte, (int) inFileSize);
            startByte = bytesWritten + 1;
        }

        inChan.close();
        outChan.close();
    } catch ... etc

Solution

  • transferTo() can be far more efficient as there is less data copying, or none if it can all be done in the kernel. And if it isn't on your platform it will still use highly tuned code.

    You do need the loop, one day it will iterate and your code will keep working.