java performance randomaccessfile

How can I improve execution time? And is there a better way to read this file?


I am trying to split a text file using multiple threads. The file is 1 GB, and I am reading it character by character. The execution time is 24 minutes 54 seconds. Is there a better way than reading the file char by char that would reduce the execution time? I'm having a hard time finding an approach that does. Please also suggest any better way to split a file with multiple threads. I am very new to Java.

Any help will be appreciated. :)

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis(); // used by the timing printout at the end of main
        RandomAccessFile raf = new RandomAccessFile("D:\\sample\\file.txt", "r");
        long numSplits = 10;
        long sourceSize = raf.length();
        System.out.println("file length:" + sourceSize);
        long bytesPerSplit = sourceSize / numSplits;
        long remainingBytes = sourceSize % numSplits;

        int maxReadBufferSize = 9 * 1024;

        List<String> filePositionList = new ArrayList<String>();
        long startPosition = 0;
        long endPosition = bytesPerSplit;
        for (int i = 0; i < numSplits; i++) {
            raf.seek(endPosition);
            String strData = raf.readLine();
            if (strData != null) {
                endPosition = endPosition + strData.length();
            }
            String str = startPosition + "|" + endPosition;
            if (sourceSize > endPosition) {
                startPosition = endPosition;
                endPosition = startPosition + bytesPerSplit;
            } else {
                break;
            }
            filePositionList.add(str);
        }

        for (int i = 0; i < filePositionList.size(); i++) {

            String str = filePositionList.get(i);
            String[] strArr = str.split("\\|");
            String strStartPosition = strArr[0];
            String strEndPosition = strArr[1];
            long startPositionFile = Long.parseLong(strStartPosition);
            long endPositionFile = Long.parseLong(strEndPosition);
            MultithreadedSplit objMultithreadedSplit = new MultithreadedSplit(startPositionFile, endPositionFile);
            objMultithreadedSplit.start();
        }

        long endTime = System.currentTimeMillis();

        System.out.println("It took " + (endTime - startTime) + " milliseconds");
    }

}
public class MultithreadedSplit extends Thread {

    public static String filePath = "D:\\tenlakh\\file.txt";
    private int localCounter = 0;
    private long start;
    private long end;
    public static String outPath;

    List<String> result = new ArrayList<String>();

    public MultithreadedSplit(long startPos, long endPos) {
        start = startPos;
        end = endPos;
    }

    @Override
    public void run() {
        try {
            String threadName = Thread.currentThread().getName();

            long currentTime = System.currentTimeMillis();
            RandomAccessFile file = new RandomAccessFile("D:\\sample\\file.txt", "r");  
            String outFile = "out_" + threadName + ".txt";
            System.out.println("Thread Reading started for start:" + start + ";End:" + end+";threadname:"+threadName);
            FileOutputStream out2 = new FileOutputStream("D:\\sample\\" + outFile);
            file.seek(start);
            int nRecordCount = 0;

            char c = (char) file.read();
            StringBuilder objBuilder = new StringBuilder();
            int nCounter = 1;
            while (c != -1) {
                objBuilder.append(c);
                // System.out.println("char-->" + c);
                if (c == '\n') {
                    nRecordCount++;
                    out2.write(objBuilder.toString().getBytes());
                    objBuilder.delete(0, objBuilder.length());
                    //System.out.println("--->" + nRecordCount);
                    //      break;
                }
                c = (char) file.read();
                nCounter++;
                if (nCounter > end) {
                    break;
                }
            }
        } catch (Exception ex) {
           ex.printStackTrace();
        }

    }
}

Solution

  • The fastest way would be to map the file into memory segment by segment (mapping a large file as a whole may cause undesired side effects). This skips a few relatively expensive copy operations. The operating system loads the file into RAM, and the JRE exposes it to your application as a view into an off-heap memory area in the form of a ByteBuffer. This usually lets you squeeze out the last 2x to 3x of performance.

    The memory-mapped approach requires quite a bit of helper code (see the fragment at the bottom), so it is not always the best tactical choice. Instead, if your input is line-based and you just need reasonable performance (which what you have now probably is not), then just do something like:

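    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Stream;
    ...
    // Files.lines reads lazily; try-with-resources closes the underlying file when done
    try (Stream<String> lines = Files.lines(Paths.get("/path/to/the/file"), StandardCharsets.ISO_8859_1)) {
        lines
    //      .parallel() // parallel processing is still possible
            .forEach(line -> { /* your code goes here */ });
    }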
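
To tie the line-based approach back to the splitting task in the question, here is a minimal single-threaded sketch that uses buffered readers and writers instead of reading one char at a time. The class name `BufferedLineSplitter`, the method `split`, the `part_N.txt` naming, and splitting by a fixed number of lines are all assumptions made for illustration, not something the question or this answer prescribes. It also assumes a single-byte encoding (ISO_8859_1) and writes every line terminator as `\n`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class BufferedLineSplitter {

    /** Copies source into files of at most linesPerPart lines each and returns them in order. */
    static List<Path> split(Path source, Path outDir, long linesPerPart) throws IOException {
        if (linesPerPart <= 0) {
            throw new IllegalArgumentException("linesPerPart must be positive");
        }
        Charset cs = StandardCharsets.ISO_8859_1; // assumption: single-byte encoding
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, cs)) {
            BufferedWriter writer = null;
            try {
                long linesInPart = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Open the first part, or roll over once the current part is full.
                    if (writer == null || linesInPart == linesPerPart) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path part = outDir.resolve("part_" + parts.size() + ".txt"); // hypothetical naming
                        parts.add(part);
                        writer = Files.newBufferedWriter(part, cs);
                        linesInPart = 0;
                    }
                    writer.write(line);
                    writer.write('\n'); // readLine strips the terminator, so it is re-added as \n
                    linesInPart++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return parts;
    }
}
```

A call such as `BufferedLineSplitter.split(Paths.get("D:\\sample\\file.txt"), Paths.get("D:\\sample"), 1_000_000)` would then produce `part_0.txt`, `part_1.txt`, and so on. Since the work is mostly I/O, it is worth measuring a simple buffered version like this one before adding threads.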
    

    For contrast, an example of processing the file via memory mapping would look something like the code below. With fixed-size records (when segments can be chosen to match record boundaries exactly), successive segments can be processed in parallel.

    // Maps one region of the file; a single mapping cannot exceed Integer.MAX_VALUE bytes.
    static ByteBuffer mapFileSegment(FileChannel fileChannel, long fileSize, long regionOffset, long segmentSize) throws IOException {
        long regionSize = Math.min(segmentSize, fileSize - regionOffset);
    
        // small last region prevention: fold a short tail into the current region
        final long remainingSize = fileSize - (regionOffset + regionSize);
        if (remainingSize < segmentSize / 2) {
            regionSize += remainingSize;
        }
    
        return fileChannel.map(FileChannel.MapMode.READ_ONLY, regionOffset, regionSize);
    }
    
    ...
    
    final ToIntFunction<ByteBuffer> consumer = ...
    try (FileChannel fileChannel = FileChannel.open(Paths.get("/path/to/file"), StandardOpenOption.READ)) {
        final long fileSize = fileChannel.size();
    
        long regionOffset = 0;
        while (regionOffset < fileSize) {
            final ByteBuffer regionBuffer = mapFileSegment(fileChannel, fileSize, regionOffset, segmentSize);
            while (regionBuffer.hasRemaining()) {
                final int usedBytes = consumer.applyAsInt(regionBuffer);
                if (usedBytes == 0)
                    break;
            }
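            // the next region starts where the consumer stopped; if the consumer never
            // advances the buffer position, this loop will not terminate
            regionOffset += regionBuffer.position();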
        }
    } catch (IOException ex) {
        throw new UncheckedIOException(ex);
    }
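
The fragment above leaves the consumer elided. As one possible way to fill it in for line-based input, the sketch below maps the file segment by segment and hands each complete line to a callback, restarting the next segment at the first byte of the unfinished line. The names `MappedLineReader`, `forEachLine`, and `consumeLine` are hypothetical, and the code assumes `\n` line terminators (a `\r` before it is left in the line), a single-byte encoding (ISO_8859_1), and lines shorter than the segment size. It is a simplified illustration, not a tuned implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

class MappedLineReader {

    /** Consumes one '\n'-terminated line from buf; returns bytes used, or 0 if no full line remains. */
    static int consumeLine(ByteBuffer buf, Consumer<String> onLine) {
        int start = buf.position();
        for (int i = start; i < buf.limit(); i++) {
            if (buf.get(i) == '\n') {           // absolute get: does not move the position
                byte[] bytes = new byte[i - start];
                buf.get(bytes);                 // relative get: advances past the line content
                buf.get();                      // skip the '\n' itself
                onLine.accept(new String(bytes, StandardCharsets.ISO_8859_1));
                return i + 1 - start;
            }
        }
        return 0;
    }

    /** Maps the file segmentSize bytes at a time and passes every line to onLine. */
    static void forEachLine(Path file, int segmentSize, Consumer<String> onLine) throws IOException {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            long offset = 0;
            while (offset < fileSize) {
                long size = Math.min(segmentSize, fileSize - offset);
                ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
                while (consumeLine(buf, onLine) > 0) {
                    // keep consuming complete lines from this segment
                }
                if (offset + size == fileSize) {
                    // Last segment: whatever is left is a final line without a terminator.
                    if (buf.hasRemaining()) {
                        byte[] tail = new byte[buf.remaining()];
                        buf.get(tail);
                        onLine.accept(new String(tail, StandardCharsets.ISO_8859_1));
                    }
                    break;
                }
                if (buf.position() == 0) {
                    throw new IOException("line longer than segment size at offset " + offset);
                }
                offset += buf.position(); // next segment starts at a line boundary
            }
        }
    }
}
```

For example, `MappedLineReader.forEachLine(Paths.get("/path/to/file"), 64 * 1024 * 1024, line -> { /* your code goes here */ })` would walk the file in 64 MB segments. The segment size here is an arbitrary illustrative value.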