Here's the code:
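import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import javolution.io.UTF8StreamReader;
import javolution.io.UTF8StreamWriter;

public static void mergeAllFilesJavolution() throws IOException {
    String fileDir = "C:\\TestData\\w12";
    // merged.txt goes to the parent directory so listFiles() does not pick it up.
    String outFile = fileDir + "\\..\\merged.txt";
    File[] list = new File(fileDir).listFiles();
    if (list == null) {
        throw new FileNotFoundException("Not a readable directory: " + fileDir);
    }
    long start = System.currentTimeMillis();
    // Open the output once instead of re-opening it for every input file.
    UTF8StreamWriter outPut = new UTF8StreamWriter().setOutput(new FileOutputStream(outFile, true));
    try {
        for (File src : list) {
            UTF8StreamReader inFile = new UTF8StreamReader().setInput(new FileInputStream(src));
            try {
                int chr;
                // Copy character by character until end of stream (-1).
                while ((chr = inFile.read()) != -1) {
                    outPut.write(chr);
                }
            } finally {
                inFile.close(); // close even if read() throws
            }
        }
    } finally {
        outPut.close();
    }
    System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start));
}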
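Since the goal is only to concatenate the files, decoding may not be necessary at all: appending the raw bytes of several valid UTF-8 files yields valid UTF-8 (assuming none of them starts with a byte-order mark, which would otherwise end up in the middle of the output). Below is a minimal sketch using standard java.nio instead of Javolution; the class and method names (ByteConcat, mergeDirectory) are hypothetical, and the files are merged in sorted name order because listFiles() does not guarantee any order.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: byte-level concatenation, no UTF-8 decoding involved.
final class ByteConcat {

    /** Appends every regular file in srcDir (sorted by name) to target; returns bytes copied. */
    static long mergeDirectory(Path srcDir, Path target) throws IOException {
        // If target sits inside srcDir it may be picked up as a source, so keep it outside.
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    sources.add(p); // skip subdirectories
                }
            }
        }
        Collections.sort(sources); // directory listing order is unspecified

        long total = 0;
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (Path src : sources) {
                try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
                    long size = in.size();
                    long pos = 0;
                    // transferTo may move fewer bytes than requested, so loop until done.
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, out);
                    }
                    total += size;
                }
            }
        }
        return total;
    }
}
```

Usage would be along the lines of ByteConcat.mergeDirectory(Paths.get("C:\\TestData\\w12"), Paths.get("C:\\TestData\\merged.txt")). Because no characters are decoded, the buffer-capacity question does not arise in this approach.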
The UTF-8 test data is about 200 MB, but real files are likely to be 800 MB or larger.
Here is the relevant part of the UTF8StreamReader source (the buffer field, the default constructor, and read()):
    /**
     * Holds the bytes buffer.
     */
    private final byte[] _bytes;

    /**
     * Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).
     */
    public UTF8StreamReader() {
        _bytes = new byte[2048];
    }

    /**
     * Reads a single character. This method will block until a character is
     * available, an I/O error occurs or the end of the stream is reached.
     *
     * @return the 31-bits Unicode of the character read, or -1 if the end of
     *         the stream has been reached.
     * @throws IOException if an I/O error occurs.
     */
    public int read() throws IOException {
        byte b = _bytes[_start];
        return ((b >= 0) && (_start++ < _end)) ? b : read2();
    }
An ArrayIndexOutOfBoundsException is thrown at _bytes[_start], apparently because the buffer is fixed at _bytes = new byte[2048].
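Judging from the excerpt, the capacity itself may not be the root cause: read() evaluates _bytes[_start] before it tests _start < _end. If _start ever equals _bytes.length (which appears possible once a completely filled buffer has been consumed through the fast path), the array access is out of bounds regardless of how large the buffer is. The sketch below is a hypothetical class, not Javolution's implementation; it only illustrates the safe ordering: check the index, refill if the buffer is exhausted, and only then read.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of the bounds check; not Javolution's code.
final class CheckedByteReader {
    private final InputStream in;
    private final byte[] bytes;
    private int start; // index of the next unread byte
    private int end;   // one past the last valid byte

    CheckedByteReader(InputStream in, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.in = in;
        this.bytes = new byte[capacity];
    }

    /** Returns the next byte as 0..255, or -1 at end of stream. */
    int read() throws IOException {
        // Check BEFORE indexing; refill until at least one byte is available.
        while (start >= end) {
            int n = in.read(bytes, 0, bytes.length);
            if (n == -1) {
                return -1;
            }
            start = 0;
            end = n;
        }
        return bytes[start++] & 0xFF; // safe: start < end <= bytes.length
    }
}
```

With this ordering a 2048-byte buffer can stream input of any length, including input whose size is an exact multiple of the capacity.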
Here's another UTF8StreamReader constructor:
    /**
     * Creates a UTF-8 reader having a byte buffer of specified capacity.
     *
     * @param capacity the capacity of the byte buffer.
     */
    public UTF8StreamReader(int capacity) {
        _bytes = new byte[capacity];
    }
Problem: How can I specify the correct capacity for _bytes when creating a UTF8StreamReader?
I tried File.length(), but it returns a long (which makes sense, since I expect huge files), while the constructor only accepts an int.
Any guidance in the right direction is appreciated.
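On the capacity question itself: a stream reader's buffer only has to hold one chunk at a time, so it does not need to match File.length(), and the long file size never has to fit into an int. (If an int conversion is ever genuinely needed, Math.toIntExact(long), available since Java 8, throws ArithmeticException instead of silently truncating.) As an illustration with the standard library rather than Javolution, here is a hypothetical helper (FixedBufferCopy, appendDecoded) that decodes a UTF-8 file of any size through a fixed 8192-char buffer, an arbitrary size, and appends it to another file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: constant memory use regardless of file size.
final class FixedBufferCopy {
    /** Buffer size is independent of the file size; 8192 is an arbitrary choice. */
    private static final int BUFFER_CHARS = 8192;

    /** Decodes src as UTF-8, appends it to dst, and returns the number of chars copied. */
    static long appendDecoded(Path src, Path dst) throws IOException {
        char[] buf = new char[BUFFER_CHARS];
        long copied = 0;
        // newBufferedReader decodes strictly: invalid UTF-8 raises MalformedInputException.
        try (Reader in = Files.newBufferedReader(src, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(dst, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                copied += n;
            }
        }
        return copied;
    }
}
```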
It seems nobody else has run into this situation yet.
Anyway, I tried another solution: instead of UTF8StreamReader, I used the ByteBuffer-based reader (UTF8ByteBufferReader). It is incredibly fast compared with UTF8StreamReader.
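For comparison, a ByteBuffer-based approach can also be assembled from the standard library alone: memory-map the file with FileChannel.map and decode it with a CharsetDecoder into a small reusable CharBuffer. This is a hypothetical sketch (the class name MappedUtf8Counter and the 8192-char output buffer are assumptions), not Javolution's UTF8ByteBufferReader; it merely counts the decoded chars to show the decode/flush loop. A single mapping is limited to Integer.MAX_VALUE bytes, so files beyond about 2 GB would need several mapped regions.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: decode a memory-mapped UTF-8 file chunk by chunk.
final class MappedUtf8Counter {

    /** Returns the number of UTF-16 chars in the file; invalid UTF-8 raises an exception. */
    static long countChars(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mapping can cover at most Integer.MAX_VALUE bytes.
            MappedByteBuffer bytes = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // A fresh decoder reports (throws on) malformed input by default.
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
            CharBuffer chars = CharBuffer.allocate(8192);
            long total = 0;
            CoderResult result;
            do {
                // endOfInput = true: the whole file is already visible through 'bytes'.
                result = decoder.decode(bytes, chars, true);
                if (result.isError()) {
                    result.throwException();
                }
                // Real processing would call chars.flip() and consume the chars here.
                total += chars.position();
                chars.clear();
            } while (result.isOverflow()); // OVERFLOW: output buffer full, keep decoding
            do {
                result = decoder.flush(chars);
                total += chars.position();
                chars.clear();
            } while (result.isOverflow());
            return total;
        }
    }
}
```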