I need to split a text file into equally sized chunks and store them in an array. The input is a set of many text files in the same folder. I'm using the following code for this:
int inc = 0;
File dir = new File("C:\\Folder");
File[] files = dir.listFiles();
for (File f : files) {
    if (f.isFile()) {
        BufferedReader inputStream = null;
        try {
            inputStream = new BufferedReader(new FileReader(f));
            String line;
            while ((line = inputStream.readLine()) != null) {
                String c[] = splitByLength(line, chunksize);
                for (int i = 0; i < c.length; i++) {
                    chunk[inc] = c[i];
                    inc++;
                }
            }
        }
        finally {
            if (inputStream != null) {
                inputStream.close();
            }
        }
    }
}
public static String[] splitByLength(String s, int chunkSize) {
    int arraySize = (int) Math.ceil((double) s.length() / chunkSize);
    String[] returnArray = new String[arraySize];
    int index = 0;
    for (int i = 0; i < s.length(); i = i + chunkSize) {
        if (s.length() - i < chunkSize) {
            returnArray[index++] = s.substring(i);
        }
        else {
            returnArray[index++] = s.substring(i, i + chunkSize);
        }
    }
    return returnArray;
}
Here the chunk values are stored in the "chunk" array. The problem is that, because I use readLine() to parse the text file, the result is correct only if the chunk size does not exceed the number of characters in a line. Let's say every line has 10 characters and the file has 5 lines. If I provide a chunk size greater than 10, the code always splits the file into 5 chunks, with one line in each chunk.
For example, consider a file with the following contents:
abcdefghij
abcdefghij
abcdefghij
abcdefghij
abcdefghij
If chunk size = 5, then:
abcde | fghij | abcde | fghij | abcde | fghij | abcde | fghij | abcde | fghij |
If chunk size = 10, then:
abcdefghij | abcdefghij | abcdefghij | abcdefghij | abcdefghij |
If chunk size > 10, my code still produces the same output as before:
abcdefghij | abcdefghij | abcdefghij | abcdefghij | abcdefghij |
I tried using RandomAccessFile and FileChannel, but I wasn't able to obtain the needed results. Can anyone help me solve this problem? Thank you.
That's because BufferedReader.readLine() reads only one line, not the whole file, so a chunk can never be longer than a line.
I assume that the line break characters \r and \n are not part of the content you are interested in.
Maybe this helps:
// ...
StringBuilder sb = new StringBuilder();
String line;
while ((line = inputStream.readLine()) != null) {
    sb.append(line);
    // if enough content has been read, extract the chunk
    while (sb.length() >= chunkSize) {
        String c = sb.substring(0, chunkSize);
        // do something with the string
        // keep the remaining content for the next chunk
        sb.delete(0, chunkSize);
    }
}
// that's the last chunk (it may be shorter than chunkSize)
if (sb.length() > 0) {
    String c = sb.toString();
    // do something with the string
}
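For completeness, here is a minimal self-contained sketch of the same idea that you can run as-is. The class name Chunker and the helper chunksOf are hypothetical names chosen for illustration, not part of your code, and the sketch collects the chunks into a List<String> instead of a fixed-size array, because the number of chunks is not known in advance. It still assumes that line breaks should be dropped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class Chunker {

    // Hypothetical helper: reads everything from 'source' and returns it as
    // chunks of 'chunkSize' characters. Line breaks are dropped (assumption),
    // so a chunk may span several lines. Only the last chunk may be shorter.
    static List<String> chunksOf(Reader source, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line);
                // emit every complete chunk that is currently buffered
                while (sb.length() >= chunkSize) {
                    chunks.add(sb.substring(0, chunkSize));
                    sb.delete(0, chunkSize);
                }
            }
        }
        // whatever is left is the final, possibly shorter, chunk
        if (sb.length() > 0) {
            chunks.add(sb.toString());
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // the five-line sample file from the question, as an in-memory string
        String text = "abcdefghij\nabcdefghij\nabcdefghij\nabcdefghij\nabcdefghij\n";
        List<String> chunks = chunksOf(new StringReader(text), 15);
        System.out.println(String.join(" | ", chunks));
        // prints: abcdefghijabcde | fghijabcdefghij | abcdefghijabcde | fghij
    }
}
```

To use it with your folder loop, you could pass new FileReader(f) for each file and add the returned chunks to one shared list. Note that each file is then chunked separately, so the last chunk of every file may be shorter than the chunk size.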