
Java BufferedReader.readLine() does not read all lines of a file


Our back-end program reads a text file and processes it line by line.

To read the file line by line, it uses BufferedReader.readLine().

But occasionally, about once a quarter, the BufferedReader does not read all of the lines.

For example, when the file actually has 1000 lines, readLine() returns only lines 1 to 530.

I'm sure the file is well formed. When I read the file again after this error, all lines are read correctly.

The file is uploaded via FTP, and a file watcher batch job is running to detect it.

Below is the code:

String fromFilePath = "/DATA/EXAMPLE.TXT"; //upload filepath example
String toFilePath = "/DATA/PROC/EXAMPLE.TXT";  //filepath to move

//move the file to another directory before reading it, so that the file watcher does not pick it up again; a file never already exists at the target path.
Files.move(Paths.get(fromFilePath), Paths.get(toFilePath), java.nio.file.StandardCopyOption.REPLACE_EXISTING, java.nio.file.StandardCopyOption.ATOMIC_MOVE);

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(toFilePath), "UTF-8"));
int fileRowCount = 0;
String readLineResult = null;

while ((readLineResult = br.readLine()) != null) {
  fileRowCount++;

  doBusinessLogic(readLineResult);

}

log.info("file log count {}", fileRowCount);

//second pass over the same file, added to diagnose this problem
br.close(); //release the first reader before opening the file again
br = new BufferedReader(new InputStreamReader(new FileInputStream(toFilePath), "UTF-8"));
int assertCount = 0;

while(br.readLine() != null){
  assertCount++;
}

//this always prints 'true' when the error occurs, even though a new BufferedReader is created
log.info("assert {}", assertCount==fileRowCount);

When the error occurs, fileRowCount is lower than the real number of lines, and of course doBusinessLogic is executed for only part of the file.

OS: Red Hat 7.4

Java version: 1.7.0_181


Solution

  • The likely reason for this behaviour is that your program starts reading while the file upload is still in progress. To avoid that, make sure your program only reads completely transferred files. If you have any influence on the upload process, let the uploader use a temporary file name (one that the reader ignores), then rename the file after the transfer. If that is not possible, you could either check the file for completeness before reading (if the end of the file is clearly recognizable) or wait for some time after the file appears before you start reading. That last option is probably the easiest to implement, but setting the delay long enough to ensure the transfer has completed takes some guesswork.
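
One way to take some of the guesswork out of the fixed delay is to poll the file size and start reading only once it has stopped changing. This is a variation on the "wait for some time" option, not something the question's code or FTP setup provides. The class name UploadWait, the method waitUntilSizeStable and all of its parameters are hypothetical names chosen for illustration. It is a minimal sketch and only a heuristic: a stalled transfer looks the same as a finished one, so the temporary-name-and-rename approach remains the more reliable fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original code: waits for an uploaded
// file to stop growing before the caller moves and reads it.
final class UploadWait {

    /**
     * Polls the file size every pollMillis milliseconds.
     * Returns true once the size has been unchanged for requiredStableChecks
     * consecutive polls, or false if maxChecks polls pass without that happening.
     */
    static boolean waitUntilSizeStable(Path file, long pollMillis,
                                       int requiredStableChecks, int maxChecks)
            throws IOException, InterruptedException {
        long lastSize = Files.size(file);
        int stableChecks = 0;
        for (int i = 0; i < maxChecks; i++) {
            Thread.sleep(pollMillis);
            long size = Files.size(file);
            if (size == lastSize) {
                stableChecks++;
                if (stableChecks >= requiredStableChecks) {
                    return true; // size has settled; the upload is probably done
                }
            } else {
                // the file is still growing: restart the count from the new size
                stableChecks = 0;
                lastSize = size;
            }
        }
        return false; // gave up: the size never settled within maxChecks polls
    }
}
```

In the question's code this would be called on Paths.get(fromFilePath) before Files.move, for example with a poll interval of a few seconds. The interval and the number of stable checks still have to be tuned to the slowest expected upload, which is an assumption about the environment.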