
While reading lines from a file, a Java program sometimes splits one single line into two different lines


I need to read a lot of text files for my project. Each file contains the tweets and retweets of one person. I wrote simple Java code to do that. I also tried reading the files with C code, and it shows the same problem. The program reads some lines properly, but in some cases it breaks a line and reads one single line as two different lines. In some places the program also produces extra lines.

I need to read the files exactly as they are. Could you kindly let me know whether this is due to the contents of the files or to some other reason? Is there any solution? Thanks.

Below is my code, which is very simple.

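import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class Check {

    public static void main(String[] args) throws FileNotFoundException, IOException {

        File inFile = new File("c:/users/syeda/desktop/12.txt");

        int lineNo = 0;

        // try-with-resources closes the Scanner and the underlying FileReader
        try (Scanner in = new Scanner(new FileReader(inFile))) {
            // hasNextLine() pairs with nextLine(); hasNext() only checks for another token
            while (in.hasNextLine()) {
                String line = in.nextLine();
                System.out.println(line);
                lineNo++;
            }
        }

        // Total number of lines the Scanner reported
        System.out.println(lineNo);
    }
}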

My input file contains only 800 lines, but the program reports 819. The 19 extra lines come from blank lines that are not in the input file and from input lines that are broken into two lines.


Solution

  • Your data is not what you think it is:

    Your file has multiple line separators in a row. That is where the blank lines are coming from.

    \n\n will count as an empty line; on Windows, where lines normally end in \r\n, the equivalent is \r\n\r\n.

    End-of-line markers are invisible in editors like TextPad. You have a \n or \r\n (or a stray \r) where you do not think there is one, which is also how a single tweet ends up split across two lines. It is that simple.

    Garbage In, Garbage Out

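    The code is reading the file correctly; the data is not what you expect.

    Also, Scanner is the wrong choice here; BufferedReader would be a better solution. Scanner.nextLine() treats not only \n, \r, and \r\n as line terminators but also U+0085, U+2028, and U+2029, whereas BufferedReader.readLine() splits only on \n, \r, and \r\n.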
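
To check this against your own data, here is a minimal sketch that makes the invisible terminators visible and compares how many lines Scanner and BufferedReader report for the same text. The class name LineSeparatorDemo, its helper methods, and the sample string are made up for illustration; they are not from the question. For a real file you would pass the file's contents (or wrap a file reader) instead of the hard-coded sample, and which terminators actually occur in your tweet files is something only the dump will tell you.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

// Hypothetical demo class, not part of the original question.
class LineSeparatorDemo {

    // Counts lines with Scanner.nextLine(), as the loop in the question does.
    static int countWithScanner(String text) {
        int count = 0;
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                in.nextLine();
                count++;
            }
        }
        return count;
    }

    // Counts lines with BufferedReader.readLine(), which splits only on LF, CR, or CR+LF.
    static int countWithBufferedReader(String text) {
        int count = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Replaces invisible line terminators with printable escape sequences.
    static String visualize(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\u0085': sb.append("\\u0085"); break;
                case '\u2028': sb.append("\\u2028"); break;
                case '\u2029': sb.append("\\u2029"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: a doubled LF (blank line), a Windows CR+LF,
        // and a Unicode line separator (U+2028) inside what looks like one line.
        String sample = "a\nb\n\nc\r\nd\u2028e";

        System.out.println(visualize(sample));
        System.out.println("Scanner:        " + countWithScanner(sample));        // 6
        System.out.println("BufferedReader: " + countWithBufferedReader(sample)); // 5
    }
}
```

In this sample the doubled \n yields a blank line with either reader, while the U+2028 character splits "d" and "e" only under Scanner. If the dump of your own file shows plain \n or \r in the middle of a tweet, both readers will split there, which would match the C program showing the same behaviour.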