java, eclipse, fileinputstream

FileInputStream only reads the first word in a file


I want to read the words in file.txt token by token, add a part-of-speech tag to each of them, and write the result to file2.txt. The content of file.txt is already tokenized. Here's my code:

public class PoSTagging {
@SuppressWarnings("resource")
public static void PoStagMethod() throws IOException {

FileInputStream fin= new FileInputStream("C:\\Users\\dell\\Desktop\\file.txt");
DataInputStream in = new DataInputStream(fin);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strline=br.readLine();
System.out.println(strline+"first");

try{
POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
POSTaggerME tagger = new POSTaggerME(model);

String input = strline;
@SuppressWarnings("deprecation")
ObjectStream<String> lineStream =new PlainTextByLineStream(new StringReader(input));

perfMon.start();
String line;
while ((line = lineStream.read()) != null) {

    String whitespaceTokenizerLine[] = WhitespaceTokenizer.INSTANCE.tokenize(line);
    String[] tags = tagger.tag(whitespaceTokenizerLine);

    POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
    System.out.println(sample.toString()+"second");
    //String t=sample.toString();

    FileOutputStream fout=new FileOutputStream("C:\\Users\\dell\\Desktop\\file2.txt");
    //fout.write(t.getBytes());

    perfMon.incrementCounter();
    fout.close();
}
perfMon.stopAndPrintFinalResult();
}
catch (IOException e) {
    e.printStackTrace();
}
}
}

When PoStagMethod() is invoked from another class, only the first word in file.txt gets written to file2.txt. Why doesn't it read the other words in the file? What is wrong with my code?


Solution

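  • Your code calls br.readLine() only once, before the loop, so only the first line of file.txt is ever tagged; if the file holds one token per line, that is just the first word. It also creates a new FileOutputStream on every iteration, which truncates file2.txt each time. Instead, read file.txt line by line with a BufferedReader, tag each line with your POSModel as you already do, and write the results to file2.txt with a BufferedWriter that is opened once, before the loop. A snippet like the following might help: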

        // Needs java.io.BufferedReader, BufferedWriter, File, FileReader and FileWriter
        POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);

        // Open the writer once, before the loop, so earlier output is not overwritten
        BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));
        BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));

        perfMon.start();
        String line;
        // readLine() returns null at the end of the file, so this loop visits every line
        while ((line = bufferedReader.readLine()) != null) {
            String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
            String[] tags = tagger.tag(whitespaceTokenizerLine);

            // POSSample.toString() renders the line as space-separated word_TAG pairs
            POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
            bufferedWriter.write(sample.toString());
            bufferedWriter.newLine();
            perfMon.incrementCounter();
        }
        perfMon.stopAndPrintFinalResult();

        // Close the streams when the job is done; close() also flushes the writer
        bufferedWriter.close();
        bufferedReader.close();
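
The snippet above creates the BufferedWriter once, before the loop. Here is a small sketch of why that placement matters, independent of OpenNLP. The class name, the method names and the sample strings are made up for illustration; the only point is that a FileWriter (like a FileOutputStream) created without the append flag truncates the file each time it is opened.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class WriterPlacementDemo {

    // Opens a new writer inside the loop: every FileWriter truncates the file,
    // so only the line written last survives.
    static void writeReopeningEachTime(Path out, List<String> lines) throws IOException {
        for (String line : lines) {
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(out.toFile()))) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Opens the writer once, before the loop: every line is kept.
    static void writeOpeningOnce(Path out, List<String> lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(out.toFile()))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data only; real output would come from the tagger
        List<String> lines = Arrays.asList("The_DT", "cat_NN", "sat_VBD");
        Path reopened = Files.createTempFile("reopened", ".txt");
        Path once = Files.createTempFile("once", ".txt");

        writeReopeningEachTime(reopened, lines);
        writeOpeningOnce(once, lines);

        System.out.println(Files.readAllLines(reopened)); // [sat_VBD]
        System.out.println(Files.readAllLines(once));     // [The_DT, cat_NN, sat_VBD]
    }
}
```

If the output file really has to be reopened inside a loop, new FileWriter(file, true) opens it in append mode instead of truncating it.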
    

    Once this works, consider replacing the explicit close() calls with try-with-resources, which was added in Java 7 and closes the resources automatically, even when an exception is thrown.
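
A minimal sketch of the try-with-resources form, under a few assumptions: the class name LineByLineTagging and the method tagFile are hypothetical, the files are temporary files rather than the paths from the question, and the lambda that appends "_TAG" is only a stand-in for the OpenNLP tagger so that the example runs without a model file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Function;

public class LineByLineTagging {

    // Reads `in` line by line, applies `tagLine` to each line and writes the
    // result to `out`. Returns the number of lines processed.
    static int tagFile(Path in, Path out, Function<String, String> tagLine) throws IOException {
        int count = 0;
        // Both resources are closed automatically when the try block exits,
        // whether normally or because of an exception.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(tagLine.apply(line));
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("file", ".txt");
        Path out = Files.createTempFile("file2", ".txt");
        Files.write(in, Arrays.asList("The", "cat", "sat"), StandardCharsets.UTF_8);

        // Hypothetical stand-in for the tagger: appends a fixed placeholder tag
        int processed = tagFile(in, out, line -> line + "_TAG");

        System.out.println(processed + " lines: " + Files.readAllLines(out, StandardCharsets.UTF_8));
        // prints: 3 lines: [The_TAG, cat_TAG, sat_TAG]
    }
}
```

With OpenNLP, the function passed to tagFile would presumably tokenize the line, call tagger.tag(...) on the tokens and return the POSSample string, as in the snippet above.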

    Also, if you need to write each word and its tag on a separate line, add an inner loop over the tokens when writing to the file. It would look something like this:

        POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);

        BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));
        BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));

        perfMon.start();
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
            String[] tags = tagger.tag(whitespaceTokenizerLine);

            // tags[i] is the tag for whitespaceTokenizerLine[i], so loop by index
            for (int i = 0; i < whitespaceTokenizerLine.length; i++) {
                // One word_TAG pair per output line; adjust the separator as needed
                bufferedWriter.write(whitespaceTokenizerLine[i] + "_" + tags[i]);
                bufferedWriter.newLine();
            }
            perfMon.incrementCounter();
        }
        perfMon.stopAndPrintFinalResult();

        // Close the streams when the job is done; close() also flushes the writer
        bufferedWriter.close();
        bufferedReader.close();
    

    Hope this would be helpful,

    Good Luck.