Reading multiple records from a flat file in Java


I have a text file dump that I need to convert to a delimited file. The file contains a series of "records" (for lack of a better word) formatted like this:

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 123456
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 234567
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

...

My goal is a flat file of delimited values, one line per record. For the records above, the output would be:

abc123;7/3/12;the foo is bar;123456;foo bar in multiple lines of text;foo un-barred in multiple lines of text
abc123;7/3/12;the foo is bar;234567;foo bar in multiple lines of text;foo un-barred in multiple lines of text

Code appears below, and following that, the problem I'm experiencing.

import java.util.*;
import java.io.*;
import java.nio.file.*;
//
public class ParseOutlookFolderForSE
{
        public static void main(String args[])
        {
            String user = "";
            String PDLDate = "";
            String name = "";
            String PDLNum = "";
            String problemDesc = "test";
            String resolutionDesc = "test";
            String delim = ";";
            int recordCounter = 0;
            //
            try
            {
                Path file = Paths.get("testfile2.txt");
                FileInputStream fstream = new FileInputStream("testfile2.txt");
               // Get the object of DataInputStream
                /* DataInputStream in = new DataInputStream(fstream);  */
                BufferedReader br = new BufferedReader(new InputStreamReader(fstream));  //Buffered Reader
                String inputLine = null;     //String
                StringBuffer theText = new StringBuffer();  //StringBuffer
// problem: output contains last record ONLY. program is cycling through the entire file, overwriting records until the end.
// add a for loop based on recordCounter
                for(recordCounter=0;recordCounter<10;recordCounter++)
                {
                while((inputLine=br.readLine())!=null)
                {
                    if(inputLine.toLowerCase().startsWith("from:"))
                    {

                /*      recordCounter = recordCounter++;    */  // commented out when I added recordCounter++ to the for loop
                        user = inputLine.trim().substring(5).trim();
                    }
                    else
                    if(inputLine.toLowerCase().startsWith("effective date"))
                    {

                        PDLDate = inputLine.trim().substring(15).trim();
                    }
                    else
                    if(inputLine.toLowerCase().startsWith("to:"))
                    {

                        name = inputLine.trim().substring(3).trim();
                    }
                    else
                    if(inputLine.toLowerCase().startsWith("sir number"))
                    {

                        PDLNum = inputLine.trim().substring(12).trim();
                    }
                }      // close while
                }   // close for loop
                System.out.println(recordCounter + "\n" + user + "\n" + name + "\n" + PDLNum + "\n" + PDLDate + "\n" + problemDesc + "\n" + resolutionDesc);
                System.out.println(recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc);
                String lineForFile = (recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc + System.getProperty("line.separator"));
                System.out.println(lineForFile);
                try
                {
                    BufferedWriter out = new BufferedWriter(new FileWriter("testfileoutput.txt"));
                    out.write(lineForFile);
                    out.close();
                }
                catch (IOException e)
                {
                    System.out.println("Exception ");
                }
            } //close try
            catch (Exception e)
            {
                System.err.println("Error: " + e.getMessage());
            }
        }

    }

My final output is ONLY the last record. I believe that what's happening is that the program is reading every line, but only the LAST one doesn't get overwritten with the next record. Makes sense. So I added a FOR loop, incrementing by 1 if(inputLine.toLowerCase().startsWith("user:")) and outputting the counter variable with my data to validate what's happening.

My FOR loop begins after step 3 in my pseudocode...after BufferedReader but before my IF statements. I terminate it after I write to the file in step 6. I'm using for(recCounter=0;recCounter<10;recCounter++) and while I get ten records in my output file, they are all instances of the LAST record of the input file, numbered 0-9.

Leaving the for loop in the same place, I modified it to read for(recCounter=0;recCounter<10;) and placed recCounter's increment WITHIN the IF statement, so that it increments every time a line starts with User:. In this case I also got ten records in my output file: ten instances of the last record in the input file, all with a counter of 0.

EDIT: Given how the file is formatted, the ONLY way to tell one record from the next is a subsequent instance of "User:" at the start of a line. Everything from one occurrence up to the NEXT occurrence constitutes a single record.

It appears as though I'm not setting my "recCounter" appropriately, or I'm not interpreting the results of what IS being set as "start a new record".

Anyone have any suggestions for how to read this file as multiple records?


Solution

  • Okay, so your pseudo-code should go something like this:

    declare variables
    open file
    while not eof
      read input
      if end of set
        format output
        write output
        clear variables
      figure out which variable
      store in correct variable
    end-while
    

    There might be a trick to figuring out when you've finished one set and can start the next. If a set is supposed to be terminated by a blank line as appears from your example, then you could just check for the blank line. Otherwise, how do you know? Does a set always start with "user"?

    Also, don't forget to write the last record. You don't want to leave unwritten stuff in your buffer/table.
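
The pseudo-code above could be sketched in Java roughly as follows. This is only one possible reading of it, under a few assumptions: a record starts at each "User:" line (per the EDIT in the question), the field labels are the ones from the sample records (the posted code checks for "from:", "effective date" and so on, so `LABELS` would need adjusting to the real file), and any line that does not start with a known label continues the previous field. The class name `RecordDumpConverter` and its helper methods are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecordDumpConverter {
    // Field labels in output order (assumed from the sample records).
    static final List<String> LABELS = Arrays.asList(
            "User", "Date", "Subject", "Project", "Problem", "Resolution");
    static final String DELIM = ";";

    /** Turns the lines of the dump into one delimited line per record. */
    static List<String> toDelimited(Iterable<String> lines) {
        List<String> out = new ArrayList<>();
        Map<String, StringBuilder> record = new HashMap<>();
        String current = null; // label of the field currently being filled

        for (String raw : lines) {
            String line = raw.trim();
            String label = matchLabel(line);
            if (label != null) {
                // "end of set": a new "User:" line closes the previous record.
                if (label.equals("User") && !record.isEmpty()) {
                    out.add(format(record));
                    record.clear(); // "clear variables"
                }
                current = label;
                String value = line.substring(label.length() + 1).trim();
                record.put(label, new StringBuilder(value));
            } else if (current != null && !line.isEmpty()) {
                // No label: treat the line as a continuation of the current field.
                StringBuilder sb = record.get(current);
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(line);
            }
        }
        // Don't forget the last record: nothing follows it to trigger the write.
        if (!record.isEmpty()) {
            out.add(format(record));
        }
        return out;
    }

    /** Returns the label the line starts with (case-insensitive), or null. */
    static String matchLabel(String line) {
        for (String label : LABELS) {
            if (line.regionMatches(true, 0, label + ":", 0, label.length() + 1)) {
                return label;
            }
        }
        return null;
    }

    /** Joins the fields in LABELS order; missing fields become empty strings. */
    static String format(Map<String, StringBuilder> record) {
        List<String> values = new ArrayList<>();
        for (String label : LABELS) {
            StringBuilder v = record.get(label);
            values.add(v == null ? "" : v.toString());
        }
        return String.join(DELIM, values);
    }

    public static void main(String[] args) throws IOException {
        // File names taken from the question; the UTF-8 encoding is an assumption.
        List<String> lines = Files.readAllLines(Paths.get("testfile2.txt"), StandardCharsets.UTF_8);
        Files.write(Paths.get("testfileoutput.txt"), toDelimited(lines), StandardCharsets.UTF_8);
    }
}
```

The main difference from the posted code is that each record is emitted as soon as the next "User:" line shows up, rather than once after the whole file has been read. Two caveats: a continuation line that happens to begin with one of the labels (say, "Date:" inside a problem description) would be mistaken for a new field, and values containing ";" are not escaped. For very large files, reading with a `BufferedReader` and writing each record as it completes would avoid holding everything in memory.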