I have a text file dump that I need to convert to a delimited file. The file contains a series of "records" (for lack of a better word) formatted like this:
User: abc123
Date: 7/3/12
Subject: the foo is bar
Project: 123456
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text
User: abc123
Date: 7/3/12
Subject: the foo is bar
Project: 234567
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text
...
My goal is a flat file of delimited values, one line per record. Using the records above, we would see:
abc123;7/3/12;the foo is bar;123456;foo bar in multiple lines of text;foo un-barred in multiple lines of text
abc123;7/3/12;the foo is bar;234567;foo bar in multiple lines of text;foo un-barred in multiple lines of text
Code appears below, and following that, the problem I'm experiencing.
import java.util.*;
import java.io.*;
import java.nio.file.*;
//
public class ParseOutlookFolderForSE
{
public static void main(String args[])
{
String user = "";
String PDLDate = "";
String name = "";
String PDLNum = "";
String problemDesc = "test";
String resolutionDesc = "test";
String delim = ";";
int recordCounter = 0;
//
try
{
Path file = Paths.get("testfile2.txt");
FileInputStream fstream = new FileInputStream("testfile2.txt");
// Get the object of DataInputStream
/* DataInputStream in = new DataInputStream(fstream); */
BufferedReader br = new BufferedReader(new InputStreamReader(fstream)); //Buffered Reader
String inputLine = null; //String
StringBuffer theText = new StringBuffer(); //StringBuffer
// problem: output contains last record ONLY. program is cycling through the entire file, overwriting records until the end.
// add a for loop based on recordCounter
for(recordCounter=0;recordCounter<10;recordCounter++)
{
while((inputLine=br.readLine())!=null)
{
if(inputLine.toLowerCase().startsWith("from:"))
{
/* recordCounter = recordCounter++; */ // commented out when I added recordCounter++ to the for loop
user = inputLine.trim().substring(5).trim();
}
else
if(inputLine.toLowerCase().startsWith("effective date"))
{
PDLDate = inputLine.trim().substring(15).trim();
}
else
if(inputLine.toLowerCase().startsWith("to:"))
{
name = inputLine.trim().substring(3).trim();
}
else
if(inputLine.toLowerCase().startsWith("sir number"))
{
PDLNum = inputLine.trim().substring(12).trim();
}
} // close while
} // close for loop
System.out.println(recordCounter + "\n" + user + "\n" + name + "\n" + PDLNum + "\n" + PDLDate + "\n" + problemDesc + "\n" + resolutionDesc);
System.out.println(recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc);
String lineForFile = (recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc + System.getProperty("line.separator"));
System.out.println(lineForFile);
try
{
BufferedWriter out = new BufferedWriter(new FileWriter("testfileoutput.txt"));
out.write(lineForFile);
out.close();
}
catch (IOException e)
{
System.out.println("Exception ");
}
} //close try
catch (Exception e)
{
System.err.println("Error: " + e.getMessage());
}
}
}
My final output is ONLY the last record. I believe that what's happening is that the program is reading every line, but only the LAST one doesn't get overwritten with the next record. Makes sense. So I added a FOR loop, incrementing by 1 if(inputLine.toLowerCase().startsWith("user:")) and outputting the counter variable with my data to validate what's happening.
My FOR loop begins after step 3 in my pseudocode...after BufferedReader but before my IF statements. I terminate it after I write to the file in step 6. I'm using for(recCounter=0;recCounter<10;recCounter++) and while I get ten records in my output file, they are all instances of the LAST record of the input file, numbered 0-9.
Leaving the for loop in the same place, I modified it to read for(recCounter=0;recCounter<10;) and placed recCounter's increment WITHIN the IF statement, incrementing every time the line starts with User:. In this case, I also got ten records in my output file, they were ten instances of the last record in the input file, and all the counters are 0.
EDIT: Given how the file is formatted, the ONLY way to distinguish one record from the next is a subsequent instance of the word "User:" at the start of a line. Everything from one occurrence up to the NEXT occurrence constitutes a single record.
It appears as though I'm not setting my "recCounter" appropriately, or I'm not interpreting the results of what IS being set as "start a new record".
Anyone have any suggestions for how to read this file as multiple records?
Okay, so your pseudo-code should go something like this:
declare variables
open file
while not eof
    read input
    if end of set
        format output
        write output
        clear variables
    figure out which variable
    store in correct variable
end-while
There might be a trick to figuring out when you've finished one set and can start the next. If a set is supposed to be terminated by a blank line as appears from your example, then you could just check for the blank line. Otherwise, how do you know? Does a set always start with "user"?
Also, don't forget to write the last record. You don't want to leave unwritten stuff in your buffer/table.
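To make the pseudo-code above concrete, here is a minimal Java sketch. It assumes a new set starts whenever a line begins with "User:", that the six labels are exactly the ones in your sample, and that any line without a label continues the previous field (joined with a single space). The names RecordFlattener, flatten, labelOf and format are made up for illustration and are not from your code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecordFlattener {
    // Field labels in output order. Assumption: taken from the sample records.
    private static final String[] LABELS =
        {"User", "Date", "Subject", "Project", "Problem", "Resolution"};

    // Returns the label this line starts with, or null for a continuation line.
    private static String labelOf(String line) {
        String lower = line.toLowerCase();
        for (String label : LABELS) {
            if (lower.startsWith(label.toLowerCase() + ":")) {
                return label;
            }
        }
        return null;
    }

    // Joins the stored fields in LABELS order; a missing field becomes empty.
    private static String format(Map<String, StringBuilder> record, String delim) {
        List<String> values = new ArrayList<>();
        for (String label : LABELS) {
            StringBuilder value = record.get(label);
            values.add(value == null ? "" : value.toString());
        }
        return String.join(delim, values);
    }

    // Turns the raw lines into one delimited string per record.
    static List<String> flatten(List<String> lines, String delim) {
        List<String> out = new ArrayList<>();
        Map<String, StringBuilder> record = new HashMap<>();
        String current = null; // label of the field currently being filled

        for (String line : lines) {
            String label = labelOf(line);

            // "end of set": a new User: line closes the record collected so far.
            if ("User".equals(label) && !record.isEmpty()) {
                out.add(format(record, delim));
                record.clear();
            }

            if (label != null) {
                // Start a new field with the text after "Label:".
                current = label;
                String value = line.substring(label.length() + 1).trim();
                record.put(label, new StringBuilder(value));
            } else if (current != null && !line.trim().isEmpty()) {
                // Continuation of a multi-line field such as Problem.
                record.get(current).append(' ').append(line.trim());
            }
        }

        // Don't forget the last record: nothing follows it to trigger the write.
        if (!record.isEmpty()) {
            out.add(format(record, delim));
        }
        return out;
    }
}
```

You could feed it with Files.readAllLines(Paths.get("testfile2.txt")) and write the result once with Files.write(Paths.get("testfileoutput.txt"), rows), so the output file is opened a single time rather than once per record. One limitation of this sketch: a continuation line that happens to begin with one of the labels (for example "Date:" inside a Problem description) would be treated as a new field.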