I'm going to start by posting what the data in the text file looks like. This is just 4 lines of it; the actual file is a couple hundred lines long.
Friday, September 9 2011
-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37
0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37
-BG 105--------07:00 - 23:00 SH GREAT HALL Report Printed on 9/08/2011 at 2:37
What I want to do with this text file is ignore the first line with the date on it. Then, on the next line, I want to ignore the '-', read in "STV 101", "5:00" and "23:59" and save them to variables, and ignore all other characters on that line. The same goes for each line after that.
Here is how I am currently reading the lines in their entirety. I call this function once the user has put the path in the scheduleTxt JTextField. It can read and print each line out fine.
public void readFile() throws IOException {
    try {
        FileInputStream fstream = new FileInputStream(scheduleTxt.getText());
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        while ((strLine = br.readLine()) != null) {
            System.out.println(strLine);
        }
        in.close();
    } catch (Exception e) { // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
}
UPDATE: It turns out I also need to strip "Friday" out of the top line and put it in a variable as well. Thanks! Beef.
I did not test it thoroughly, but this regular expression would capture the info you need in groups 2, 5 and 7 (assuming you're only interested in "AH 104" in the example of "0-AH 104----"):
^(\S)*-(([^-])*)(-)+((\S)+)\s-\s((\S)+)\s(.)*
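// Needs java.util.regex.Pattern and java.util.regex.Matcher.
// br and strLine are the variables from readFile() in the question.
String regex = "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*";
Pattern pattern = Pattern.compile(regex); // compile once, outside the loop
while ((strLine = br.readLine()) != null) {
    Matcher matcher = pattern.matcher(strLine);
    boolean matchFound = matcher.find();
    if (matchFound) { // lines without the expected layout (e.g. the date line) are skipped
        String s1 = matcher.group(2); // e.g. "STV 101"
        String s2 = matcher.group(5); // e.g. "05:00"
        String s3 = matcher.group(7); // e.g. "23:59"
        System.out.println(s1 + " " + s2 + " " + s3);
    }
}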
The expression could be tuned with non-capturing groups in order to capture only the information you want.
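As a rough sketch of that tuning, the version below turns every helper group into a non-capturing `(?:...)` group, so only the three wanted values are captured, as groups 1, 2 and 3. The class name `ScheduleLineParser`, the method `parse` and the constant `LINE` are made up for this illustration, and the sample lines are the ones from the question; this has only been reasoned through against those lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
class ScheduleLineParser {
    // Same expression as above, but helper groups are written (?:...),
    // so the only capturing groups are: 1 = room, 2 = start, 3 = end.
    private static final Pattern LINE = Pattern.compile(
            "^(?:\\S)*-((?:[^-])*)(?:-)+((?:\\S)+)\\s-\\s((?:\\S)+)\\s(?:.)*");

    /** Returns {room, start, end}, or null when the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] sample = {
            "Friday, September 9 2011",
            "-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37",
            "0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37",
            "-BG 105--------07:00 - 23:00 SH GREAT HALL Report Printed on 9/08/2011 at 2:37"
        };
        for (String line : sample) {
            String[] fields = parse(line);
            if (fields == null) {
                // The date line contains no '-', so it does not match.
                System.out.println("skipped: " + line);
            } else {
                System.out.println(fields[0] + " " + fields[1] + " " + fields[2]);
            }
        }
    }
}
```

Since each helper group wraps a single token, `(?:\S)*` is equivalent to plain `\S*`; the groups are kept here only to stay close to the original expression.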
Explanation of the regexp's elements:
^(\S)*-
Matches a group of non-whitespace characters ended by -. Note: this would not work if there is whitespace before the first -. ^(.)*?- could be used instead in that case; the reluctant *? matters, because a greedy (.)* would run past the first -.
(([^-])*)
Matches a group of every character except -. This is captured in group 2.
(-)+
Matches a group of one or more -.
((\S)+)
Matches a group of one or more non-white-space characters. This is captured in group 5.
\s-\s
Matches white-space followed by - followed by white-space.
((\S)+)
Matches a second group of one or more non-white-space characters. This is captured in group 7.
\s(.)*
Matches white-space followed by anything, which will be skipped.
More info on regular expressions can be found in this tutorial. There are also several useful cheatsheets around. When designing/debugging an expression, a regexp testing tool can prove quite useful, too.
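The update in the question also asks for the weekday from the top line, which the expression does not match because that line contains no -. Here is a minimal sketch, assuming the first line always looks like "Friday, September 9 2011" (weekday, then a comma). The class `HeaderLineDemo` and the method `weekdayOf` are hypothetical names, and a `StringReader` stands in for the real file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class, not part of the original code.
class HeaderLineDemo {
    /** Returns the text before the first comma, or null if there is no comma. */
    static String weekdayOf(String headerLine) {
        int comma = headerLine.indexOf(',');
        if (comma < 0) {
            return null; // assumption violated: no "Weekday," prefix
        }
        return headerLine.substring(0, comma).trim();
    }

    public static void main(String[] args) throws IOException {
        String text = "Friday, September 9 2011\n"
                + "-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37\n";
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            // Read the date line on its own, before the loop over schedule lines.
            String header = br.readLine();
            String weekday = (header == null) ? null : weekdayOf(header);
            System.out.println(weekday); // prints "Friday"

            String line;
            while ((line = br.readLine()) != null) {
                // Remaining lines are schedule entries; apply the regex to each one here.
            }
        }
    }
}
```

Reading the header with a separate `readLine()` call keeps the loop free of a "first line" flag.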