Tags: java, regex, fileinputstream, datainputstream

Strip data from a text file


I'm going to start by posting what the data in the text file looks like. This is just 4 lines of it; the actual file is a couple hundred lines long.

Friday, September 9 2011
-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37

0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37

-BG 105--------07:00 - 23:00 SH GREAT HALL Report Printed on 9/08/2011 at 2:37

What I want to do with this text file is ignore the first line with the date on it. Then, on the next line, I want to ignore the leading '-', read in the "STV 101", "05:00" and "23:59", save them to variables, and ignore all other characters on that line. The same goes for each line after that.

Here is how I am currently reading the lines in full. I call this function once the user has put the path in the scheduleTxt JTextField. It can read and print each line fine.

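import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public void readFile () throws IOException
{
    // try-with-resources closes the reader and the underlying stream,
    // even if readLine() throws. The DataInputStream wrapper is not
    // needed for reading text lines, so it is left out here.
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(scheduleTxt.getText()))))
    {
        String strLine;

        // readLine() returns null at end of file
        while ((strLine = br.readLine()) != null)
        {
            System.out.println(strLine);
        }
    }
    catch (IOException e) { // file missing or unreadable
        System.err.println("Error: " + e.getMessage());
    }
}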

UPDATE: It turns out I also need to strip "Friday" out of the top line and put it in a variable as well. Thanks! Beef.


Solution

  • I did not test it thoroughly, but this regular expression captures the information you need in groups 2, 5 and 7 (assuming you are only interested in "AH 104" in the example of "0-AH 104----"): ^(\S)*-(([^-])*)(-)+((\S)+)\s-\s((\S)+)\s(.)*

        // Needs java.util.regex.Pattern and java.util.regex.Matcher.
        String regex = "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*";
        Pattern pattern = Pattern.compile(regex); // compile once, outside the loop
        while ((strLine = br.readLine()) != null){
            Matcher matcher = pattern.matcher(strLine);
            // Lines that do not match (the date line, blank lines) are skipped.
            if (matcher.find()){
                String s1 = matcher.group(2); // e.g. "STV 101"
                String s2 = matcher.group(5); // e.g. "05:00"
                String s3 = matcher.group(7); // e.g. "23:59"
                System.out.println(s1 + " " + s2 + " " + s3);
            }
        }
    

    The expression could be tuned with non-capturing groups in order to capture only the information you want.
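As a rough sketch of that tuning: the inner groups only wrap a single token, so they can simply be dropped instead of being made non-capturing, which leaves three groups. The class name `ScheduleLineParser` and the `parse` helper are made up for illustration, and the pattern assumes every data line has the same layout as the samples in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScheduleLineParser {
    // Only the wanted fields are captured:
    // group 1 = room, group 2 = start time, group 3 = end time.
    private static final Pattern LINE =
            Pattern.compile("^\\S*-([^-]+)-+(\\S+)\\s-\\s(\\S+)\\s.*");

    /** Returns {room, start, end}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null; // date line, blank line, or unexpected layout
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String sample =
                "0-AH 104--------07:00 - 23:00 AH GYM Report Printed on 9/08/2011 at 2:37";
        // prints "AH 104 | 07:00 | 23:00"
        System.out.println(String.join(" | ", parse(sample)));
    }
}
```

Unlike the original expression, this version uses `[^-]+` and so requires a non-empty room name.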

    Explanation of the regexp's elements:

    1. ^(\S)*- Matches zero or more non-whitespace characters followed by a -. Note: this fails if there is whitespace before the first -. A greedy ^(.)*- is not a safe substitute, because it backtracks from the last - on the line and ends up leaving group 2 empty.
    2. (([^-])*) Matches a run of any characters except -. This is captured in group 2.
    3. (-)+ Matches one or more -.
    4. ((\S)+) Matches one or more non-whitespace characters. This is captured in group 5.
    5. \s-\s Matches a whitespace character, followed by -, followed by a whitespace character.
    6. ((\S)+) Same as 4. This is captured in group 7.
    7. \s(.)* Matches a whitespace character followed by the rest of the line, which is ignored.
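To check the group numbering described above, one option is to dump every group for a sample line. This is only a small sketch: `GroupDump` and `groups` are hypothetical names, and the sample line is taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDump {
    // The expression from the answer, unchanged (9 capturing groups).
    static final String REGEX =
            "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*";

    /** Returns groups 0..groupCount() for the line, or an empty array if no match. */
    static String[] groups(String line) {
        Matcher m = Pattern.compile(REGEX).matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i); // null if the group took no part in the match
        }
        return out;
    }

    public static void main(String[] args) {
        String sample =
                "-STV 101--------05:00 - 23:59 SSB 4185 Report Printed on 9/08/2011 at 2:37";
        String[] g = groups(sample);
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
    }
}
```

A quantified group such as (\S)+ keeps only its last repetition, which is why the useful values sit in the outer groups 2, 5 and 7 and not in the inner ones.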

    More info on regular expressions can be found in this tutorial. There are also several useful cheatsheets around. When designing or debugging an expression, a regexp testing tool can prove quite useful, too.
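For the weekday mentioned in the question's update, a separate small pattern for the header line is probably enough. The sketch below assumes the header always starts with the day name followed by a comma, as in "Friday, September 9 2011". `HeaderLine` and `weekday` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderLine {
    // Leading run of letters, immediately followed by a comma and whitespace.
    private static final Pattern HEADER = Pattern.compile("^([A-Za-z]+),\\s");

    /** Returns the weekday at the start of the line, or null if there is none. */
    static String weekday(String line) {
        Matcher m = HEADER.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "Friday"
        System.out.println(weekday("Friday, September 9 2011"));
    }
}
```

Inside the read loop, a line can be tried against this pattern first and against the schedule pattern only when it returns null. The data lines start with - or a digit, so they never match the header pattern.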