Tags: java, string, file, fileinputstream

Count the occurrences of a substring in a file (Java)


I am trying to count how many times a substring occurs in a file. In brief, each file contains a number of articles, and I need to know how many. Each article starts with @ARTICLE{ or with @ARTICLE{ followed by a series of digits.

Useful info:
- I have 10 files to look in
- None of the files are empty
- This code gives me a StringIndexOutOfBoundsException

Here is the code I have so far:

    // To read through all files
    for(int i=1; i<=10; i++)
    {
        try
        {
            // To look through all the bib files
            reader = new Scanner(new FileInputStream("C:/Assg_3-Needed-Files/Latex"+i+".bib"));
            System.out.println("Reading Latex"+i+".bib->");

            // To read through the whole file
            while(reader.hasNextLine())
            {
                String line = reader.nextLine();
                String articles = line.substring(1, 7);

                if(line.equals("ARTICLE"))
                    count+=1;
            }
        }
        catch(FileNotFoundException e)
        {
            System.err.println("Error opening the file Latex"+i+".bib");
        }
    }
    System.out.print("\n"+count);

Solution

  • Try just using String#contains on each line:

    while(reader.hasNextLine()) {
        String line = reader.nextLine();
        if (line.contains("ARTICLE")) {
            count += 1;
        }
    }
    

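    This avoids taking a substring in the first place. The exception in your code comes from line.substring(1, 7): it throws StringIndexOutOfBoundsException for any line shorter than 7 characters, such as a blank line, whether or not that line is an article header. Lines of 7 or more characters do not throw. Note also that substring(1, 7) returns only 6 characters ("ARTICL" for a line starting with @ARTICLE{), and that your condition tests line rather than articles, so the comparison would not match an @ARTICLE{ line even without the exception.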

    You could also use a regex pattern to make sure that you match ARTICLE as a standalone word. Since String#matches has to match the entire line, the pattern needs .* on both sides:

    while(reader.hasNextLine()) {
        String line = reader.nextLine();
        if (line.matches(".*\\bARTICLE\\b.*")) {
            count += 1;
        }
    }
    

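    This ensures that you don't count a line containing something like ARTICLES, which is not your exact target. Keep in mind that both loops count matching lines, so two article headers on the same line would be counted once.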
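
    If you would rather count every occurrence of the exact opener @ARTICLE{ instead of counting lines, here is a minimal sketch using Pattern and Matcher#find. The class name ArticleCounter and the methods countInText and countInFile are hypothetical names made up for this illustration, and reading the files as UTF-8 is an assumption about your data; the path is the one from your question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
public class ArticleCounter {

    // Literal "@ARTICLE{"; the brace is escaped because '{' is a regex metacharacter.
    private static final Pattern ARTICLE_START = Pattern.compile("@ARTICLE\\{");

    /** Counts every occurrence of "@ARTICLE{" in the given text. */
    public static int countInText(String text) {
        Matcher matcher = ARTICLE_START.matcher(text);
        int count = 0;
        while (matcher.find()) {   // find() scans forward, so short or empty lines are harmless
            count++;
        }
        return count;
    }

    /** Reads the whole file (assumed to be UTF-8) and counts its article openers. */
    public static int countInFile(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return countInText(content);
    }

    public static void main(String[] args) {
        int total = 0;
        for (int i = 1; i <= 10; i++) {
            Path file = Paths.get("C:/Assg_3-Needed-Files/Latex" + i + ".bib");
            try {
                total += countInFile(file);
            } catch (IOException e) {
                System.err.println("Error reading the file Latex" + i + ".bib");
            }
        }
        System.out.println(total);
    }
}
```

    Because the pattern includes the @ and the opening brace, a title that merely mentions ARTICLE or ARTICLES is not counted, and no substring call is involved, so line length no longer matters. The match is case-sensitive; if your files may also contain @article{, compiling the pattern with Pattern.CASE_INSENSITIVE would be one way to handle that.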