java spring file large-files r.java-file

How to handle a 1 GB data file to read words and find the longest word?


I wrote this code, but it is going to fail on a 1 GB file.

public class TestFiles {

    public static void main(String[] args) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        String minWord = "";
        String maxWord = "";
        List<String> words = new ArrayList<>();
        try {
            File myObj = new File("C:\\Users\\Downloads\\java.txt");
            Scanner myReader = new Scanner(myObj);
            while (myReader.hasNextLine()) {
                String data = myReader.nextLine();
                String[] dataArray = data.split(" ");
                List<String> list = Arrays.asList(dataArray);
                for (String s : list) {
                    if (s.length() < minLength) {
                        minLength = s.length();
                        minWord = s;
                    } else if (s.length() > maxLength) {
                        maxLength = s.length();
                        maxWord = s;
                    }
                }
            }
            myReader.close();
        } catch (Exception e) {
            // TODO: handle exception
        }
        System.out.println("min length " + minLength + " - max length " + maxLength);
        System.out.println("min length word " + minWord + " - max length word " + maxWord);
    }
}

Could you please answer? How can I solve this?


Solution

  • The problem becomes obvious when 1 GB of words is squashed into a single line!*

    Solution: Do not process the input line by line, but word by word, which is both sufficient and efficient! ;)

    Voila:

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class TestFiles {

        public static void main(String[] args) {
            int minLength = Integer.MAX_VALUE;
            int maxLength = Integer.MIN_VALUE;
            String minWord = "";
            String maxWord = "";
            File myObj = new File("C:\\Users\\Downloads\\java.txt");
            // try-with-resources closes the Scanner even if reading fails
            try (Scanner myReader = new Scanner(myObj)) {
                // next() returns one whitespace-delimited token at a time,
                // so a whole line never has to be loaded as a single String
                while (myReader.hasNext()) {
                    String word = myReader.next();
                    // two independent ifs: one word may update both min and max
                    if (word.length() < minLength) {
                        minLength = word.length();
                        minWord = word;
                    }
                    if (word.length() > maxLength) {
                        maxLength = word.length();
                        maxWord = word;
                    }
                }
            } catch (FileNotFoundException e) {
                System.err.println("Could not open file: " + e.getMessage());
                return;
            }
            System.out.println("min length " + minLength + " - max length " + maxLength);
            System.out.println("min length word " + minWord + " - max length word " + maxWord);
        }
    }
    

    * When "many" words are on one line, these calls can cause problems, because each of them has to deal with the whole line at once:

    • myReader.hasNextLine(),
    • String data = myReader.nextLine() and
    • String[] dataArray = data.split(" ");
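
To try the word-wise idea without a 1 GB file, here is a minimal sketch that runs the same loop over any `Readable`. The class `WordScan` and the method `shortestAndLongest` are hypothetical names chosen for this illustration; they are not part of the original answer. A `StringReader` stands in for the file, and a `FileReader` could presumably be passed the same way.

```java
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical helper class, not part of the original answer.
class WordScan {

    // Returns {shortest, longest} token from any Readable source.
    // Both entries are null when the source contains no tokens.
    static String[] shortestAndLongest(Readable source) {
        String minWord = null;
        String maxWord = null;
        try (Scanner scanner = new Scanner(source)) {
            // hasNext()/next() work token by token, never line by line
            while (scanner.hasNext()) {
                String word = scanner.next();
                if (minWord == null || word.length() < minWord.length()) {
                    minWord = word;
                }
                if (maxWord == null || word.length() > maxWord.length()) {
                    maxWord = word;
                }
            }
        }
        return new String[] { minWord, maxWord };
    }

    public static void main(String[] args) {
        // All words sit on a single line, the layout that hurts nextLine().
        String[] result = shortestAndLongest(
                new StringReader("to be or not to be question"));
        System.out.println(result[0] + " / " + result[1]); // prints "to / question"
    }
}
```

Tracking the words themselves instead of separate length counters also avoids printing `Integer.MAX_VALUE` or `Integer.MIN_VALUE` when the input is empty.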

    @huy's answer is also correct: the else is "efficient for most cases", but "incorrect for corner cases", because it skips the max check whenever the min check succeeds.
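
The corner case can be reproduced with a small sketch. The class `ElseIfCornerCase` and both method names are hypothetical and exist only for this demonstration. One input that triggers it is a sequence of words with strictly decreasing lengths: every word then takes the min branch, so the `else if` never runs.

```java
// Hypothetical demo class, not part of either answer.
class ElseIfCornerCase {

    // Mirrors the question's loop: the max check is skipped
    // whenever the min check fires.
    static int maxWithElseIf(String[] words) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        for (String w : words) {
            if (w.length() < minLength) {
                minLength = w.length();
            } else if (w.length() > maxLength) {
                maxLength = w.length();
            }
        }
        return maxLength;
    }

    // Mirrors the corrected loop: both checks always run.
    static int maxWithTwoIfs(String[] words) {
        int minLength = Integer.MAX_VALUE;
        int maxLength = Integer.MIN_VALUE;
        for (String w : words) {
            if (w.length() < minLength) {
                minLength = w.length();
            }
            if (w.length() > maxLength) {
                maxLength = w.length();
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
        String[] words = { "ccc", "bb", "a" }; // strictly decreasing lengths
        System.out.println(maxWithElseIf(words) == Integer.MIN_VALUE); // prints "true"
        System.out.println(maxWithTwoIfs(words)); // prints "3"
    }
}
```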