java performance sorting collections space-efficiency

How to read each lines of the input and output them in sorted order?

Duplicate lines should be printed the same number of times they occur in the input. Special care needs to be taken so that a file with a lot of duplicate lines does not use more memory than what is required for the number of unique lines.

I've tried all the collection interfaces but none seems to be working for this question :( Can someone please help me?? Thanks.

The code below is memory inefficient, as it stores duplicate lines in the PriorityQueue. Hope this helps

public static void doIt(BufferedReader r, PrintWriter w) throws IOException {
    PriorityQueue<String> s=new PriorityQueue<String>();


    String   line;
    int n=0;
    while ((line = r.readLine()) != null) {


        s.add(line);
        n++;

    while (n!=0) {
        w.println(s.remove());
        n--;


    }


}

Solution

The ideal approach would be to use a sorted multiset, such as Guava's TreeMultiset.

If you're not allowed to use external libraries, you can replace s.add(line) with s.add(line.intern()). This tells the JVM to put a copy of each unique line into the String pool and share the same object among all the references.

Note that putting Strings into the pool may cause them to stick around for a long time, which can cause problems in long-running applications, so you don't want to do this casually in a production application, but for your homework problem it's fine. In the case of a production application, you'd want to put the Strings into a SortedMap where the value is the count of times the line appeared, but this is more complicated to code correctly.