Search code examples
javadata-structuresword-list

Sorting a list based on frequency of words


I would need to sort a list of words based on its frequency.

My input:

Haha, hehe, haha, haha, hehe, hehe.... , Test

For example in my data structure I would have

Haha:3
Hehe:5
Test:10 

I would need the data structure to be sorted at the output in this manner:

Test:10
Hehe:5
Haha:3

Such that if I pop the top of the data structure I would be able to obtain the element and its corresponding frequency.

The number of elements is unknown initially and hence, an array would not be feasible. If I would like to obtain the top few elements I would just need to access it sequentially. Is this possible in Java?


Solution

  • First, want to confirm: Can you get all the whole words before sorting? Or these words come continuously in a stream?

    (1)For the former case, you can use a Set to store the words, then put them into a PriorityQueue. If you implement the comparator function, the queue will sort the words automatically. I create a new class Pair to store the text and frequency, see the code:

    import java.util.Queue;
    import java.util.PriorityQueue;
    import java.util.Set;
    import java.util.HashSet;
    import java.util.Comparator;
    
    public class PriorityQueueTest {
    
        public static class Pair {
            private String text;
            private int frequency;
    
            @Override
            public int hashCode() {
                return text.hashCode();
            }
    
            @Override
            public String toString() {
                return text + ":" + frequency;
            }
    
            public Pair(String text, int frequency) {
                super();
                this.text = text;
                this.frequency = frequency;
            }
    
            public String getText() {
                return text;
            }
            public void setText(String text) {
                this.text = text;
            }
            public int getFrequency() {
                return frequency;
            }
            public void setFrequency(int frequency) {
                this.frequency = frequency;
            }
        }
    
        public static Comparator<Pair> idComparator = new Comparator<Pair>(){
            @Override
            public int compare(Pair o1, Pair o2) {
                if(o1.getFrequency() > o2.getFrequency()) {
                    return -1;
                }
                else if(o1.getFrequency() < o2.getFrequency()){
                    return 1;
                }
                else {
                    return 0;
                }
            }
        };
    
        public static void main(String[] args) {
            Set<Pair> data = new HashSet<Pair>();
            data.add(new Pair("haha", 3));
            data.add(new Pair("Hehe", 5));
            data.add(new Pair("Test", 10));
    
            Queue<Pair> queue = new PriorityQueue(16, idComparator);
    
            for(Pair pair : data) {
                queue.add(pair);
            }
    
            // Test the order
            Pair temp = null;
            while((temp = queue.poll()) != null) {
                System.out.println(temp);
            }
    
        }
    
    }
    

    (2)For the other case(the words come continuously), you may use a TreeMap to keep the order. See ref: http://www.java-samples.com/showtutorial.php?tutorialid=370