
How to use string frequencies list in Trie data structure?


I am running performance tests on various data structures, including HashMap and a Trie. I am done with HashMap, but I am not sure how to use a Trie for the problem below.

I have a text file that contains 2 million English words with their frequencies, in this format:

hello 100
world 5000
good 2000
bad 9000
...

I read this file line by line and store it in a HashMap: the first token of each line becomes the key and the second token becomes the value. This lets me measure insertion performance with the code below.

// needs java.io.* and java.util.*
Map<String, String> wordTest = new HashMap<String, String>();

// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(FILE_LOCATION)))) {

    String line = reader.readLine();
    while (line != null) {
        String[] splitString = line.trim().split("\\s+");
        // skip blank or malformed lines, then store word and frequency as a key-value pair
        if (splitString.length >= 2) {
            wordTest.put(splitString[0].toLowerCase(), splitString[1]);
        }
        line = reader.readLine();
    }
} catch (IOException e) {
    e.printStackTrace();
}

How would I implement a Trie to load the same data as I did for the HashMap, and then look up a frequency by its string? This is my first time using a Trie, so I am a little confused.

Update:-

Below is my TrieImpl class

public class TrieImpl {

    //root node
    private TrieNode r;

    public TrieImpl() {
        r = new TrieNode();
    }

    public boolean has(String word) {
        return r.has(word);
    }

    public void insert(String word){
        r.insert(word);
    }

    public String toString() {
        return r.toString();
    }

    public static void main(String[] args) {

        TrieImpl t = new TrieImpl();

        System.out.println("Testing some strings");
        t.insert("HELLO"); // how do I pass string and its count
        t.insert("WORLD"); // how do I pass string and its count

    }
}

And below is my TrieNode class -

public class TrieNode {

    // make child nodes
    private TrieNode[] c;
    // flag for end of word
    private boolean flag = false;

    public TrieNode() {
        c = new TrieNode[26]; // 1 for each letter in alphabet
    }

    protected void insert(String word) {
        // index 0..25 for 'A'..'Z'; expects upper-case input
        // (subtracting 64 would map 'Z' to 26, outside the array)
        int val = word.charAt(0) - 'A';

        // if the value of the child node at val is null, make a new node
        // there to represent the letter
        if (c[val] == null) {
            c[val] = new TrieNode();
        }

        // if word length > 1, then word is not finished being added.
        // otherwise, set the flag to true so we know a word ends there.
        if (word.length() > 1) {
            c[val].insert(word.substring(1));
        } else {
            c[val].flag = true;
        }
    }

    public boolean has(String word) {
        int val = word.charAt(0) - 'A';
        // no branch for this letter, so the word cannot be present
        if (c[val] == null) {
            return false;
        }
        // more letters remain: return the result of the recursive call
        if (word.length() > 1) {
            return c[val].has(word.substring(1));
        }
        // last letter: present only if a word ends here
        return c[val].flag;
    }

    public String toString() {
        return "";
    }
}

Now, how would I extend this to pass a string together with its count, and then look up the count by string?


Solution

  • You can just add a frequency field to your TrieNode class.

    public class TrieNode {
    
        // make child nodes
        private TrieNode[] c;
        // flag for end of word
        private boolean flag = false;
        //stores frequency if flag is set
        private int frequency;
    

    Now, in the insert method, store the frequency when setting the flag, and change the method signature accordingly.

    protected void insert(String word, int frequency) {
        // 'A' maps to index 0; subtracting 64 would push 'Z' past the 26-slot array
        int val = word.charAt(0) - 'A';
        ..........
        ..........
        // if the value of the child node at val is null, make a new node
        if (word.length() > 1) {
            c[val].insert(word.substring(1),frequency);
        } else {
            c[val].flag = true;
            c[val].frequency = frequency;
        }
    }
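To feed the file into the two-argument insert, each line has to be split into a word and an integer count. The following is a minimal sketch of that step only. The class `WordFrequencyLine` and its `parse` method are hypothetical names introduced for illustration, and the sketch assumes each line has exactly two whitespace-separated tokens, as in the question's sample.

```java
import java.util.AbstractMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
final class WordFrequencyLine {

    private WordFrequencyLine() {}

    /**
     * Splits a line such as "hello 100" into the word and its count.
     * Returns null for blank or malformed lines so the caller can skip them.
     */
    static Map.Entry<String, Integer> parse(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            return null;
        }
        try {
            int frequency = Integer.parseInt(parts[1]);
            // Upper-case the word, since the TrieNode above indexes by 'A'..'Z'.
            String word = parts[0].toUpperCase(Locale.ROOT);
            return new AbstractMap.SimpleImmutableEntry<>(word, frequency);
        } catch (NumberFormatException e) {
            return null; // the count was not a valid integer
        }
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = parse("hello 100");
        System.out.println(e.getKey() + " -> " + e.getValue()); // HELLO -> 100
    }
}
```

Inside the file-reading loop, a non-null result could then be passed on as `t.insert(e.getKey(), e.getValue())`, assuming TrieImpl is given a matching two-argument insert that delegates to the root node.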
    

    Now create a new method to get the frequency. It works like the has method: follow the branches to the end of the word and, if the flag is set there, return the frequency.

    public int getFreq(String word) {
        int val = word.charAt(0) - 'A'; // 'A' maps to index 0
        if (c[val] == null) {
            return -1; // no branch for this letter: word is not present
        } else if (word.length() > 1) {
            return c[val].getFreq(word.substring(1));
        } else if (c[val].flag) {
            return c[val].frequency;
        } else {
            return -1;
        }
    }
    

    -------------------------------EDIT------------------------

    Use the has method first to check for the string, then look up the frequency:

        public int getFreq(String word) {
            if (has(word))
                return getFreqHelper(word);
            else
                return -1; // this indicates word is not present
        }

        private int getFreqHelper(String word) {
            int val = word.charAt(0) - 'A'; // 'A' maps to index 0
            if (word.length() > 1) {
                // recurse through the helper so has is not re-run at every level
                return c[val].getFreqHelper(word.substring(1));
            } else if (c[val].flag) {
                return c[val].frequency;
            } else {
                return -1;
            }
        }
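For comparison, the same idea can be written iteratively, which avoids creating a substring at every level. This is a self-contained sketch under stated assumptions, not the answer's code: the class name `FrequencyTrie`, the nested `Node` type, and the `NOT_FOUND` constant are illustrative, and words are assumed to contain only the letters a-z after lower-casing.

```java
import java.util.Locale;

// Illustrative sketch; all names here are assumptions, not from the original answer.
final class FrequencyTrie {

    private static final int ALPHABET = 26;
    static final int NOT_FOUND = -1;

    private static final class Node {
        final Node[] children = new Node[ALPHABET];
        boolean endOfWord;   // true if a stored word ends at this node
        int frequency;       // meaningful only when endOfWord is true
    }

    private final Node root = new Node();

    /** Stores the word with its frequency; throws for characters outside a-z. */
    void insert(String word, int frequency) {
        String w = word.toLowerCase(Locale.ROOT);
        Node node = root;
        for (int i = 0; i < w.length(); i++) {
            int idx = indexOf(w.charAt(i));
            if (node.children[idx] == null) {
                node.children[idx] = new Node();
            }
            node = node.children[idx];
        }
        node.endOfWord = true;
        node.frequency = frequency;
    }

    /** Returns the stored frequency, or NOT_FOUND if the word was never inserted. */
    int getFreq(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Node node = root;
        for (int i = 0; i < w.length(); i++) {
            char ch = w.charAt(i);
            if (ch < 'a' || ch > 'z') {
                return NOT_FOUND;
            }
            node = node.children[ch - 'a'];
            if (node == null) {
                return NOT_FOUND; // missing branch: no such word
            }
        }
        return node.endOfWord ? node.frequency : NOT_FOUND;
    }

    private static int indexOf(char ch) {
        if (ch < 'a' || ch > 'z') {
            throw new IllegalArgumentException("unsupported character: " + ch);
        }
        return ch - 'a';
    }

    public static void main(String[] args) {
        FrequencyTrie trie = new FrequencyTrie();
        trie.insert("hello", 100);
        trie.insert("world", 5000);
        System.out.println(trie.getFreq("hello")); // 100
        System.out.println(trie.getFreq("hell"));  // -1 (prefix only, not a stored word)
    }
}
```

Because a missing child is checked on every step, a single pass answers both "is the word present?" and "what is its frequency?", so no separate has call is needed before the lookup.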