
Suffix Trie in C++


I have been trying to write a suffix trie in C++, and I want each node to keep a counter of how often a character or substring appears during the construction of the trie. Bear in mind that I am working with only four characters: A, C, G and T.

The code below is my attempt, but it's not working correctly:

#include<iostream>
#include <string>
#include <stdio.h>
#include <string.h>
using namespace std;

struct SuffixTreeNode{
    char c;
    struct SuffixTreeNode* one;
    struct SuffixTreeNode* two;
    struct SuffixTreeNode* three;
    struct SuffixTreeNode* four;
    //int count;

};

SuffixTreeNode* CreateNode(char ch){
    SuffixTreeNode* newnode=new SuffixTreeNode();
    newnode->c=ch;
    newnode->one=NULL;
    newnode->two=NULL;
    newnode->three=NULL;
    newnode->four=NULL;
    //count=0;
    return newnode;
}

SuffixTreeNode* Insert(SuffixTreeNode* root,char ch){
    if (root==NULL){
        root=CreateNode(ch);
    }
    else if(ch=='a'){
        root->one=Insert(root->one,ch);
    }
    else if(ch=='c'){
        root->two=Insert(root->two,ch);
    }
    else if(ch=='g'){
        root->three=Insert(root->three,ch);
    }
    else if(ch=='t') {
        root->four=Insert(root->four,ch);
    }

    return root;
}

bool Search(SuffixTreeNode* root, int data){
    if(root==NULL) return false;
    else if (root->c==data) return true;
    else if (root->c=='a')return Search(root->one,data);
    else if (root->c=='c')return Search(root->two,data);
    else if (root->c=='g')return Search(root->three,data);
    else return Search(root->four,data);
}

int main(){
    SuffixTreeNode* root=NULL;
    char str;
    root=Insert(root,'a');
    root=Insert(root,'c');
    root=Insert(root,'c');
    root=Insert(root,'t');
    root=Insert(root,'a');
    root=Insert(root,'g');
    cout<<"Enter character to be searched\n";
    cin>>str;

    if(Search(root,str)==true)cout<<"Found\n";
    else cout<<"Not found\n";
}

Solution

  • The problem is that the design of both search and insert is flawed: they operate on single characters, whereas a trie should work with strings.

    Analysis of the problem

    If you print out the trie, you will see that you build a tree that only ever extends the branch corresponding to the inserted letter. This happens because you insert one letter at a time, but it is not the normal layout of a trie.

    Similarly, when you search for an element, everything is fine if it is the root element. If it is not, your code always descends into the branch corresponding to the letter of the current node, and does so recursively, so it only ever searches the branch corresponding to the root's letter.

    First step towards a solution: correct the code

    If you want to find any letter in the trie structure, you need to update your search so that it explores not the branch corresponding to the letter of the current node, but the branch corresponding to the letter being searched for:

    bool Search(SuffixTreeNode* root, int data){
        if(!root) return false;   // check for NULL before dereferencing root
        cout << (char)data<<"=="<<root->c<<"?"<<endl;   // debug trace of each comparison
        if (root->c==data) return true;
        else if (data=='a')return Search(root->one,data);
        else if (data=='c')return Search(root->two,data);
        else if (data=='g')return Search(root->three,data);
        else return Search(root->four,data);
    }
    

    This corrects the code, but not the underlying design.

    But further work is needed to correct the design

    The design should insert/search a string s. The idea is to match the current character s[0] against the children of the current node, and then recursively insert/search the remainder of the string, s.substr(1).
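
    As a rough illustration of that idea, here is a minimal sketch of a string-based version with per-node counters. It is not the only way to do it, and every name in it (TrieNode, childIndex, insert, countOccurrences, buildSuffixTrie) is made up for this example rather than taken from the question. It assumes the root is an empty node that stores no character, that each child slot stands for one of a, c, g, t (upper or lower case), and that insertion simply stops at any other character.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical node layout: one child slot per nucleotide plus a counter.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 4> child;  // slots for a, c, g, t
    int count = 0;  // number of inserted strings whose path passes through this node
};

// Map a nucleotide to its child slot, or -1 for any other character.
int childIndex(char ch) {
    switch (ch) {
        case 'a': case 'A': return 0;
        case 'c': case 'C': return 1;
        case 'g': case 'G': return 2;
        case 't': case 'T': return 3;
        default:            return -1;
    }
}

// Insert s below node: handle s[0] here, then recurse on s.substr(1).
void insert(TrieNode& node, const std::string& s) {
    if (s.empty()) return;
    int i = childIndex(s[0]);
    if (i < 0) return;  // assumption: stop at unsupported characters
    if (!node.child[i]) node.child[i].reset(new TrieNode());
    ++node.child[i]->count;  // one more inserted string goes through this child
    insert(*node.child[i], s.substr(1));
}

// Follow s from node and return the counter of the node it ends on (0 if the path is missing).
int countOccurrences(const TrieNode& node, const std::string& s) {
    if (s.empty()) return node.count;
    int i = childIndex(s[0]);
    if (i < 0 || !node.child[i]) return 0;
    return countOccurrences(*node.child[i], s.substr(1));
}

// Build the suffix trie of text by inserting every suffix of it.
std::unique_ptr<TrieNode> buildSuffixTrie(const std::string& text) {
    std::unique_ptr<TrieNode> root(new TrieNode());
    for (std::size_t i = 0; i < text.size(); ++i) {
        insert(*root, text.substr(i));
    }
    return root;
}
```

    With this layout, calling buildSuffixTrie("acctag") and then countOccurrences(*root, "cc") follows the path c, c from the root and returns the counter stored there, which is the number of suffixes starting with "cc" and therefore the number of times "cc" occurs in the text. Note that a plain suffix trie can need memory quadratic in the length of the text, and that the repeated substr calls copy the string, so this is only meant to show the recursion, not to be efficient.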