I've always been of the belief that if you're copying and pasting code, then there's a more elegant solution. I'm currently implementing a string suffix trie in C++ and have two practically identical functions which only differ in their return statement.
The first function checks if a substring is present, the second if it is a suffix (all strings have the character #
added to the end).
substring(string S)
bool Trie::substring(string S){
Node* currentNode = &nodes[0]; //root
for(unsigned int i = 0; i < S.length(); i++){
int edgeIndex = currentNode->childLoc(S.at(i));
if(edgeIndex == -1)
return false;
else currentNode = currentNode->getEdge(edgeIndex)->getTo();
}
return true;
}
suffix(string S)
bool Trie::suffix(string S){
Node* currentNode = &nodes[0]; //root
for(unsigned int i = 0; i < S.length(); i++){
int edgeIndex = currentNode->childLoc(S.at(i));
if(edgeIndex == -1)
return false;
else currentNode = currentNode->getEdge(edgeIndex)->getTo();
}
return (currentNode->childLoc('#') == -1)? false : true; //this will be the index of the terminating character (if it exists), or -1.
}
How can the logic be generalised more elegantly? Perhaps using templates?
What I, personally, do in such circumstances, is to move the common code to the function, which I call from multiple places, where I need, said common functionality. Consider this:
Node* Trie::getCurrentNode (string S){
Node* currentNode = &nodes[0];
for(unsigned int i = 0; i < S.length(); i++){
int edgeIndex = currentNode->childLoc(S.at(i));
if(edgeIndex == -1)
return nullptr;
else currentNode = currentNode->getEdge(edgeIndex)->getTo();
}
return currentNode;
}
And then, use it, in all the cases, where you need it:
bool Trie::substring(string S){
return getCurrentNode (S) != nullptr;
}
bool Trie::suffix(string S){
Node* currentNode = getCurrentNode(S);
return currentNode != nullptr && currentNode->childLoc('#') != -1;
// I decided to simplify the return statement in a way François Andrieux suggested, in the comments. It makes your code more readable.
}