Search code examples
c++sortingalphabetical-sort

Sorting 'Alphabetically' (Alien Dictionary Code Problem)


I've started tackling coding problems to try and improve my skills.

I'm working on the 'Alien Dictionary' coding problem which, when given a sorted list of 'Alien Words' you need to determine the 'Alien Alphabet'. The alien alphabet is made up of Latin characters but in a different order than ours.

I've since learned there are more optimised ways of solving this which I will look into, but I want to see my instinctual approach through to completion.

My code does compile with c++20 and outputs the correct alphabet, however, I had to implement a 'hack' to cover an edge case which I explain in a code comment.

I can't quite wrap my head around why I needed the hack, or how to fix my code to not require it.

#include <iostream> // std::cout
#include <map> // std::map
#include <vector> // std::vector
#include <algorithm> // std::sort

/*
    Input: words[] = ["pbb", "bpku", "bpkb", "kbp", "kbu"];
    Expected Output: ['p', 'u', 'b', 'k'];
*/

typedef std::vector<std::string> WordList;
WordList alienWords = {"pbb", "bpku", "bpkb", "kbp", "kbu"};

typedef std::map<std::pair<char, char>, char> RuleBook;
RuleBook alienRuleBook;

typedef std::vector<char> Alphabet;
Alphabet alienAlphabet;

void populateAlphabet(Alphabet& alphabet, WordList& wordList) {
    alphabet.clear();
    for (int word = 0; word < wordList.size(); word++) {
        for (int letter = 0; letter < wordList[word].size(); letter++) {
            if(std::find(alphabet.begin(), alphabet.end(), wordList[word][letter]) == alphabet.end()) {
                alphabet.push_back(wordList[word][letter]);
            }
        }
    }
}

void generateRules(RuleBook& ruleBook, WordList& wordList){
    for (int firstWord = 0; firstWord < wordList.size(); firstWord++) {
        for (int secondWord = firstWord + 1; secondWord < wordList.size(); secondWord++) {
            if (secondWord == wordList.size()) break; 
            
            int letter = 0;

            for (; letter < wordList[firstWord].size(); letter++) {
                if (wordList[firstWord][letter] == wordList[secondWord][letter]) continue;
                
                ruleBook[{wordList[firstWord][letter], wordList[secondWord][letter]}] = '<';
                ruleBook[{wordList[secondWord][letter], wordList[firstWord][letter]}] = '>';
                break;
            }
        }
    }
}

// needs to return TRUE if 'l' should come before 'r'.
bool getRule(char l, char r) {
    switch(alienRuleBook[{l, r}]) {
        case '>': return false;
        case '<': return true;
    }
    std::cout << "ERROR! No rule found for: '" << l << "' vs '" << r << "'\n\n";
    
    // The below is a hack because I don't understand to fix the case of {'u', 'k'}
    // There's no 'discovered' rule saying 'u' comes before 'k' or 'k' comes after 'u'
    // even though we KNOW 'u' comes before 'b' and we know that 'b' comes before 'k'.
    return true;
}

void printAlphabet(Alphabet& alphabet){
    std::cout << "=== Alphabet ===" << "\n ";
    for(const auto it : alphabet)
        std::cout << it << " ";
    std::cout << "\n================\n\n";
}

void printRuleBook(RuleBook& ruleBook){
    std::cout << "=== Discovered Rules ===" << "\n";
    for(const auto it : ruleBook)
        std::cout << " " << it.first.first << " " << it.second << " " << it.first.second << '\n';
    std::cout << "================\n\n";
}

int main() {
    populateAlphabet(alienAlphabet, alienWords);
    
    generateRules(alienRuleBook, alienWords);
    
    std::sort(alienAlphabet.begin(), alienAlphabet.end(), getRule);
    
    printRuleBook(alienRuleBook);
    
    printAlphabet(alienAlphabet);
    
    return 0;
}

Solution

  • In order to implement the getRule function if there is no implicit rule for {a, b}, you should search for {a, x} = '>' where {x, b} = '>' or {a, x} = '<' where {x, b} = '<'

    // in case if a > b, you search for "a > x and x > b". 
    // In other words, if a is greater than x and x is greater than b,
    // then a is greater than b.
    a ... x ... b
    
    // the similar is for case when a < b
    b ... x ... a
    
    

    In case if you cannot find {a, x} and {x, b}, you should search for {a, x} + {x, y} + {y, b}. The search depth can increase. I'm not sure that is good solution.

    I would suggest to look at the problem as a directed graph:

    1. First walk through the ordered list of words and make a graph: p->b->k and p->u->b and (duplicated) p->u
    2. Then find the node that has no incoming connections (e.g. p). That will be the first char of the alphabet
    3. Then iterate its outgoing connections and find the node, that has no incoming connections except p. That will give you the second char (u).
    4. Then iterate over all connections of the second char u and find the node that has no other incoming connections except p and u. That will give you b
    5. And so on, and so on