Search code examples
algorithmmathpermutationcombinatorics

Calculate Lexicographic Rank


I am trying to calculate the Lexicographic Rank of a given permutation of characters with duplicates.

All the examples I have found seem to deal with finding the Lexicographic rank of a given input string against anagrams of input string (opposed to an arbitrary character set)

Input

  • Character Set: A,B,C
  • Input: BAA (note C is not included in the input)
  • Length: 3 (max length of character combinations...1 & 2 character combinations must also be included)

Output

  • 16

Notes

  • Can have duplicate characters
  • There are 39 total combinations (permutations of 1, 2, or 3 characters)
  • Cannot brute force by generating all combinations, etc. as the actual use-case involves a much larger character set / max string length

All Combinations (in order)

  1. A
  2. AA
  3. AAA
  4. AAB
  5. AAC
  6. AB
  7. ABA
  8. ABB
  9. ABC
  10. AC
  11. ACA
  12. ACB
  13. ACC
  14. B
  15. BA
  16. BAA
  17. BAB
  18. BAC
  19. BB
  20. BBA
  21. BBB
  22. BBC
  23. BC
  24. BCA
  25. BCB
  26. BCC
  27. C
  28. CA
  29. CAA
  30. CAB
  31. CAC
  32. CB
  33. CBA
  34. CBB
  35. CBC
  36. CC
  37. CCA
  38. CCB
  39. CCC

Solution

  • This looks like basically every other ranking/unranking algorithm pair, so easier done than said? The short version is that, to get a word of length at most n, we either have the empty word, which comes first, or a letter followed by a word of length at most n−1. By subtracting one from the rank and dividing by the number of words that answer to the latter description, we can get the first letter and then recurse (but with iteration instead of actual recursion).

    import itertools
    
    
    def count_words(alphabet, max_length):
        count = 1
        for i in range(max_length):
            count = count * len(alphabet) + 1
        return count
    
    
    def word_from_rank(alphabet, rank, max_length):
        count = count_words(alphabet, max_length)
        letters = []
        while rank > 0:
            count = (count - 1) // len(alphabet)
            i, rank = divmod(rank - 1, count)
            letters.append(alphabet[i])
        return "".join(letters)
    
    
    def rank_from_word(alphabet, word, max_length):
        letter_to_rank = {letter: i for (i, letter) in enumerate(alphabet)}
        count = count_words(alphabet, max_length)
        rank = 0
        for letter in word:
            count = (count - 1) // len(alphabet)
            rank += count * letter_to_rank[letter] + 1
        return rank
    
    
    def test(alphabet, max_length):
        words = sorted(
            {
                "".join(letters)
                for n in range(max_length + 1)
                for letters in itertools.product(alphabet, repeat=n)
            }
        )
        assert count_words(alphabet, max_length) == len(words)
        for rank, word in enumerate(words):
            assert word_from_rank(alphabet, rank, max_length) == word
            assert rank_from_word(alphabet, word, max_length) == rank
    
    
    test("ABC", 3)
    test("ABCD", 3)
    test("ABCDE", 4)
    print(rank_from_word("ABC", "BAA", 3))
    for i in range(count_words("ABC", 3)):
        print(i, word_from_rank("ABC", i, 3))
    

    Output:

    16
    0 
    1 A
    2 AA
    3 AAA
    4 AAB
    5 AAC
    6 AB
    7 ABA
    8 ABB
    9 ABC
    10 AC
    11 ACA
    12 ACB
    13 ACC
    14 B
    15 BA
    16 BAA
    17 BAB
    18 BAC
    19 BB
    20 BBA
    21 BBB
    22 BBC
    23 BC
    24 BCA
    25 BCB
    26 BCC
    27 C
    28 CA
    29 CAA
    30 CAB
    31 CAC
    32 CB
    33 CBA
    34 CBB
    35 CBC
    36 CC
    37 CCA
    38 CCB
    39 CCC