Search code examples
c++algorithmdata-structuresterminology

Common name for index for lexicographical sorting


Is there some common name for index for lexicographical sorting?

I have text and index to it sorted by lexicographical order of all substrings which start from specific position and last till the end of the text.

Let's consider the code:

#include <algorithm>
#include <iostream>
#include <ranges>
#include <vector>

auto make_index_projection_for_subrange = []<std::ranges::random_access_range R>(R & range) {
    return [&range](std::ranges::range_difference_t<R> i) -> decltype(auto) {
        return std::ranges::subrange(std::ranges::begin(range)+i, std::ranges::end(range));
        };
};

int main()
{
    const std::string s1 = "abdbc";
    std::vector i1{0,1,2,3,4};
    
    //Sort by substring from the char to the end
    auto proj = make_index_projection_for_subrange(s1);

    std::ranges::sort(i1, std::ranges::lexicographical_compare,proj);
    std::ranges::copy(i1, std::ostream_iterator<int>(std::cout));
    std::cout << std::endl;

    for(auto i: i1){
        std::cout << std::string(s1.begin()+i, s1.end()) << std::endl;
    }
}

See the demo; it provides the following results:

03142
abdbc
bc
bdbc
c
dbc

All the substrings are now sorted alphabetically and search as fast as at least binary search. If one need to find similar strings, this is just passing through adjacent lines.

I call this "adjacent index" because all similar substrings are adjacent to each other, but I guess this this is a classic structure with a common name.


Solution

  • The structure is Suffix_array

    While the Trie is closely related and could be helpful, as well.

    (Just to have it formally answered and easier to find the answer, based on the comments above)