Search code examples
javastringalgorithmlexicographic

Lexicographically middle string between 2 strings


public static String getMiddleString(String S, String T)
{
    if(S.length()<T.length()){
        for(int i=0;i<(T.length()-S.length());i++)S = '`'+S;
    }
    if(T.length()<S.length()){
        for(int i=0;i<(S.length()-T.length());i++)T = '`'+T;
    }

    int N = S.length();
    // Stores the base 26 digits after addition
    int[] a1 = new int[N + 1];

    for (int i = 0; i < N; i++) {
        a1[i + 1] = (int)S.charAt(i) - 97
                + (int)T.charAt(i) - 97;
    }

    // Iterate from right to left
    // and add carry to next position
    for (int i = N; i >= 1; i--) {
        a1[i - 1] += (int)a1[i] / 26;
        a1[i] %= 26;
    }

    // Reduce the number to find the middle
    // string by dividing each position by 2
    for (int i = 0; i <= N; i++) {

        // If current value is odd,
        // carry 26 to the next index value
        if ((a1[i] & 1) != 0) {

            if (i + 1 <= N) {
                a1[i + 1] += 26;
            }
        }

        a1[i] = (int)a1[i] / 2;
    }

    String result ="";
    for (int i = 1; i <= N; i++) {
        result+=(char)(a1[i] + 97);
    }
    return result;
}

This is code that I took from https://www.geeksforgeeks.org/find-the-string-present-at-the-middle-of-a-lexicographically-increasing-sequence-of-strings-from-s-to-t/ to help me find the string in the middle between 2 strings in lexicographical order (e.g. the middle string between "z" and "ad" is "ab". The idea is that it treats the strings as base-26 numbers and gets the number in the middle of them and changes it back to a string.The code from the blog only works with same length strings, so I modified it and it seems to work but I'm not exactly sure why and I'm also not sure if it would work with all cases, I would appreciate it if someone would explain why my modification is correct or incorrect.

The modification was the 2 if conditions on top, it basically prefixes the smaller string with "`" characters (ASCII=96). Initially I tried prefixing with "a", but that didn't work, so I thought of trying the character just before "a" in ASCII.

In my case when I talk about lexicographical ordering I mean: a,b,c.....y,z,aa,ab,ac,ad...zz,aaa,aab.....


Solution

  • This sample code implements your example in a concise manner.

    The function alpha_string_to_number takes a std::string of lower case alpha and converts it into a number using the assignments a=1, b=2, etc. This is not base-26, but similar. The function number_to_alpha_string does the inverse.

    As far as I can tell, this is a valid lexicographical ordering in that a is 1, b is 2, aa is 27, ab is 28 etc.

    For z you get 26 and for ad you get 30, the average of which is 28 or ab. Now, in what sense this is the "middle" between z and ad is up for debate, but it matches your example.

    You should be able to use the following to help debug your function.

    Sample Code

    #include <iostream>
    
    size_t alpha_string_to_number(const std::string& str) {
        size_t n{}, place_digit{1};
        for (auto elem : str) {
            n *= 26;
            n += 1 + (elem - 'a');
        }
        return n;
    }
    
    std::string number_to_alpha_string(size_t n) {
        std::string r{};
        while (n > 0) {
            r += 'a' + (--n % 26);
            n /= 26;
        }
        std::reverse(r.begin(), r.end());
        return r;
    }
    
    int main(int argc, const char *argv[]) {
    
        auto n1 = alpha_string_to_number("z");
        auto n2 = alpha_string_to_number("ad");
        std::cout << n1 << std::endl;
        std::cout << n2 << std::endl;
        std::cout << number_to_alpha_string((n1 + n2) / 2) << std::endl;
        return 0;
    }
    

    Extended Explanation

    alpha_string_to_number keeps track of the result in n and value of the current digit in place_digit. For base-10, place digit would cycle through the powers of ten, so 1, 10, 100, 1000, etc. For this example we have 26 characters so place digit is 1, 26, 26^2, 26^3, etc. This allows us to shift the previous n over a single place digit each time through loop as we add in the value of the current character.

    Note that the character values range from 1 to 26 (instead of 0 to 25) in order to make the lexicographical ordering work correctly. Hence, the +1 when converting to a number and the -1 when converting from a number.

    number_to_alpha_string keeps track of the result string r which is computed in reverse order since it is easier to extract the least significant remaining digit each time through the loop rather than the most significant. Before returning, we reverse the characters to put them in the correct order.