Search code examples
pythonexcelradix

Get the Excel column label (A, B, ..., Z, AA, ..., AZ, BA, ..., ZZ, AAA, AAB, ...)


Given the letter(s) of an Excel column header I need to output the column number.

It goes A-Z, then AA-AZ then BA-BZ and so on.

I want to go through it like it's base 26, I just don't know how to implement that.

It works fine for simple ones like AA because 26^0 = 1 + 26^1 = 26 = 27.

But with something like ZA, if I do 26 ^ 26(z is the 26th letter) the output is obviously too large. What am I missing?


Solution

  • If we decode "A" as 0, "B" as 1, ... then "Z" is 25 and "AA" is 26.

    So it is not a pure 26-base encoding, as then a prefixed "A" would have no influence on the value, and "AAAB" would have to be the same as "B", just like in the decimal system 0001 is equal to 1. But this is not the case here.

    The value of "AA" is 1*261 + 0, and "ZA" is 26*261 + 0.

    We can generalise and say that "A" should be valued 1, "B" 2, ...etc (with the exception of a single letter encoding). So in "AAA", the right most "A" represents a coefficient of 0, while the other "A"s represent ones: 1*262 + 1*261 + 0

    This leads to the following code:

    def decode(code):
        val = 0
        for ch in code: # base-26 decoding "plus 1"
            val = val * 26 + ord(ch) - ord("A") + 1 
        return val - 1
    

    Of course, if we want the column numbers to start with 1 instead of 0, then just replace that final statement with:

    return val