python, string, ascii, prefix

How can I check whether a string is a prefix of another string by ASCII code in Python?


For example, say we have the pattern 'frei' and some names such as 'freiburg', 'freicking', 'flensburg' and 'freking'. I want to check whether the pattern is a prefix of each of those names (all letters are lowercase).

Someone gave a solution like this:

name = ['freiburg', 'freicking', 'flensburg', 'freking']
hit = []
pattern = 'frei'
lower = 'frei'
upper = 'frei{'
for i in name:
    if lower <= i <= upper:
        hit.append(i)

Well, I think this is a fantastic method, but I don't understand the principle behind it. Could someone explain why the lower and upper strings look like this, and why the character '{' is used here? I would have thought that the sum of the ASCII codes of 'freiburg' is larger than that of 'frei{', so why does 'freiburg' still compare as smaller than 'frei{'?

Thanks a lot.


Solution

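  • This works because Python compares strings lexicographically: character by character, by Unicode code point, stopping at the first position where the two strings differ (a string that is a proper prefix of another compares as smaller). The sum of the codes plays no role. The two boundary strings have the following codes: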

    [ord(i) for i in 'frei']
    # [102, 114, 101, 105]
    
    [ord(i) for i in 'frei{']
    # [102, 114, 101, 105, 123]
    

    Note that { comes right after z in the ASCII table:

    ord('z')
    # 122
    
    ord('{')
    # 123
    

    Hence any string that sorts before 'frei' or after 'frei{' is excluded. This becomes quite clear if you sort a list of strings that includes the two boundaries:

    sorted(['fra', 'frei', 'frei{', 'freidja', 'freia', 'from'])
    # ['fra', 'frei', 'freia', 'freidja', 'frei{', 'from']
    

    So any string that starts with frei and continues with lowercase Latin letters is smaller than 'frei{', since { is larger than z.

    However, as mentioned by thierry in the comments, this assumes that the subsequent characters to be matched are from the basic Latin (ASCII) alphabet. Code points of characters from other alphabets, and of accented letters such as ä, come after {, so such names fall outside the range.

    Also, a much simpler (and probably the right) approach is to just use the startswith method of the str class, which, as its name indicates, returns True if the string starts with the specified prefix and False otherwise:

    [i for i in name if i.startswith('frei')]
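
To make the comparison rule concrete, here is a minimal sketch that finds the position at which two strings first differ, then runs the range trick and startswith side by side. The helper first_difference and the extra name with an a-umlaut are hypothetical, added only for illustration; they are not part of the original question.

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ, or None."""
    for index, (char_a, char_b) in enumerate(zip(a, b)):
        if char_a != char_b:
            return index
    return None  # one string is a prefix of the other, or they are equal


# 'freiburg' and 'frei{' first differ at index 4: 'b' versus '{'.
index = first_difference('freiburg', 'frei{')
print(index, ord('freiburg'[index]), ord('frei{'[index]))  # 4 98 123

# The comparison is decided at that position; the sum of the codes is irrelevant.
print('freiburg' < 'frei{')  # True

# Hypothetical extra name: 'frei' followed by a-umlaut (code point 228 > 123).
names = ['freiburg', 'freicking', 'flensburg', 'freking', 'frei\u00e4mter']

range_hits = [n for n in names if 'frei' <= n <= 'frei{']
prefix_hits = [n for n in names if n.startswith('frei')]

print(range_hits)   # the name with the a-umlaut is missed
print(prefix_hits)  # startswith finds it
```

Under these assumptions the range trick never reports a name that lacks the prefix, but it can miss names whose next character sorts after {, which is why startswith is the safer choice.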