Tags: python, string, python-3.x, position

How to find the byte position of a substring in a string, not the character position?


My text editor (Vim) can report the position of a substring within a string, but it counts bytes, not characters.

Example:

s="I don't take an apéritif après-ski"

When I search for the word apéritif, my text editor gives the position:
16,25

Python gives this position for the same word:
16,24
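
The difference can be reproduced in a few lines. This is a minimal sketch that assumes the editor counts UTF-8 bytes and that é is stored as a single precomposed code point (which takes two bytes in UTF-8); the variable names are only illustrative.

```python
s = "I don't take an apéritif après-ski"
word = "apéritif"

# Character offsets: what str.find() and len() report.
char_start = s.find(word)
char_end = char_start + len(word)

# Byte offsets: the same search on the UTF-8 encoded data (assumed encoding).
data = s.encode("utf-8")
needle = word.encode("utf-8")
byte_start = data.find(needle)
byte_end = byte_start + len(needle)

print(char_start, char_end)  # 16 24
print(byte_start, byte_end)  # 16 25
```

The start offsets agree only because every character before the word is plain ASCII; the end offsets differ by one because é occupies two bytes.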

Vim makes it possible to execute Python code inside the editor.
In one of my Python scripts I do a lot of slicing,
but the slices never land on the correct word when the string contains accented characters.
Is there a way to resolve this in Python?
Can I find the byte position of a substring within a string in Python?


Solution

  • This is, admittedly, a naive solution: encode both the text and the word to bytes, then call find() on the encoded text with the encoded word as the argument.

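    def f(text, word):
        # Encode both strings so that offsets are counted in UTF-8 bytes.
        en_text = text.encode("utf-8")
        en_word = word.encode("utf-8")
        start = en_text.find(en_word)
        if start == -1:
            return None  # word does not occur in text
        # The end offset is exclusive, like a Python slice.
        return (start, start + len(en_word))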
    

    When run as:

    f("I don't take an apéritif après-ski","apéritif")
    

    it returns (16, 25), which matches the byte positions reported by Vim.
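
Since the question mentions slicing, the opposite direction may be useful too: turning a byte offset into a character index that works with ordinary str slicing. The following is only a sketch under the assumption that the offsets are 0-based UTF-8 byte offsets that fall on character boundaries; byte_to_char_offset is a hypothetical helper name, not part of any library.

```python
def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Convert a byte offset into the encoded text to a character index.

    Assumes byte_offset falls on a character boundary; otherwise
    decoding the prefix raises UnicodeDecodeError.
    """
    prefix = text.encode(encoding)[:byte_offset]
    return len(prefix.decode(encoding))


s = "I don't take an apéritif après-ski"

# Byte offsets as reported in the question: 16 and 25.
start = byte_to_char_offset(s, 16)
end = byte_to_char_offset(s, 25)

print(start, end)    # 16 24
print(s[start:end])  # apéritif
```

A simpler alternative, when only the text of the slice is needed, is to slice the encoded bytes directly and decode the result: s.encode("utf-8")[16:25].decode("utf-8").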