Gtk TextBuffer get_iter_at_offset

I’m writing a plugin for Gedit that tags certain words depending on a regex match. In some cases, the tag is applied several characters beyond the intended word.

So it seems the values returned by match.start() and match.end() are not valid offsets for get_iter_at_offset.

def on_save(self, doc, location, *args, **kwargs):
    """called when document is saved"""
    # get_text() is a helper (not shown) that returns the buffer's contents
    for match in WORD_RE.finditer(get_text(doc)):
        # flag every word the spell checker rejects
        if not self._checker.check(match.group().strip()):
            self.apply_tag(doc, match.start(), match.end())

def apply_tag(self, doc, start, end):
    """apply the tag to the text between start and end"""
    # turn the regex offsets into text iterators in the buffer
    istart = doc.get_iter_at_offset(start)
    iend = doc.get_iter_at_offset(end)
    doc.apply_tag(self._spell_error_tag, istart, iend)

Solution

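I figured it out in the end; it should have been obvious, really. The text in the document contained some non-ASCII characters, and the string returned by get_text was UTF-8 encoded bytes. The regex therefore reported byte offsets, whereas get_iter_at_offset expects character offsets, so every multi-byte character before a word pushed the tag further to the right. Decoding the document's string to unicode fixed the issue.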

So, in on_save, run the regex over the decoded text instead:

get_text(doc).decode('utf-8')
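To see why decoding helps, here is a minimal, Gtk-free sketch that should run under Python 2.7 or 3. The sample text, the pattern `wrod`, and the helper `byte_to_char_offset` are hypothetical, chosen only for illustration. It shows the same word getting different offsets depending on whether the regex runs over the UTF-8 bytes or over the decoded text; only the latter are character offsets of the kind get_iter_at_offset counts.

```python
import re

# Hypothetical sample: two non-ASCII characters (e-acute, i-diaeresis)
# appear before the target word. Escapes keep the source ASCII-only.
text = u"caf\u00e9 na\u00efve wrod"
data = text.encode("utf-8")  # the UTF-8 byte string, as in the original bug

# Matching against the encoded bytes yields byte offsets.
byte_match = re.search(b"wrod", data)
# Matching against the decoded text yields character offsets.
char_match = re.search(u"wrod", text)

print(byte_match.span())  # (13, 17): each 2-byte character shifts it by one
print(char_match.span())  # (11, 15): what get_iter_at_offset expects


def byte_to_char_offset(data, byte_offset):
    """Hypothetical helper: map a byte offset in UTF-8 data to a character
    offset. Assumes byte_offset falls on a character boundary, which is
    true for the start and end of this match."""
    return len(data[:byte_offset].decode("utf-8"))


print(byte_to_char_offset(data, byte_match.start()))  # 11
```

Decoding once up front, as in the fix above, is simpler than converting each offset; a helper like this would only matter if you had to keep matching against the byte string.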