python regex formatting markdown python-telegram-bot

How to escape text for formatting in Python


I have the following text.

"\*hello* * . [ }"

It should be escaped like this:

"\*hello\\* \* \\. \\[ \\}"

How to do this with python regex?

Every special character (the special characters are: _, *, [, ], (, ), ~, `, >, #, +, -, =, |, {, }, ., !) must be escaped with a preceding \ character.

I tried it with this, but then every special character is escaped, including the ones that should act as formatting:

import re
def escape_text(text):  # wrapper name is illustrative
    escape_chars = r'_*[]()~`>#+-=|{}.!'
    return re.sub(f'([{re.escape(escape_chars)}])', r'\\\1', text)

Then the text is unformatted like this:

\*hello\* \* \. \[ \}

But it should be like this:

**hello** \* \. \[ \}

Some examples:

At \* \* \* only the middle one should be escaped.
At \{ \{ \} only the middle one should be escaped.

I need this for text formatting: https://core.telegram.org/bots/api#markdownv2-style


Solution

  • Since you tagged python-telegram-bot, I'm gonna point you to the escape_markdown helper function. The source code for this is here.

    Maybe this helps you. However, I have to agree with Chris: It's not clear to me what you actually want to achieve.

    EDIT:

    The use case seems to be that users should be allowed to set some kind of template message, which can have dynamic input. OP did not (yet) explain what exactly those templates look like, so I'll just make up an example. Let's say the user wants to specify a welcome message of the format

    Hello_there, {username}!
    

    Where Hello_there is italic and {username} is replaced with the corresponding string at runtime and should be displayed bold, including the !.

    I see two ways to approach this.

    1. The user sends the message as formatted text (i.e. the bot receives a message "Hello_there, {username}!"). In this case, one can store the template by simply storing update.effective_message.text_markdown(_v2)/text_html. See Message.text_html. Then at runtime, all you need to do is send_message(template.format(username=escaped_username), parse_mode=...). Note that here escaped_username is a string containing the username with special characters escaped. This can be achieved with either escape_markdown for Markdown formatting or html.escape from the standard library for HTML formatting.

    2. The user sends the text with markup characters. Sticking to Markdown formatting for the example, the bot would receive a message saying _Hello_there_, *{username}!*. Now to convert this to a template, you'd have to somehow escape the relevant characters. In this case the result should be _Hello\_there_, *escaped_username\!* at runtime. In this scenario I don't see a safe way to decide what to escape and what not to. While you can do some regexing to e.g. convert *{username}!* to *{username}\!*, how would you know if the user wants "Hello there_" or "Hello_there"?
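
    To illustrate why the second approach is ambiguous, here is a small sketch that lists every way of escaping exactly one underscore in the raw text. The helper name candidate_escapings is made up for this example; it is not part of python-telegram-bot.

```python
def candidate_escapings(raw: str, char: str = "_") -> list[str]:
    """Return every variant of `raw` with exactly one `char` escaped.

    Hypothetical helper, only meant to show the ambiguity.
    """
    positions = [i for i, c in enumerate(raw) if c == char]
    return [raw[:i] + "\\" + raw[i:] for i in positions]


candidates = candidate_escapings("_Hello_there_")
for candidate in candidates:
    print(candidate)
```

    Each of the three candidates leaves a different pair of underscores as formatting markers, and nothing in the raw text says which one the user meant.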

    I therefore highly recommend the first approach.
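
    A minimal sketch of the first approach, assuming the template has already been stored as a string. The two templates below are hand-written stand-ins (what text_markdown_v2 or text_html actually returns for a given message is not verified here), and escape_markdown_v2 and render are hypothetical names. The escaping reuses the regex from the question instead of calling the library helper, so the snippet runs without python-telegram-bot installed.

```python
import html
import re

# Special characters for MarkdownV2, taken from the list in the question.
MARKDOWN_V2_SPECIALS = r'_*[]()~`>#+-=|{}.!'


def escape_markdown_v2(text: str) -> str:
    """Prefix every MarkdownV2 special character with a backslash."""
    return re.sub(f'([{re.escape(MARKDOWN_V2_SPECIALS)}])', r'\\\1', text)


def render(template: str, username: str, parse_mode: str) -> str:
    """Fill a stored template, escaping only the dynamic value."""
    if parse_mode == "HTML":
        escaped_username = html.escape(username)
    else:
        escaped_username = escape_markdown_v2(username)
    return template.format(username=escaped_username)


# Stand-ins for stored templates (assumed, not produced by the library here).
markdown_template = r"_Hello\_there_, *{username}\!*"
html_template = "<i>Hello_there</i>, <b>{username}!</b>"

print(render(markdown_template, "john_doe.", "MarkdownV2"))
# prints: _Hello\_there_, *john\_doe\.\!*
print(render(html_template, "<john>", "HTML"))
# prints: <i>Hello_there</i>, <b>&lt;john&gt;!</b>
```

    The rendered string would then be passed to send_message together with the matching parse_mode. Only the username is escaped at runtime; the formatting in the template itself is left untouched.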


    Disclaimer: I'm currently the maintainer of python-telegram-bot