Double quotes in bytes object vs double quotes in Flask response data


I'm writing some assertions for a Flask application. I've successfully tested for umlauts in the response like so:

assert 'Wählen Sie ...'.encode('utf-8') in rv.data

The umlaut "ä" has the same representation (\xc3\xa4) in both the encoded string and the response from the web application.

But now, I'm trying to do the same with double quotes:

assert 'Der gewünschte Monat ... z.B. "2019-5".'.encode('utf-8') in rv.data

which fails, because " is still " when it is encoded, but the web application responds with &#34; instead.

What should I do with the string I am testing for in order to achieve compatibility?


Solution

  • The web application is using HTML entity codes to escape the double quotes before encoding to UTF-8. You could use the html.escape function to simulate this, but unfortunately it replaces '"' with &quot; rather than &#34;.

    The xml.sax.saxutils.escape function does not automatically escape double quotes, but it does accept a dictionary of characters to escape and the escaped values, so you can use this to generate the text:

    >>> from xml.sax import saxutils
    >>> escaped = saxutils.escape('Der gewünschte Monat ... z.B. "2019-5".', {'"': '&#34;'})
    >>> escaped
    'Der gewünschte Monat ... z.B. &#34;2019-5&#34;.'
    

    The reverse approach would be to decode and unescape the server response and compare it with the original string. You can use the html.unescape function for this, as it will unescape the numeric escape:

    >>> import html
    >>> response = html.unescape(rv.data.decode('utf-8'))
    >>> assert 'Der gewünschte Monat ... z.B. "2019-5".' in response
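
To see both approaches side by side without a running application, here is a minimal sketch. The `body` variable is a hypothetical stand-in for `rv.data`, built on the assumption that the server writes double quotes as the numeric entity &#34; and leaves everything else untouched. The real response may well escape other characters too.

```python
import html
from xml.sax import saxutils

expected = 'Der gewünschte Monat ... z.B. "2019-5".'

# Hypothetical stand-in for rv.data (an assumption, not the real response):
# double quotes written as the numeric entity, then encoded to UTF-8.
body = 'Der gewünschte Monat ... z.B. &#34;2019-5&#34;.'.encode('utf-8')

# html.escape emits the named entity &quot;, so its output does not match.
named = html.escape(expected)
assert named.encode('utf-8') not in body

# Approach 1: escape the expected text the way the server does,
# then compare bytes with bytes.
escaped = saxutils.escape(expected, {'"': '&#34;'})
assert escaped.encode('utf-8') in body

# Approach 2: decode and unescape the response,
# then compare str with str (no .encode() on the expected text).
unescaped = html.unescape(body.decode('utf-8'))
assert expected in unescaped
```

The second approach is likely the more robust of the two in a test suite, since html.unescape resolves both named and numeric character references. The assertion then does not depend on which form the server happens to emit.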