
Regular expression to match "wap" not preceded by "html"

I'm using NGINX to split mobile traffic between a WAP site and an HTML site. The best way to do this appears to be checking the UA's content preference via the HTTP Accept header.

A preference for WAP is indicated by the appearance of a 'wap' mimetype in the header before an 'html' or wildcard mimetype.
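To make that rule concrete, here is a minimal Python sketch of the ordering check, written without regular expressions. The function name `prefers_wap` is hypothetical, and the sketch assumes that only the order of appearance matters (q-values are ignored, as in the rule above).

```python
def prefers_wap(accept_header):
    """Return True if a 'wap' type is listed before any 'html' or wildcard type."""
    for item in accept_header.split(','):
        # Drop parameters such as ";q=0.8" and normalize case and whitespace.
        media_type = item.split(';', 1)[0].strip().lower()
        # Check 'wap' first: a type like application/vnd.wap.xhtml+xml
        # contains both substrings and should count as WAP.
        if 'wap' in media_type:
            return True
        if 'html' in media_type or media_type == '*/*':
            return False
    # Neither a WAP nor an HTML/wildcard type was listed.
    return False
```

For example, `prefers_wap('application/vnd.wap.xhtml+xml,text/html')` returns True under these assumptions, while the same two types in the opposite order give False.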

So a Sony Ericsson w300i has a preference for WAP:

multipart/mixed, application/vnd.wap.multpart.mixed,applicatnoin/vnd.wap.xhtml_xml,application/xhtml+xml,text/ved.wap.wl,*/*,text/x-hdml,image/mng,/\image/x-mng,ivdeo/mng,video/x-mng,ima/gebmp,text/html

And a Blackberry Bold has a preference for HTML:


Since I'm in NGINX land, it seems like the best tool I have is NGINX's regular expressions (PCRE).

Right now I'm trying to use a negative lookahead to assert "the Accept header contains WAP, but not preceded by HTML":


But this isn't correct. Is there a different way I can think about this problem, or about my matching logic?

So far I've found these regex resources useful:


Thanks for the answer, here are the related tests:

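import re

# Match "wap" only when no "html" appears before the first "wap".
prefers_wap_re = re.compile(r'^(?!(?:(?!wap).)*html).*?wap', re.I)

# (Accept header fragment, expected preference for WAP)
tests = [
    ('', False),
    ('wap', True),
    ('wap html', True),
    ('html wap', False),
]

for test, expected in tests:
    result = prefers_wap_re.search(test)
    assert bool(result) is expected, \
        'Tested "%s", expected %s, got %s.' % (test, expected, result)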


  • The simplest way to do this would be with a lookbehind instead of a lookahead. Since PCRE does not support the variable-length lookbehind this would need, you can try to emulate the lookbehind with a lookahead:


    Not pleasant to read, but it should work.
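As a reading aid, the pattern that appears in the question's tests can be spelled out with Python's `re.VERBOSE` flag. This is only a sketch: the two header strings below are shortened, made-up examples (not real device headers), and Python's `re` module is assumed to behave like PCRE for these constructs.

```python
import re

# Pattern taken from the tests in the question, one piece per line.
prefers_wap_re = re.compile(r'''
    ^                    # anchor at the start of the header
    (?!                  # reject the header if, from the start, ...
        (?:(?!wap).)*    # ... skipping only characters that do not begin "wap" ...
        html             # ... we can reach "html" (so html comes before any wap)
    )
    .*?wap               # otherwise require "wap" somewhere in the header
''', re.I | re.X)

# Hypothetical, shortened Accept headers for illustration only.
wap_first = 'multipart/mixed, application/vnd.wap.xhtml+xml,*/*,text/html'
html_first = 'text/html,application/vnd.wap.xhtml+xml,*/*'

print(bool(prefers_wap_re.search(wap_first)))   # True
print(bool(prefers_wap_re.search(html_first)))  # False
```

Note that this pattern only looks for "html". A wildcard such as `*/*` listed before the first "wap" is not detected, so it would need its own alternative in the inner lookahead if wildcards should count as well.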
