Search code examples
pythonregexunicoderawstring

Python raw strings and unicode : how to use Web input as regexp patterns?


EDIT : This question doesn't really make sense once you have picked up what the "r" flag means. More details here. For people looking for a quick anwser, I added on below.

If I enter a regexp manually in a Python script, I can use 4 combinations of flags for my pattern strings :

  • p1 = "pattern"
  • p2 = u"pattern"
  • p3 = r"pattern"
  • p4 = ru"pattern"

I have a bunch a unicode strings coming from a Web form input and want to use them as regexp patterns.

I want to know what process I should apply to the strings so I can expect similar result from the usage of the manual form above. Something like :

import re
assert re.match(p1, some_text) == re.match(someProcess1(web_input), some_text)
assert re.match(p2, some_text) == re.match(someProcess2(web_input), some_text)
assert re.match(p3, some_text) == re.match(someProcess3(web_input), some_text)
assert re.match(p4, some_text) == re.match(someProcess4(web_input), some_text)

What would be someProcess1 to someProcessN and why ?

I suppose that someProcess2 doesn't need to do anything while someProcess1 should do some unicode conversion to the local encoding. For the raw string literals, I am clueless.


Solution

  • Apart from possibly having to encode Unicode properly (in Python 2.*), no processing is needed because there is no specific type for "raw strings" -- it's just a syntax for literals, i.e. for string constants, and you don't have any string constants in your code snippet, so there's nothing to "process".