Tags: python, python-3.x, regex, python-3.7, f-string

Combining an f-string with a raw string for use inside a regex gives SyntaxError, ValueError, or a wrong result


I have this string:

s0 = 'Ready1   Origin1                 Destination1             Type1       Rate1      Phone1 #     Pro1 #'

and the following variable is calculated like this:

is_head = len([i.group() for i in re.finditer(r"(\s+){2,}", s0)]) >= 3

This gives me True, which is the right and expected result. Now I have another variable, cont_, which may hold any value from 2 to 6. I want the quantifier in the regex to follow the value of cont_, so the pattern ranges from r"(\s+){2,}" to r"(\s+){6,}", and I want to compute is_head without declaring a separate regex for each case. For this I need to combine an f-string with the raw string currently used for the regex. I've tried these:

>>> len([i.group() for i in re.finditer(fr"(\s+){{cont_},}", s0)]) >= 3
  File "<stdin>", line 1
SyntaxError: f-string: single '}' is not allowed
>>> len([i.group() for i in re.finditer(rf"(\s+){{cont_},}", s0)]) >= 3
  File "<stdin>", line 1
SyntaxError: f-string: single '}' is not allowed

As shown, both give a SyntaxError. I've also tried the following with .format():

>>> len([i.group() for i in re.finditer(r"(\s+){{con},}".format(cont_), s0)]) >= 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '}' encountered in format string
>>> len([i.group() for i in re.finditer(r"(\s+){{0},}".format(cont_), s0)]) >= 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '}' encountered in format string

In that case I get a ValueError. I've also tried these:

>>> len([i.group() for i in re.finditer(fr"(\s+){cont_,}", s0)]) >= 3
False
>>> len([i.group() for i in re.finditer(rf"(\s+){cont_,}", s0)]) >= 3
False

These don't raise any error, but they give the wrong result in each case: with cont_ = 2, is_head should have been True. On further inspection, I can see that rf"(\s+){cont_,}" and fr"(\s+){cont_,}" both evaluate to '(\\s+)(2,)', which is not the intended regex. How can I get this working without explicitly writing a separate regex for each possible value of cont_?
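The wrong result can be reproduced by inspecting the pattern directly. This is a minimal sketch, assuming cont_ = 2; the names tuple_pattern and sample are made up for illustration and do not appear in the original code.

```python
import re

cont_ = 2  # assumed value, taken from the case discussed above

# Inside a replacement field, `cont_,` is an expression: a one-element tuple.
# The tuple (2,) is then formatted with str(), giving the text "(2,)".
tuple_pattern = rf"(\s+){cont_,}"
print(tuple_pattern)

# The resulting regex needs the literal text "2," after some whitespace,
# so a line that only contains words and spaces can never match it.
sample = "a  b   c"  # hypothetical sample line
print(re.findall(tuple_pattern, sample))
```

The braces are consumed by the f-string and never reach the regex engine, which is why the quantifier disappears.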

NOTE: I'm aware that a similar question has been asked before here, but the solutions there don't help in my case.


Solution

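  • The (\s+){2,} pattern matches one or more whitespace characters, repeated two or more times, which makes little sense as a way to express a minimum length. Runs of two or more whitespace characters are matched with \s{2,}.

    Next, in f-strings and str.format() format strings, literal curly braces must be doubled: {{ produces { and }} produces }. In {{cont_},} the {{ is a literal {, cont_ is then plain text, and the lone } that follows is what triggers the error.

    Thus, you need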

    rf"\s{{{cont_},}}"
    

    where the first {{ is a literal {, {cont_} is replaced with the value of cont_, and the }} at the end is a literal }. With cont_ = 2, the resulting pattern is \s{2,}.
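Applied to the string from the question, the approach above can be sketched as follows. The helper names whitespace_run_pattern, whitespace_run_pattern_format and is_head_line are hypothetical, introduced only for this example, and the threshold of 3 runs is taken from the question's >= 3 check.

```python
import re

s0 = 'Ready1   Origin1                 Destination1             Type1       Rate1      Phone1 #     Pro1 #'


def whitespace_run_pattern(cont_):
    """Build a regex matching runs of at least cont_ whitespace characters."""
    # {{ -> literal {, {cont_} -> the number, }} -> literal }
    return rf"\s{{{cont_},}}"


def whitespace_run_pattern_format(cont_):
    """Same pattern built with str.format(); the brace-doubling rule is identical."""
    return r"\s{{{0},}}".format(cont_)


def is_head_line(line, cont_, min_runs=3):
    """Return True if line has at least min_runs whitespace runs of length >= cont_."""
    return len(re.findall(whitespace_run_pattern(cont_), line)) >= min_runs


print(whitespace_run_pattern(2))   # \s{2,}
print(whitespace_run_pattern(6))   # \s{6,}
print(is_head_line(s0, 2))         # True
```

Since the pattern has no capturing group, re.findall returns the matched runs themselves, so counting them is equivalent to the list comprehension over re.finditer used in the question.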