I want to match all strings that start with exactly two # characters (##) and do some substitution. A string that starts with more than two, say ###, shouldn't match, and a string that starts with just one # shouldn't match either.
import re
text = '''
# some one string
Describe your writing briefly here, what ihow many people are you looking for?
## some section two string
Describe your writing briefly here, what ihow many people are you looking for?Describe your writing briefly here, what ihow many people are you looking for?
Describe your writing briefly here, what ihow many people are you looking for?
## some other section two string with question sign?
Describe your writing briefly here, what ihow many people are you looking for? containing all keyword arguments except for those corresponding to a formal parameter. This may be combined with a formal parameter of the form *name (described in the next subsection) which receives a tuple containing the positional arguments beyond the formal parameter list. (*name must occur before **name.) For example, if we define a function like this
## some other section with . and : colon
Describe your writing briefly here, what ihow many people are you looking for?Describe your writing briefly here, what ihow many people are you looking for?
'''
pattern = r"##(.+?.*)"
list_with_sections_ = list(dict.fromkeys(re.findall(pattern, text)))
print(list_with_sections_)
if list_with_sections_:
    for item in list_with_sections_:
        text = re.sub(item, f'<a href="#" class="section-header title" id="{item.replace(" ", "-").strip()}_">{item}</a>', text)
print(text)
This seems to work, but re.sub gives inconsistent results when a string ends with a question mark or contains some other special character. For instance, when a match ends with a question mark (?), an additional ? appears after the a tag.
This issue is caused by how the '?' character is treated in regex.
Here: text = re.sub(item, f'<a href="#" class="section-header title..."
you treat item (which is essentially a part of the input text and may contain a '?' character) as a regex pattern. But the '?' character has a special meaning in a regex pattern: it makes the preceding element optional. As a result, you match the relevant piece of text without the ? at the end, and that ? is left in place after the replacement.
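The effect can be reproduced in isolation. The snippet below is a minimal sketch; the variables line and item are made up for illustration and stand in for one header line and the text captured from it.

```python
import re

# Hypothetical stand-ins for one header line and its captured group.
line = "## section with question sign?"
item = " section with question sign?"

# Used as a pattern, the trailing "n?" means "an optional n",
# so the literal "?" in the text is never consumed by the match.
unescaped = re.search(item, line)
print(repr(unescaped.group(0)))

# With re.escape the "?" is matched literally.
escaped = re.search(re.escape(item), line)
print(repr(escaped.group(0)))
```

The first match stops just before the question mark, which is why the ? ends up outside the replaced text.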
You can address this by escaping the special characters in item with re.escape, like this: text = re.sub(re.escape(item), f'<a href="#" class="section-header title..."
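As an alternative, here is a sketch that avoids reusing matched text as a pattern at all and also handles the "exactly two #" requirement from the question. It is not the code from the question: the names HEADER and to_anchor and the sample text are assumptions made for illustration, and unlike the original loop it replaces the whole header line, including the leading ##.

```python
import re

# Shortened sample input, made up for this sketch.
text = """# title
### too deep
## some section?
body text
## other section with . and : colon
"""

# ^##(?!#) matches exactly two leading hashes: the negative lookahead
# rejects a third one. re.MULTILINE lets ^ and $ work per line.
HEADER = re.compile(r"^##(?!#)[ \t]*(.+)$", re.MULTILINE)

def to_anchor(match):
    # Build the replacement from the captured title; the returned string
    # is inserted as-is, so nothing needs escaping.
    title = match.group(1).strip()
    slug = title.replace(" ", "-")
    return f'<a href="#" class="section-header title" id="{slug}_">{title}</a>'

result = HEADER.sub(to_anchor, text)
print(result)
```

Because the substitution happens in a single pass with a function as the replacement, special characters in the header text never reach the regex engine as a pattern. If the ## prefix should stay in the output, prepend it to the string that to_anchor returns.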