I'd like some help with regular expressions because I'm not really familiar with them. So far, I have created the following regex:
As https://regex101.com/ states:
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
Negative Lookbehind (?<![#-/>])
Assert that the Regex below does not match
Match a single character present in the list below [#-/>]
# matches the character # literally (case insensitive)
- matches the character - literally (case insensitive)
/ matches the character / literally (case insensitive)
> matches the character > literally (case insensitive)
literal matches the characters literal literally (case insensitive)
Negative Lookahead (?![\<\'\"])
Assert that the Regex below does not match
Match a single character present in the list below [\<\'\"]
\< matches the character < literally (case insensitive)
\' matches the character ' literally (case insensitive)
\" matches the character " literally (case insensitive)
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
Global pattern flags
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
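The regex itself isn't shown above, so here is a minimal Python sketch that uses a pattern reconstructed from the regex101 breakdown (an assumption): `\b`, the negative lookbehind, `literal`, the negative lookahead, `\b`, and the `i` flag. The hyphen in the lookbehind class is escaped, because an unescaped `-` between `#` and `/` is read as a range (every character from `#` through `/`) rather than as three separate characters. `PATTERN`, `SAMPLE` and `find_literals` are illustrative names, and `SAMPLE` is a shortened copy of the example text further down.

```python
import re

# Assumed pattern, reconstructed from the regex101 breakdown:
#   \b            word boundary
#   (?<![#\-/>])  not preceded by #, -, / or >  (hyphen escaped, so no range)
#   literal       the word itself
#   (?![<'"])     not followed by <, ' or "
#   \b            word boundary
PATTERN = re.compile(r"""\b(?<![#\-/>])literal(?![<'"])\b""", re.IGNORECASE)

# Shortened copy of the question's example text, without the ** markers.
SAMPLE = (
    "Literal in computer science is a\n"
    "<a href='http://www.google.com/something/literal#literal'>literal</a>\n"
    "for representing a fixed value in source code. Almost all programming\n"
    "<a href='http://www.google.com/something/else-literal#literal'>languages</a>\n"
    "is a literal for the function type which is LITERAL\n"
)


def find_literals(text):
    """Return each occurrence of 'literal' that PATTERN accepts."""
    return PATTERN.findall(text)


print(find_literals(SAMPLE))  # ['Literal', 'literal', 'LITERAL']
```

The occurrences inside the URLs are rejected by the lookbehind (`/`, `-` or `#` before them), and the link text `>literal</a>` is rejected by both lookarounds.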
I want to add two exceptions to this matching rule: 1) if the ">" is preceded by "p", as in a <p> opening tag, the literal should still be matched; 2) likewise, the literal should still be matched when the "<" is followed by "/p", as in a </p> closing tag.
How can I achieve this?
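One possible approach, sketched in Python under the assumption that the tags appear exactly as <p> and </p> (no attributes), is to turn each lookaround into an alternation: the lookbehind side passes if the previous character is not `#`, `-`, `/` or `>`, *or* if the previous three characters are `<p>`; the lookahead side passes if the next character is not `<`, `'` or `"`, *or* if the text continues with `</p>`. The pattern and the `find_literals` helper are hypothetical.

```python
import re

# Hypothetical extension of the pattern with the two <p> exceptions.
PATTERN = re.compile(
    r"\b"
    r"(?:(?<![#\-/>])|(?<=<p>))"    # not after #, -, / or >, unless right after <p>
    r"literal"
    r"""(?:(?![<'"])|(?=</p>))"""   # not before <, ' or ", unless right before </p>
    r"\b",
    re.IGNORECASE,
)


def find_literals(text):
    """Return each occurrence of 'literal' that PATTERN accepts."""
    return PATTERN.findall(text)


print(find_literals("<p>literal</p>"))           # ['literal']
print(find_literals("<a href='x'>literal</a>"))  # []
```

Python's `re` only accepts fixed-width lookbehind, so a tag with attributes such as `<p class="x">` would still block the match.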
Example: only the bold ones should match.
**Literal** in computer science is a
<a href='http://www.google.com/something/literal#literal'>literal</a>
for representing a fixed value in source code. Almost all programming
<a href='http://www.google.com/something/else-literal#literal'>languages</a>
have notations for atomic values such as integers, floating-point
numbers, and strings, and usually for booleans and characters; some
also have notations for elements of enumerated types and compound
values such as arrays, records, and objects. An anonymous function
is a **literal** for the function type which is **LITERAL**
I know I have overcomplicated things, but the situation is itself complicated, and I don't think I have another way.
If the text you're searching is just text mixed with some <a> tags, then you can simplify the < and > parts of the lookarounds and give a specific string that it shouldn't be followed by: </a>