
What is the correct Regex to find a letter but NOT if it appears in a bigger pattern/word/phrase?


I want to use a regex to find all instances of a certain letter in a given string, but NOT when that letter appears inside a larger word or phrase. For example:

For the test string:

lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)

I want to find every instance of the letter 'a' except the ones that appear in the following three words:

lag max pctrange

i.e. I would like the regex to match only these instances of the letter 'a' (marked here with square brackets):

lag([a],1) + 252*[a] + max(3[a]*2) / 5*pctrange([a],10)

I tried the following regex, but it keeps including the character after the letter 'a':

a[^"lag|max|pctrange"]

For context, I'm working in Python and want to replace these 'a' instances using the re module:

import re
string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
words = ["lag", "max", "pctrange"]
replace = "_"
print(re.sub(f"a[^\"{'|'.join(words)}\"]", replace, string))

This results in the (undesired) output:

lag(_1) + 252*_+ max(3_2) / 5*pctrange(_10)
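The bracket expression is a character class, not a list of alternative words: [^...] matches exactly one character that is not any of the individual characters between the brackets (the quotes and | are simply more members of the set). Because that extra character is part of the match, re.sub replaces it together with the a. A quick check using only the question's own pattern and string:

```python
import re

# The attempted pattern: "a" followed by ONE character that is not in the
# set { " l a g | m x p c t r n e }.
words = ["lag", "max", "pctrange"]
pattern = f"a[^\"{'|'.join(words)}\"]"
print(pattern)  # a[^"lag|max|pctrange"]

string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
# Each match is two characters long: the "a" plus the character after it,
# so re.sub replaces both with "_" and the ",", " " and "*" disappear.
matches = re.findall(pattern, string)
print(matches)  # ['a,', 'a ', 'a*', 'a,']
```

The a in lag, max and pctrange is skipped only because it happens to be followed by g, x or n, which are in the set, not because the words themselves are excluded.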

I would like instead for the output to be the following:

lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)

Edit: Note that the search isn't always for a single letter; sometimes I want to search for "aa" or "bdg" instead of "a". What matters most is the list of words to exclude (in the example above, "lag", "max", and "pctrange"). I don't need to ignore anything other than the specific words in that list. Thank you.


Solution

  • To prevent a from being matched when it is adjacent to another letter, try negative lookarounds:

    (?i)(?<![a-z])a(?![a-z])
    

    See this demo at regex101. The (?i) flag enables case-insensitive matching, so [a-z] behaves like [a-zA-Z] (it also lets the target a match A).


    Update: To skip certain words and match each remaining a, try the PyPI regex module with the backtracking control verbs (*SKIP)(*F).

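    import regex as re  # third-party module: pip install regex
    words = ["lag", "max", "pctrange"]
    string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
    string = re.sub(fr"\b(?i:{'|'.join(words)})\b(*SKIP)(*F)|a", "_", string)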
    

    See another demo at regex101, or a Python demo at tio.run.

    Whatever the left side of the | alternation matches is skipped; whatever the right side matches is replaced. The i (ignore-case) flag and the \b word boundaries apply only to the words inside the (?i: ... ) non-capturing group, so the a on the right side is still matched case-sensitively.
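If installing the third-party regex module is not an option, both ideas can be approximated with the standard-library re module. The sketch below is an illustration under that assumption, not part of the original answer, and replace_standalone and replace_except_words are hypothetical helper names. The first helper uses the same negative lookarounds (with [A-Za-z] instead of the (?i) flag, so the target itself stays case-sensitive). The second swaps the (*SKIP)(*F) verbs for a common stdlib workaround: match the excluded words as one alternative, return them unchanged from a replacement function, and replace only what the target group captured. re.escape lets the target be a multi-character string such as "aa" or "bdg", as asked in the edit. Unlike the (?i:...) version above, the excluded words are matched case-sensitively here.

```python
import re


def replace_standalone(target, replacement, text):
    """Replace `target` only where no ASCII letter touches it on either side."""
    # Negative lookbehind / lookahead: no letter immediately before or after.
    pattern = rf"(?<![A-Za-z]){re.escape(target)}(?![A-Za-z])"
    return re.sub(pattern, replacement, text)


def replace_except_words(target, replacement, text, words):
    """Replace every `target` except occurrences inside the listed whole words."""
    # At each position the excluded whole words are tried first; if one
    # matches, it is consumed, so a target inside it never reaches group 1.
    excluded = "|".join(re.escape(w) for w in words)
    pattern = rf"\b(?:{excluded})\b|({re.escape(target)})"

    def substitute(m):
        # Group 1 took part in the match only when the target itself matched.
        return replacement if m.group(1) is not None else m.group(0)

    return re.sub(pattern, substitute, text)


s = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
print(replace_standalone("a", "_", s))
# lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)
print(replace_except_words("a", "_", s, ["lag", "max", "pctrange"]))
# lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)
```

The two helpers differ on targets inside words that are not in the list: for the text "aa + maax + laag" with target "aa", replace_standalone leaves maax alone (the aa touches letters), while replace_except_words with words=["laag"] turns it into m_x and keeps only laag intact, which is the behaviour the edit asks for.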