Tags: python, regex, fuzzy-search, pypi-regex

Python fuzzy regex with a nested OR


I'm trying to do some fuzzy matching on a string of DNA reads. I'd like to allow up to one substitution error while also allowing a particular base pair to be one of two options (A or G in this case).
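A side note on the A-or-G position: in a regex this is normally written as a character class, `[AG]`, or as a group such as `(?:A|G)`. Inside square brackets `|` is an ordinary character, not alternation, so a class like `[A|G]` also accepts a literal `|`. The quick check below uses Python's standard `re` module rather than the `regex` package (which follows the same rule for a simple class like this by default), and the short reads are made up for illustration.

```python
import re

# Inside [...], "|" is just another allowed character.
loose = re.compile(r"[A|G]TTAG")   # accepts A, G, or a literal "|"
strict = re.compile(r"[AG]TTAG")   # accepts only A or G

for read in ["ATTAG", "GTTAG", "|TTAG", "CTTAG"]:
    # fullmatch returns a match object or None; bool() turns that into True/False
    print(read, bool(loose.fullmatch(read)), bool(strict.fullmatch(read)))
```

Only the `|TTAG` read separates the two patterns; for real DNA reads the difference rarely matters, but `[AG]` states the intent exactly.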

I've started with the following:

>>> import regex
>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "ATTAGATACCCTGGTAGTCA")
['ATTAGATACCCTGGTAGTCA']

matches as expected because I'm matching against the exact string

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "GTTAGATACCCTGGTAGTCA")
['GTTAGATACCCTGGTAGTCA']

matches as expected because I'm matching against the exact string except the first base pair has been switched from an A to a G (allowed)

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "GTTAGATACCCTGGTAGTCx")
['GTTAGATACCCTGGTAGTCx']

matches as expected because a single substitution occurs (C->x)

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "xTTAGATACCCTGGTAGTCx")
[]

does not match (as expected) because there are two substitutions

>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "xTTAGATACCCTGGTAGTCA")
[]

should have matched, since the error at the first base pair (x instead of A or G) should count as a single substitution.


Solution

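  • You have two substitutions in your last example: the first base pair has been replaced by an x, and the last one is an A where the pattern expects a C. You only allow one substitution, so there's no match. The same trailing C→A mismatch is present in your first two examples, so they are not exact matches either: each of them uses its one allowed substitution on that last position.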
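To see where each example spends its substitutions, here is a minimal sketch that counts them by hand. It does not use the `regex` module's fuzzy engine; it is a plain Hamming-style count, and the helper name `count_substitutions` is hypothetical. Because the pattern only allows substitutions (no insertions or deletions), a match has to be exactly as long as the pattern, so comparing position by position is enough for these 20-character reads.

```python
# Pattern from the question: first position may be A or G, the rest is fixed.
# Each entry is the set of characters accepted at that position.
PATTERN = [set("AG")] + [set(base) for base in "TTAGATACCCTGGTAGTCC"]


def count_substitutions(read):
    """Count positions where `read` is not accepted by PATTERN.

    Only substitutions are considered, so the read must be
    exactly as long as the pattern.
    """
    if len(read) != len(PATTERN):
        raise ValueError("read length must equal pattern length")
    return sum(ch not in allowed for ch, allowed in zip(read, PATTERN))


# The five reads from the question, in order.
reads = [
    "ATTAGATACCCTGGTAGTCA",
    "GTTAGATACCCTGGTAGTCA",
    "GTTAGATACCCTGGTAGTCx",
    "xTTAGATACCCTGGTAGTCx",
    "xTTAGATACCCTGGTAGTCA",
]

for read in reads:
    subs = count_substitutions(read)
    print(read, subs, "match" if subs <= 1 else "no match")
```

The counts come out as 1, 1, 1, 2, 2, which lines up with the `regex` results above. If reads like the last one should be accepted, the limit has to be raised to two substitutions (for example `{s<=2}` in the `regex` pattern), at the cost of accepting more mismatched reads overall.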