Let alpha be a string of randomly sampled elements from the set 1, 2, 3, 4, 5, 6. For example, alpha could be "1132345216".
Assume alpha is long enough to contain at least four substrings satisfying the following:
It starts with N of 2s, 3s and/or 4s, in any order, possibly repeated or missing. For example, the substring may begin 2222... or 234234234...
It ends with a 6, or with more than M characters 5.
For example, "2346" and "23455" would satisfy the properties if N = 3, M = 2.
I want to find all substrings of this kind in alpha, using Julia. Of course, one thinks of regular expressions. I am somewhat versed in regular expressions from the perspective of formal language theory, but I have never used them in a programming language, and there are differences. So far I have failed to achieve the desired result.
Some quick sample code for anyone who cares to try this:
pattern_string = r"..." # What's the right regex???
# Test string to search for matches
test_string = "1111122211111 2323232234233246 5161532161 232342342322224444223323555555"
# Find all matches in the test string (same variable name as defined above)
matches = eachmatch(pattern_string, test_string)
# Output the matches found
println("Matches found:")
for m in matches
    println(m.match)
end
In the example, I added spaces for visual clarity. The first substring (before the first space) should NOT be a match; the second one should be a match for a small N; the third one should not be a match; the last one should be a match if M is less than 6.
Assuming you have n and m variables defined, you can create a regex using an interpolated string:
n=10
m=5
pattern_string = Regex("[234]{$n}[1-6]*?(?:6|5{$(m+1),})")
For the sample data, this gives a pattern string of
[234]{10}[1-6]*?(?:6|5{6,})
This matches:
[234]{10}: 10 of 2, 3, 4, in any order
[1-6]*?: a minimal number of 1-6
(?:6|5{6,}): either a 6, or 6 or more 5s
For your sample data, this matches 2323232234233246 and 232342342322224444223323555555.
Regex demo on regex101
Julia demo on Try it online!
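As a quick sanity check, here is a minimal sketch that runs the pattern against the test string from the question. The helper name build_pattern is made up for illustration, and n = 10, m = 5 are assumed values (chosen so that the interpolated pattern reads 5{6,}). It also tries m = 6 to show the last substring dropping out, as the question expects when M is not less than 6.

```julia
# Hypothetical helper (not part of any library): builds the regex for given n and m.
build_pattern(n::Integer, m::Integer) = Regex("[234]{$n}[1-6]*?(?:6|5{$(m + 1),})")

# Test string taken from the question
test_string = "1111122211111 2323232234233246 5161532161 232342342322224444223323555555"

# Assumed values: n = 10, m = 5
pattern = build_pattern(10, 5)
println("Pattern: ", pattern.pattern)

# Collect the matched substrings as plain Strings
found = [String(mt.match) for mt in eachmatch(pattern, test_string)]
println("Matches with m = 5:")
foreach(println, found)

# With m = 6 the pattern needs seven or more 5s, so the last substring no longer matches
found_m6 = [String(mt.match) for mt in eachmatch(build_pattern(10, 6), test_string)]
println("Matches with m = 6:")
foreach(println, found_m6)
```

Note that eachmatch returns non-overlapping matches by default, so each part of the string is reported at most once.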
If I've misinterpreted your question, and the substrings are not allowed to contain 1, or 5 or 6 except at the end, you can change the regex to:
pattern_string = Regex("[234]{$n,}(?:6|5{$(m+1),})")
This will just match a sequence of n or more 2, 3 or 4, followed by a 6 or m+1 or more 5s.
For your sample data this matches the same substrings.
Julia demo on Try it online!
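The two patterns only agree on the question's sample data; they differ as soon as other digits appear between the leading run and the terminator. The sketch below illustrates this on a made-up input (the sample string, the values n = 3 and m = 2, and the names lenient and strict are all assumptions for illustration).

```julia
# Assumed small values so the example stays readable
n, m = 3, 2

# First variant: allows any digits 1-6 between the leading run and the terminator
lenient = Regex("[234]{$n}[1-6]*?(?:6|5{$(m + 1),})")
# Second variant: only 2, 3, 4 may precede the terminator
strict = Regex("[234]{$n,}(?:6|5{$(m + 1),})")

# Made-up input: the first chunk has 1s between the run of 2/3/4 and the final 6
sample = "2341116 2346 234555"

lenient_found = [String(mt.match) for mt in eachmatch(lenient, sample)]
strict_found = [String(mt.match) for mt in eachmatch(strict, sample)]

println("lenient: ", lenient_found)
println("strict:  ", strict_found)
```

Here the first chunk, 2341116, is matched only by the first variant, because the second variant does not accept the 1s before the final 6.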