I want to create a simple string generator. Here is how it should work:
pattern_string = "abcdefghijklmnopqrstuvwxyz"
starting_string = "qywtx"
I compare the last character of starting_string against the pattern string. The last character is x. I find this character in pattern_string:
abcdefghijklmnopqrstuvw x yz
and see that the next character is y, so I want the output qywty.
...
However, when I reach z, I want the second-to-last character to be incremented and the last character reset to the first character of pattern_string, so after qywtz the output will be qywua
and so on...
Now questions:
Can I use REGEX to achieve that?
Are there any libraries out there that already handle such generation?
The following will generate the next string according to your description.
def next(s, pat):
    l = len(s)
    for i in range(len(s) - 1, -1, -1):  # find the first non-'z' from the back
        if s[i] != pat[-1]:  # if you find it
            # leave everything before i as is, increment at i, reset rest to all 'a's
            return s[:i] + pat[pat.index(s[i]) + 1] + (l - i - 1) * pat[0]
    else:  # this is only reached for s == 'zzzzz'
        return (l + 1) * pat[0]  # and generates 'aaaaaa' (just my assumption)
>>> import string
>>> pattern = string.ascii_lowercase # 'abcde...xyz'
>>> s = 'qywtx'
>>> s = next(s, pattern) # 'qywty'
>>> s = next(s, pattern) # 'qywtz'
>>> s = next(s, pattern) # 'qywua'
>>> s = next(s, pattern) # 'qywub'
For multiple 'z' at the end:
>>> s = 'foozz'
>>> s = next(s, pattern) # 'fopaa'
For all 'z', start over with 'a' of incremented length:
>>> s = 'zzz'
>>> s = next(s, pattern) # 'aaaa'
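If you want a stream of strings rather than one step at a time, the same logic can be wrapped in a generator. This is only a sketch: `next_string` and `iter_strings` are names I made up for illustration (the step function is renamed here so that it does not shadow Python's built-in `next`), and the roll-over to a longer string is the same assumption as above.

```python
from itertools import islice
import string


def next_string(s, pat):
    """Return the successor of s over the alphabet pat (same logic as above)."""
    for i in range(len(s) - 1, -1, -1):
        if s[i] != pat[-1]:
            # keep the prefix, bump position i, reset the tail to the first symbol
            return s[:i] + pat[pat.index(s[i]) + 1] + (len(s) - i - 1) * pat[0]
    # every character was the last symbol: assumed roll-over to a longer string
    return (len(s) + 1) * pat[0]


def iter_strings(start, pat):
    """Yield start, then each successor, indefinitely."""
    s = start
    while True:
        yield s
        s = next_string(s, pat)


first_four = list(islice(iter_strings('qywtx', string.ascii_lowercase), 4))
print(first_four)  # ['qywtx', 'qywty', 'qywtz', 'qywua']
```

Because the generator never terminates on its own, take from it with `islice` or break out of the loop yourself.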
To my knowledge there is no library function to do that. One that comes close is itertools.product:
>>> from itertools import product
>>> list(map(''.join, product('abc', repeat=3)))
['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa',
'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab',
'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
But that does not work with an arbitrary start string. This behaviour could be mimicked by combining it with itertools.dropwhile, but that has the serious overhead of generating and discarding every combination before the start string. With an alphabet of 26 and a start string towards the end of the sequence, that pretty much renders the approach useless:
>>> from itertools import dropwhile
>>> list(dropwhile(lambda s: s != 'bba', map(''.join, product('abc', repeat=3))))
['bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
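To put a number on that overhead: product yields the strings in the same order as counting in base len(alphabet), so the count of strings skipped before the start string is just the start string read as a number in that base. The helper below is a rough sketch; `rank` is a hypothetical name, not something from itertools.

```python
from itertools import product


def rank(s, pat):
    """Position of s in product(pat, repeat=len(s)), read as a base-len(pat) number."""
    n = 0
    for ch in s:
        n = n * len(pat) + pat.index(ch)
    return n


all_strings = list(map(''.join, product('abc', repeat=3)))
print(rank('bba', 'abc'))               # 12, so twelve strings are skipped first
print(all_strings[rank('bba', 'abc')])  # bba
```

For a five-letter start string over the 26 lowercase letters the rank can run into the millions, which is why stepping directly from the start string is preferable.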