
Regex: Capture only matching alternative in Ruby


I am cracking my head over this: I am building a simple password generator that uses a pattern created by the user. The pattern contains the letters a-f, which stand for 6 character groups:

[Image: Password Generator]

So, if the user enters ababcd, they will get something like abce1! The pattern Ababababababccd would generate something like Robuxejiqu23#. Since Ababababababccd is hard to read, I want to let users abbreviate it with repetitions: Ababababababccd could be written as Ab(ab){5}ccd or even Ab(ab){5}c{2}d (which would be silly, but I include it for completeness). The [ab] syntax comes into play if you want a character from either group a or group b.

I have "built" a suitable regex that captures the repetitions in the following combinations:

  1. single-character repetition: a{3} becomes aaa
  2. group repetition: (ab){2} becomes abab
  3. XOR-group repetition: [ab]{2} becomes ab, aa, ba, or bb
  4. combinations of 2 and 3: (a[bc]d){n}

The regex I have built so far finds all of the cases above: /(?:(\[[abcdef]+\])|\(([\[\]abcdef]+)\)|([abcdef]))\{(\d+)\}/i
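To make the capture layout concrete, the short check below runs the regex above (copied unchanged) against one sample of each of the four forms and records MatchData#captures. The sample strings are illustrative choices. The result shows that exactly one of the first three captures is non-nil and that the count is always the fourth.

```ruby
# The question's REPETITION regex, unchanged.
REPETITION = /(?:(\[[abcdef]+\])|\(([\[\]abcdef]+)\)|([abcdef]))\{(\d+)\}/i

# One illustrative sample per form: single char, (...) group, [...] group, mixed.
samples = ['a{3}', '(ab){2}', '[ab]{2}', '(a[bc]d){2}']

# Map each sample to its captures: [bracket_group, paren_body, single_char, count].
captures = samples.map { |s| [s, REPETITION.match(s).captures] }.to_h

captures.each { |s, caps| puts "#{s.ljust(12)} #{caps.inspect}" }
```

For these samples the captures are [nil, nil, "a", "3"], [nil, "ab", nil, "2"], ["[ab]", nil, nil, "2"] and [nil, "a[bc]d", nil, "2"]. The [ab] form keeps its brackets, so a later pass can still resolve it. The (...) form captures only its contents.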

My working Ruby code looks like this, but I'd like to find a more elegant solution:

    # Class-level constants
    REPETITION = /(?:(\[[abcdef]+\])|\(([\[\]abcdef]+)\)|([abcdef]))\{(\d+)\}/i
    GROUPS = /\[([abcdefxyz]+)\]/i

    # Inside the generate method
    pattern = params[:pattern]
    group = {
      'A' => params[:group_a].upcase,
      'a' => params[:group_a],
      'B' => params[:group_b].upcase,
      'b' => params[:group_b],
      'C' => params[:group_c].upcase,
      'c' => params[:group_c],
      'D' => params[:group_d].upcase,
      'd' => params[:group_d],
      'E' => params[:group_e].upcase,
      'e' => params[:group_e],
      'F' => params[:group_f].upcase,
      'f' => params[:group_f]
    }

    # Evaluate repetitions: ...{n}
    if pattern =~ REPETITION
      pattern.gsub!(REPETITION) do
        # Exactly one of $1..$3 is set; $4 is the repetition count
        match = $1 != nil ? $1 : $2 != nil ? $2 : $3
        count = $4.to_i
        expanded = ""
        count.times do
          expanded += match
        end
        expanded
      end
    end

    # Evaluate character groups [...]: pick one of the listed letters at random
    if pattern =~ GROUPS
      pattern.gsub!(GROUPS) do
        $1[rand($1.length)]
      end
    end

    # Evaluate the final pattern (repetitions and []-groups processed)
    password = ""
    pattern.each_char do |c|
      password += group[c][rand(group[c].length)]
    end
    @password = password
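The twelve hand-written entries in the group hash follow one rule. Each lowercase letter maps to its group as entered, and each uppercase letter maps to the upcased group. The sketch below builds the same hash in a loop. It assumes params behaves like a Hash with symbol keys :group_a through :group_f. The stand-in params and its character sets are made up for illustration, and group_e and group_f are placeholders.

```ruby
# Stand-in for the request params (made-up values; group_e/group_f are placeholders).
params = {
  group_a: 'bcdfghjklmnpqrstvwxyz',
  group_b: 'aeiou',
  group_c: '0123456789',
  group_d: '!$%&#',
  group_e: 'xyz',
  group_f: 'qrs'
}

# Same 12 entries as the hand-written hash:
# 'a' => group as entered, 'A' => group upcased, and so on up to 'f'/'F'.
group = ('a'..'f').each_with_object({}) do |letter, h|
  chars = params.fetch(:"group_#{letter}")
  h[letter] = chars
  h[letter.upcase] = chars.upcase
end

p group.keys
```

On a plain Hash, fetch raises KeyError as soon as a group is missing. Without it, the failure surfaces later as a NoMethodError on nil.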

My idea is to find all occurrences and replace the repetitions with their expansions, so that Ab(ab){5}ccd becomes Ababababababccd, which I then process afterwards.

My assumption is that gsub handles each occurrence individually, so I can replace each match with its own pattern and count. If gsub tried to replace all of them at once, it would be the wrong method.
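One way to check that assumption is to record what the block sees. The sketch below uses the question's regex unchanged and wraps each match in angle brackets, so the per-match replacement is visible. The seen array and the marker format are only for illustration.

```ruby
# Same regex as in the question.
REPETITION = /(?:(\[[abcdef]+\])|\(([\[\]abcdef]+)\)|([abcdef]))\{(\d+)\}/i

seen = []
# gsub with a block runs the block once per match, left to right;
# inside the block, $~ (and $1..$4) describe only the current match.
marked = 'Ab(ab){5}c{2}d'.gsub(REPETITION) do
  seen << $~[0]
  "<#{$~[0]}>" # wrap each match so the individual replacements are visible
end

p seen
puts marked
```

The block runs twice here, once for (ab){5} and once for c{2}. String#gsub! returns nil when nothing was replaced but leaves the string unchanged, so the `if pattern =~ ...` guards are not needed for correctness.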

For me it's not relevant which of the groups in (?:a|b|c){count} matches, but the match should be returned as $1 and the count as $2.

The above regex produces the captures $1, $2, $3 and $4. $4 is always my repetition count, but then I have to find out which of $1 .. $3 is not nil and use that one to expand. I could take the "cheap" route and go through the cases with if statements, but I want an elegant solution.
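Only one alternative can participate in a match, so the non-nil capture can be picked with `||` instead of nested ternaries. This is a minimal sketch that keeps the question's regex as-is. expand_repetitions is a hypothetical helper name.

```ruby
REPETITION = /(?:(\[[abcdef]+\])|\(([\[\]abcdef]+)\)|([abcdef]))\{(\d+)\}/i

# Hypothetical helper: expands every "...{n}" in the pattern.
def expand_repetitions(pattern)
  pattern.gsub(REPETITION) do
    # At most one of $1..$3 is non-nil, so the first truthy one is the body.
    body = $1 || $2 || $3
    body * $4.to_i # String#* repeats the string n times
  end
end

puts expand_repetitions('Ab(ab){5}c{2}d')
```

`$~.captures.first(3).compact.first` is an equivalent spelling if more alternatives are added later.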

In regex101 I got it to work with a branch-reset group (?|...), but Ruby does not support that.
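Ruby's regex engine (Onigmo) has no branch reset, but it does accept several groups with the same name. MatchData#[] with that name returns the group that actually took part in the match. The sketch below applies that to the question's regex. The group names body and count, and the helper expand_repetitions, are illustrative choices.

```ruby
# Each alternative stores its text under the same name, "body".
REPETITION = /(?:(?<body>\[[abcdef]+\])|\((?<body>[\[\]abcdef]+)\)|(?<body>[abcdef]))\{(?<count>\d+)\}/i

# Hypothetical helper: expands every "...{n}" using the named captures.
def expand_repetitions(pattern)
  pattern.gsub(REPETITION) do
    md = $~
    # With duplicate names, md[:body] returns the "body" group that matched.
    md[:body] * md[:count].to_i
  end
end

puts expand_repetitions('Ab(ab){5}c{2}d')
```

This gets close to what (?|...) gives in regex101: one name for the body and one for the count. The values are read through the names rather than $1/$2.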

I hope it is clear what I want!?

In TypeScript and C# I got it working, and now I want it to work in Ruby... :-)


Solution

  • Here is what I came up with to handle the provided cases.

    REPETITION = /(\[[abcdef]+\]|\([\[\]abcdef]+\)|[abcdef])(?:\{(\d+)\})?/i
    
    GROUPS = {
      'a' => 'bcdfghjklmnpqrstvwxyz',
      'b' => 'aeiou',
      'c' => '0123456789',
      'd' => '!$%&/()=?*+#_.,:;_'
    }.then {|h| h.merge(h.map {|k,v| [k.upcase,v.upcase]}.to_h)}
    
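    def expand(str)
      # Each scan result is [token, count]; count is nil when there is no {n}.
      str.scan(REPETITION).map do |group, count|
        # Strip the parentheses from a (...) group; [...] groups and single letters stay as-is.
        sub_pattern = group.start_with?('(') ? group[1..-2] : group
        count ? sub_pattern * count.to_i : sub_pattern
      end.join.gsub(/\[.*?\]/) { |match| match[1..-2].chars.sample }
    end

    def generate(str, groups = GROUPS)
      expanded = expand(str)
      puts "Expansion: #{expanded}"
      # Replace each group letter with a random character from that group.
      expanded.each_char.sum("") { |c| groups[c][rand(groups[c].length)] }
    end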
    

    Example usage and output:

    patterns = ['(ab){2}','a{3}','[ab]{2}','(a[bc]d){16}','Ab(ab){5}ccd','Ab(ab){5}c{2}d']
    
    patterns.each do |pattern|
      puts "-------Pattern: #{pattern}-------"
      puts "Generated: #{generate(pattern)}"
    end
    
    # -------Pattern: (ab){2}-------
    # Expansion: abab
    # Generated: niha
    # -------Pattern: a{3}-------
    # Expansion: aaa
    # Generated: svh
    # -------Pattern: [ab]{2}-------
    # Expansion: ba
    # Generated: ak
    # -------Pattern: (a[bc]d){16}-------
    # Expansion: abdacdabdabdabdabdabdacdacdacdacdabdabdacdabdabd
    # Generated: lu+s5$cu%ye#re+mo%qu)s4?c1*h8(l5!ja*zu?g9_lu/ze!
    # -------Pattern: Ab(ab){5}ccd-------
    # Expansion: Ababababababccd
    # Generated: Meqonicovala75!
    # -------Pattern: Ab(ab){5}c{2}d-------
    # Expansion: Ababababababccd
    # Generated: Pezotuwegona98#
    

    We match the repetitions, expand each one by its count, and then replace each XOR group with one randomly chosen letter to get the fully expanded pattern.

    Then we simply look up each character in the GROUPS hash and select a random element from that group.
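For a password generator, the lookup step may be better served by SecureRandom than by Kernel#rand. Kernel#rand uses a Mersenne Twister, which is not designed for secrets. The sketch below reuses the answer's character groups. secure_pick and lookup_secure are hypothetical names. The sketch assumes its input has already been through expand, so only group letters remain. It also uses Hash#fetch, so a letter with no group (such as e or f here) raises a clear error instead of a NoMethodError on nil.

```ruby
require 'securerandom'

# Same character groups as the answer's GROUPS hash, upper-case variants added.
lower = {
  'a' => 'bcdfghjklmnpqrstvwxyz',
  'b' => 'aeiou',
  'c' => '0123456789',
  'd' => '!$%&/()=?*+#_.,:;_'
}
GROUPS = lower.merge(lower.map { |k, v| [k.upcase, v.upcase] }.to_h)

# Hypothetical helper: picks one character with a cryptographically secure generator.
def secure_pick(chars)
  chars[SecureRandom.random_number(chars.length)]
end

# Hypothetical variant of generate's lookup step for an already-expanded pattern.
def lookup_secure(expanded, groups = GROUPS)
  expanded.each_char.map do |c|
    chars = groups.fetch(c) { raise ArgumentError, "no character group for #{c.inspect}" }
    secure_pick(chars)
  end.join
end

puts lookup_secure('Ababababababccd')
```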

    Note: this does not handle other cases such as:

    • Group inside an XOR Ab[a(bc)d]{12}
    • Repetition inside a group which is then replicated Ab(ab{5}){4}
    • Likely many more
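The nested cases listed above are hard to cover with a single regex, because the pattern syntax nests. A different technique from the answer's regex approach is a small recursive-descent parser built on Ruby's standard StringScanner. The sketch below is one such parser. PatternExpander and its grammar are hypothetical. It assumes the same a-f letters, (...) groups, [...] one-of groups and {n} repetition. It also re-draws [...] choices on every repetition, matching how [ab]{2} is described in the question.

```ruby
require 'strscan'

# Hypothetical recursive-descent expander (not from the answer above).
# Grammar it assumes:
#   sequence := item*
#   item     := atom ('{' digits '}')?
#   atom     := letter | '(' sequence ')' | '[' item+ ']'
# Parsing builds lambdas; calling the root lambda produces one expansion.
class PatternExpander
  LETTER = /[a-f]/i

  def initialize(pattern, rng: Random.new)
    @pattern = pattern
    @rng = rng
  end

  def expand
    @scanner = StringScanner.new(@pattern)
    root = sequence
    fail_at('unexpected character') unless @scanner.eos?
    root.call
  end

  private

  # Parses items until end of input or a closing ')' / ']'.
  def sequence
    nodes = []
    nodes << item until @scanner.eos? || @scanner.check(/[)\]]/)
    -> { nodes.map(&:call).join }
  end

  # An atom optionally followed by {n}; the atom is evaluated n times.
  def item
    node = atom
    return node unless @scanner.scan(/\{(\d+)\}/)

    count = @scanner[1].to_i
    -> { Array.new(count) { node.call }.join }
  end

  def atom
    if (letter = @scanner.scan(LETTER))
      -> { letter }
    elsif @scanner.scan(/\(/)
      inner = sequence
      expect(/\)/, "')'")
      inner
    elsif @scanner.scan(/\[/)
      options = []
      options << item until @scanner.eos? || @scanner.check(/\]/)
      expect(/\]/, "']'")
      fail_at('empty [] group') if options.empty?
      # Pick one option each time this node is evaluated.
      -> { options.sample(random: @rng).call }
    else
      fail_at('unexpected character')
    end
  end

  def expect(regex, label)
    fail_at("expected #{label}") unless @scanner.scan(regex)
  end

  def fail_at(message)
    raise ArgumentError, "#{message} at position #{@scanner.pos} in #{@pattern.inspect}"
  end
end

puts PatternExpander.new('Ab(ab{5}){4}').expand
puts PatternExpander.new('Ab[a(bc)d]{12}').expand
```

The output is a string of group letters, which can then go through the same per-character lookup as before. Passing a seeded Random through rng: makes the [...] choices reproducible, for example in tests.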