Regex Difference between Ruby & Python


I just started Advent of Code 2023 and am trying to use it to learn a few new programming languages. I have (some) familiarity with Python, and literally just installed Ruby today.

For Day 1, part 2, I am using a regex to search for digits as well as their spelled-out versions. The regex in Python (which yields the correct result): (?=(0|1|2|3|4|5|6|7|8|9|zero|one|two|three|four|five|six|seven|eight|nine))

When I use this exact regex in Ruby, I get a nil result. Interestingly, when I use the following regex instead, I get the same result in both Python and Ruby, but it is the incorrect answer: r"0|1|2|3|4|5|6|7|8|9|zero|one|two|three|four|five|six|seven|eight|nine"

So I believe the answer has to do with the positive lookahead assertion, but I don't know why, or what it is doing differently.

Below are both of the files.

Python:

import re

# Use a context manager so the file is closed, and avoid shadowing the built-in input()
with open("../resources/input.txt", "r") as f:
    lines = f.readlines()

targets = [
    '0','1','2','3','4','5','6','7','8','9',
    'zero','one','two','three','four','five','six','seven','eight','nine'
]
values = {
    '0': 0,
    '1': 1,
    '2': 2,
    '3': 3,
    '4': 4,
    '5': 5,
    '6': 6,
    '7': 7,
    '8': 8,
    '9': 9,
    'zero': 0,
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9
}

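# Named total rather than sum, so the built-in sum() is not shadowed
total = 0

for line in lines:
    # The lookahead matches without consuming text, so overlapping words are all found
    numbers = re.findall(r"(?=(" + "|".join(targets) + r"))", line)

    first_digit_value = values[numbers[0]] * 10
    last_digit_value = values[numbers[-1]]

    total += first_digit_value + last_digit_value

print(total)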

Ruby:

# Init vars
sum = 0

reg = /\d|zero|one|two|three|four|five|six|seven|eight|nine/
reg2 = /(?=(0|1|2|3|4|5|6|7|8|9|zero|one|two|three|four|five|six|seven|eight|nine))/
reg3 = /0|1|2|3|4|5|6|7|8|9|zero|one|two|three|four|five|six|seven|eight|nine/

values = {
    '0' => 0,
    '1' => 1,
    '2' => 2,
    '3' => 3,
    '4' => 4,
    '5' => 5,
    '6' => 6,
    '7' => 7,
    '8' => 8,
    '9' => 9,
    'zero' => 0,
    'one' => 1,
    'two' => 2,
    'three' => 3,
    'four' => 4,
    'five' => 5,
    'six' => 6,
    'seven' => 7,
    'eight' => 8,
    'nine' => 9
}


# Read the file line by line and process each line
File.foreach("../resources/input.txt", chomp: true) do |line|
    # Get the first and last digits as their values
    numbers = line.scan(reg3)

    firstDigitValue = values[numbers[0]] * 10
    lastDigitValue = values[numbers[-1]]

    # accumulate
    sum += (firstDigitValue+lastDigitValue)

end

puts sum

Solution

  • 0|1|2|3|4|5|6|7|8|9|zero|one|two|three|four|five|six|seven|eight|nine
    

    The problem with this regex, in both Python and Ruby, is that it does not account for overlapping matches. I made the exact same mistake doing this problem earlier this month. If the string eightwo, for instance, appears in your puzzle input, then both Python and Ruby will match "eight", consume those five characters, and resume searching at the "w", so they never see the word "two".
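To see the non-overlapping behaviour concretely, here is a small Ruby sketch. The string "eightwo" is the example from the paragraph above; the variable names are made up for illustration, and \d is used as shorthand for the ten digit alternatives.

```ruby
# Plain alternation without a lookahead: every match consumes its characters.
plain = /\d|zero|one|two|three|four|five|six|seven|eight|nine/

text = "eightwo"
matches = text.scan(plain)

# "eight" uses up the shared "t", so "two" is never found.
p matches  # prints ["eight"]
```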

    (?=(0|1|2|3|4|5|6|7|8|9|zero|one|two|three|four|five|six|seven|eight|nine))
    

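    This corrects the problem by putting the whole match into a lookahead (it's probably not efficient, but we're doing coding challenges so it's good enough). A lookahead is zero-width: it checks what follows without consuming any characters. After each match the engine therefore advances by only one character and resumes searching there, so overlapping words are still found. The matched text itself is only available through the capture group inside the lookahead.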
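As a rough illustration of the zero-width behaviour, the sketch below walks the string by hand with Regexp#match and a start position, recording where each match begins and ends. The loop and the variable names are my own for demonstration purposes; this is not how you would normally write the solution.

```ruby
# The same alternatives, wrapped in a lookahead with a capture group.
lookahead = /(?=(\d|zero|one|two|three|four|five|six|seven|eight|nine))/

text = "eightwo"
pos = 0
found = []

# Walk the string by hand so the match positions are visible.
while (m = lookahead.match(text, pos))
  # begin(0) and end(0) are equal: the overall match is zero characters wide.
  found << [m.begin(0), m.end(0), m[1]]
  # Resume one character after the start of this match, not after the word.
  pos = m.begin(0) + 1
end

p found  # prints [[0, 0, "eight"], [4, 4, "two"]]
```

"eight" is found at index 0 and "two" at index 4, even though they share the "t".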

    However, in Ruby, String#scan behaves differently when the regular expression contains capture groups. From the documentation:

    If the pattern contains groups, each individual result is itself an array containing one entry per group.

    So your output actually looks like

    [["4"], ["one"], ["eight"], ["nine"]]
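The difference in return shape is easy to check side by side. In this sketch the input line "4oneightnine" is a hypothetical example chosen to reproduce the output above, and the variable names are illustrative.

```ruby
line = "4oneightnine"  # hypothetical input line

# No capture group: scan returns a flat array of matched strings.
# ("eight" is missed here because "one" consumed the shared "e".)
flat = line.scan(/\d|zero|one|two|three|four|five|six|seven|eight|nine/)
p flat    # prints ["4", "one", "nine"]

# One capture group: scan returns an array of one-element arrays.
nested = line.scan(/(?=(\d|zero|one|two|three|four|five|six|seven|eight|nine))/)
p nested  # prints [["4"], ["one"], ["eight"], ["nine"]]
```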
    

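    With that shape, values[numbers[0]] looks up the array ["4"] rather than the string "4". The array is not a key in the hash, so the lookup returns nil. You just need to unwrap this extra nesting layer.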

    first_digit_value = values[numbers[0][0]] * 10
    last_digit_value = values[numbers[-1][0]]
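Putting it together, a minimal sketch of the corrected loop could look like the following. The method name calibration_sum, the sample lines, and the use of flatten (instead of indexing with [0]) are my own choices for illustration. Reading the input file is replaced by an in-memory array so the snippet is self-contained.

```ruby
def calibration_sum(lines)
  words = %w[zero one two three four five six seven eight nine]

  # Map both "3" and "three" to 3.
  values = {}
  words.each_with_index do |word, digit|
    values[word] = digit
    values[digit.to_s] = digit
  end

  # Lookahead with one capture group, built from the word list.
  pattern = /(?=(\d|#{words.join('|')}))/

  lines.sum do |line|
    # flatten removes the one-element arrays that scan returns for the group.
    numbers = line.scan(pattern).flatten
    values[numbers.first] * 10 + values[numbers.last]
  end
end

p calibration_sum(["eightwo", "4oneightnine"])  # prints 131
```

Here "eightwo" contributes 82 and "4oneightnine" contributes 49. Like the original code, this assumes every line contains at least one digit or digit word.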