I want to add markup to key phrases in a Gollum wiki page so that each one links to the relevant wiki page, in the form:
This is the key phrase.
Becomes
This is the [[key phrase|Glossary#key phrase]].
I have a list of key phrases such as:
keywords = ["golden retriever", "pomeranian", "cat"]
And a document:
Sue has 1 golden retriever. John has two cats.
Jennifer has one pomeranian. Joe has three pomeranians.
I want to iterate over every line and find every match (that isn't already a link) for each keyword. My current attempt looks like this:
File.foreach(target_file) do |line|
  glosses.each do |gloss|
    len = gloss.length
    # Create the regex. Avoid anything that starts with [
    # or (, ends with ] or ), and ignore case.
    re = /(?<![\[\(])#{gloss}(?![\]\)])/i
    # Find every instance of this gloss on this line.
    positions = line.enum_for(:scan, re).map { Regexp.last_match.begin(0) }
    positions.each do |pos|
      line.insert(pos, "[[")
      # +2 because we just inserted 2 ahead.
      line.insert(pos + len + 2, "|#{page}\##{gloss}]]")
    end
  end
  puts line
end
However, this breaks when the same key phrase matches twice on one line: each insertion shifts the rest of the line, so the positions I found for the later matches are no longer accurate. I know I could offset every position by the size of the earlier insertions, but because the insertion size differs for each gloss, that seems like the most brute-force, hacky solution.
Is there a solution that allows me to make multiple insertions on the same line at the same time without several arbitrary adjustments each time?
After looking at @BryceDrew's online Python version, I realized Ruby probably also has a way to fill in each match as it is found. I now have a much more concise and faster solution.
First, I needed to make regexes of my glosses:
glosses.push(/(?<![\[\(])#{gloss}(?![\]\)])/i)
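Note: The majority of that regex is look-ahead and look-behind assertions to prevent catching a phrase that's already part of a link. If a key phrase could contain regex metacharacters (such as . or +), passing it through Regexp.escape before interpolating keeps it literal.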
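To see what those assertions do and do not catch, here is a minimal sketch. The sample strings and the variable names (plain, linked, nested) are made up for illustration, and Regexp.escape is an addition that the original line does not use.

```ruby
gloss = "cat"
# Same pattern as above, with the phrase escaped so it is treated literally.
re = /(?<![\[\(])#{Regexp.escape(gloss)}(?![\]\)])/i

plain  = "John has a cat."
linked = "John has a [[cat|Glossary#cat]]."
nested = "John has a [[big cat|Glossary#big cat]]."

plain_hits  = plain.scan(re)   # unlinked phrase is found
linked_hits = linked.scan(re)  # both occurrences touch a bracket, so neither matches
nested_hits = nested.scan(re)  # "cat" before the "|" touches no bracket, so it still matches

p plain_hits, linked_hits, nested_hits
```

The last case shows a limitation: the assertions only inspect the single character on each side of the phrase, so a keyword sitting in the middle of an existing link's text is not protected.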
Then, I needed to make a union of all of them:
re = Regexp.union(glosses)
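As a quick check of the union, the sketch below builds it from the keyword list in the question and scans one sample sentence. Building the array with map and Regexp.escape is an assumption about how the glosses were collected, and the capitalized sample sentence is made up to show that each sub-pattern keeps its own /i flag inside the union.

```ruby
keywords = ["golden retriever", "pomeranian", "cat"]

# One case-insensitive pattern per key phrase, each guarded by the lookarounds.
glosses = keywords.map do |phrase|
  /(?<![\[\(])#{Regexp.escape(phrase)}(?![\]\)])/i
end

# Regexp.union accepts an array and joins the patterns with "|".
re = Regexp.union(glosses)

matches = "Sue has 1 Golden Retriever. John has two cats.".scan(re)
p matches
```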
After that, it's as simple as doing gsub on every line, and filling in my matches:
File.foreach(target_file) do |line|
  line = line.gsub(re) { |match| "[[#{match}|Glossary##{match.downcase}]]" }
  puts line
end
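Putting the pieces together, here is a self-contained sketch that runs against the sample document from the question instead of a file. The helper names build_gloss_regex and link_glosses, the page parameter, and the use of Regexp.escape are illustrative choices, not part of the original code.

```ruby
KEYWORDS = ["golden retriever", "pomeranian", "cat"].freeze

# Build one regex that matches any key phrase not directly touching a bracket.
def build_gloss_regex(keywords)
  Regexp.union(
    keywords.map { |phrase| /(?<![\[\(])#{Regexp.escape(phrase)}(?![\]\)])/i }
  )
end

# Replace every match in a single pass; gsub handles the offsets itself,
# so several matches on one line need no position bookkeeping.
def link_glosses(text, re, page = "Glossary")
  text.gsub(re) { |match| "[[#{match}|#{page}##{match.downcase}]]" }
end

document = <<~DOC
  Sue has 1 golden retriever. John has two cats.
  Jennifer has one pomeranian. Joe has three pomeranians.
DOC

re = build_gloss_regex(KEYWORDS)
linked_lines = document.each_line.map { |line| link_glosses(line.chomp, re) }
puts linked_lines
```

Because gsub builds a new string from the matches rather than inserting into the line in place, the two pomeranian matches on the second line are both linked correctly.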