I am trying to do a basic substitution, but I am having trouble understanding the behaviour here.
I want to replace each anchor tag with the URL contained in its href attribute.
This is my code:
require 'nokogiri'
message = "Hi Testin wFAASF,
Thank you for booking with us.
Your work has been booked on Sep 16, 2020 1:00PM at 2026 South Clark Street / unit c / Chicago, Illinois 60616
Sincerely,
Varun Security
<a href=\"https://www.google.com\">Test This PR</a>"
puts message.gsub(Nokogiri::HTML.parse(message).at('a'), Nokogiri::HTML.parse(message).at('a')['href'])
What I think the output would be:
"Hi Testin wFAASF,
Thank you for booking with us.
Your work has been booked on Sep 16, 2020 1:00PM at 2026 South Clark Street / unit c / Chicago, Illinois 60616
Sincerely,
Varun Security
https://www.google.com"
What the actual output is:
"Hi Testin wFAASF,
Thank you for booking with us.
Your work has been booked on Sep 16, 2020 1:00PM at 2026 South Clark Street / unit c / Chicago, Illinois 60616
Sincerely,
Varun Security
<a href=\"https://www.google.com\">https://www.google.com</a>"
Could someone explain why this is happening and how I could do this better?
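Because a Nokogiri::XML::Element is neither a string nor a regexp. gsub therefore falls back on the element's implicit string conversion (to_str), which in Nokogiri returns the node's text content, so only the link text "Test This PR" gets replaced. Calling .to_s on the element, which returns its full markup, works: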
puts message.gsub(
  Nokogiri::HTML.parse(message).at('a').to_s,
  Nokogiri::HTML.parse(message).at('a')['href']
)
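To illustrate the difference without Nokogiri, here is a minimal sketch using a made-up FakeAnchor class (hypothetical, not part of any library). It assumes, as Nokogiri nodes do, that to_str returns only the inner text while to_s returns the full markup. String#gsub calls to_str on a pattern that is neither a String nor a Regexp, which is what produces the surprising output:

```ruby
# Hypothetical stand-in for a parsed <a> element; not a Nokogiri class.
class FakeAnchor
  def initialize(href, text)
    @href = href
    @text = text
  end

  # Implicit conversion, used by gsub for non-String, non-Regexp patterns.
  # Assumption: mirrors a Nokogiri node, whose to_str is its text content.
  def to_str
    @text
  end

  # Explicit conversion: the full markup of the element.
  def to_s
    %(<a href="#{@href}">#{@text}</a>)
  end

  # Attribute lookup, only "href" is supported in this sketch.
  def [](name)
    name == "href" ? @href : nil
  end
end

message = %(Sincerely,\n<a href="https://www.google.com">Test This PR</a>)
anchor = FakeAnchor.new("https://www.google.com", "Test This PR")

# Pattern is converted with to_str, so only the link text is replaced.
implicit = message.gsub(anchor, anchor["href"])
# Pattern is the full markup, so the whole tag is replaced.
explicit = message.gsub(anchor.to_s, anchor["href"])

puts implicit
# Sincerely,
# <a href="https://www.google.com">https://www.google.com</a>
puts explicit
# Sincerely,
# https://www.google.com
```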
However, you are going to all the trouble of parsing the HTML only to search the original string again, as if you knew nothing about it. This also gives a wrong result if there are multiple links in one message, or if the anchor tag is not written exactly the way Nokogiri serializes it, e.g. if it has an extra space, like this: <a href="https://www.google.com" >https://www.google.com</a>
Why not let Nokogiri work?
puts Nokogiri::HTML.fragment(message).tap { |doc|
  doc.css("a").each { |node|
    node.replace(node["href"])
  }
}.to_html
Note that I switched to Nokogiri::HTML.fragment, since this is not a full HTML document (with doctype and all), which Nokogiri::HTML.parse would feel obligated to add. Then each anchor node is replaced with the value of its href attribute.