ruby capybara nokogiri vml

Replace image src in vml markup with globally available images using Nokogiri


Is it possible to find Outlook-specific markup via Capybara/Nokogiri?

Given the following markup (the ERB <% %> tags are rendered to regular HTML before parsing):

...
<div>
<!--[if gte mso 9]>
    <v:rect
        xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false"
        style="width:<%= card_width %>px;height:<%= card_header_height %>px;"
    >
        <v:fill type="tile"
            src="<%= avatar_background_url.split('?')[0] %>"
            color="<%= background_color %>" />
        <v:textbox inset="0,0,0,0">
<![endif]-->
</div>

How can I get the list of <v:fill ... /> tags? (Or, failing that, how can I get the whole comment, if finding a tag inside a conditional comment is the problem?)

I have tried the following:

doc.xpath('//v:fill')

*** Nokogiri::XML::XPath::SyntaxError Exception: ERROR: Undefined namespace prefix: //v:fill

Do I need to somehow register the VML namespace?

EDIT: following @ThomasWalpole's approach, I now scan the comment nodes with a regex:

doc.xpath('//comment()').each do |comment_node|
  vml_node_match = /<v\:fill.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node.content)
  if vml_node_match
    original_image_uri = URI.parse(vml_node_match['url'])
    vml_tag = vml_node_match[0]
    handle_vml_image_replacement(original_image_uri, comment_node, vml_tag)
  end
end
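The regex above uses .*, which in Ruby does not match line breaks unless the /m flag is set, so it misses a <v:fill> whose attributes are split across lines as in the markup at the top. A stdlib-only sketch of a variation that uses [^>]* instead: a negated character class does match newlines, and it also cannot run past the end of the tag. VML_FILL_SRC and fill_urls are hypothetical names, and the comment text is a trimmed copy of the question's markup.

```ruby
require 'uri'

# Hypothetical pattern: [^>]* (unlike .*) also matches newlines, so attributes
# spread over several lines are still found, and it cannot skip past the tag's '>'.
VML_FILL_SRC = /<v:fill\b[^>]*?\bsrc="(?<url>https?:[^"]*)"[^>]*\/>/

# Returns every http(s) src URL of a <v:fill .../> tag in the comment text.
def fill_urls(comment_text)
  comment_text.scan(VML_FILL_SRC).flatten
end

comment_text = <<~VML
  [if gte mso 9]>
  <v:rect xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false">
    <v:fill type="tile"
        src="http://localhost:3000/avatar_bg.png?v=2"
        color="#ffffff" />
    <v:textbox inset="0,0,0,0">
  <![endif]
VML

urls = fill_urls(comment_text)
puts urls.inspect # prints ["http://localhost:3000/avatar_bg.png?v=2"]
puts URI.parse(urls.first).host # prints localhost
```

Because the pattern has a named group, String#scan returns only that capture for each match, which is why a single flatten yields a plain list of URLs.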

My handle_vml_image_replacement then ends up calling the following replace_comment_image_src:

def self.replace_comment_image_src(node:, comment:, old_url:, new_url:)
  new_url = new_url.split('?').first # VML does not support URL with query params
  puts "Replacing comment src URL in #{comment} by #{new_url}"
  node.content = node.content.gsub(old_url, new_url)
end
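The split('?').first idiom works for these URLs; a sketch of the same step using Ruby's standard URI library, which also drops a #fragment if one is present. strip_query is a hypothetical helper name, not part of the question's code.

```ruby
require 'uri'

# Hypothetical helper: remove the query string (and any fragment) from a URL,
# because, per the comment in the method above, VML does not accept URLs with query params.
def strip_query(url)
  uri = URI.parse(url)
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

puts strip_query('https://cdn.example.com/avatar_bg.png?width=600&v=2')
# prints https://cdn.example.com/avatar_bg.png
```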

But then the comment no longer seems to behave like a comment, and I sometimes see its HTML rendered as escaped text. Am I using the wrong method to change the comment text with Nokogiri?


Solution

  • Here's the final code that I used for my email interceptor, thanks to @Thomas Walpole and @sschmeck for help along the way.

    My goal was to replace images that link to localhost in VML markup with globally available images, for testing with services like MOA or Litmus.

    doc.xpath('//comment()').each do |comment_node|
      # Note: the beginning of the tag is not matched, since the tag may span several lines
      src_attr_match = /.*src=\"(?<url>http[s]?\:[^"]*)"[^>]*\/>/.match(comment_node)
      next unless src_attr_match
      original_image_uri = URI.parse(src_attr_match['url'])
      handle_comment_image_replacement(original_image_uri, comment_node)
    end
    

    Which later calls the following (after picking a URL replacement strategy based on the source image type):

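    def self.replace_comment_image_src(node:, old_url:, new_url:)
      new_url = new_url.split('?').first # VML does not support URLs with query params
      # native_content= stores the string as-is; content= would entity-escape <, > and &
      node.native_content = node.content.gsub(old_url, new_url)
    end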