I am trying to get the inner HTML of an element on a page using Nokogiri. However, I'm not just getting the text of the element; I'm also getting the surrounding whitespace escape sequences (\r, \n, \t). Is there a way I can suppress or remove them with Nokogiri?
require 'nokogiri'
require 'open-uri'
page = Nokogiri::HTML(URI.open("http://the.page.url.com")) # URI.open: Kernel#open no longer opens URLs on Ruby 3.0+
page.at_css("td[custom-attribute='foo']").parent.css('td').css('a').inner_html
This returns => "\r\n\t\t\t\t\t\t\t\tTheActuallyInnerContentThatIWant\r\n\t"
What is the most effective and direct Nokogiri (or Ruby) way of doing this?
page.at_css("td[custom-attribute='foo']")
.parent
.css('td')
.css('a')
.text # you want the text, not the inner HTML
.strip # removes leading and trailing whitespace (\r, \n, \t)
Sidenote: css('td a') is likely more efficient than css('td').css('a').
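To see what strip actually does to the string from the question, here is a minimal plain-Ruby sketch that needs no Nokogiri at all. The raw value is copied from the question; the messy string and the gsub step are my own hypothetical additions, included only to show that strip leaves whitespace inside the text alone.

```ruby
# The raw string from the question, as returned by inner_html.
raw = "\r\n\t\t\t\t\t\t\t\tTheActuallyInnerContentThatIWant\r\n\t"

# String#strip removes leading and trailing whitespace, including \r, \n and \t.
clean = raw.strip
puts clean

# Hypothetical variant (not from the question): whitespace in the middle of
# the text is untouched by strip, so collapsing it takes an extra step.
messy = "\r\n\tFirst\r\n\t\tSecond\r\n"
collapsed = messy.strip.gsub(/\s+/, ' ')
puts collapsed
```

If the link text only ever has padding around it, as in the question, strip alone should be enough; the gsub step is only worth adding when line breaks or tabs can appear between words.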