I'm trying to extract the level-1 heading (h1) from HTML like this:
<div class="cuerpo-not"><div mod="2323">
<h1>Jamón 5 Jotas, champagne Bollinger y King Alexander III</h1>
I'm using the function xpathSApply(), but it returns nothing:
xpathSApply(webpage, "//div[contains(@class, 'cuerpo-not')]/h1", xmlValue)
# list()
But when I use the same function without the /h1 step, it returns all the text under that div, in this format:
xpathSApply(webpage, "//div[contains(@class, 'cuerpo-not')]", xmlValue)
# ;\n\t\t}\n\t}\n\t\n\t\n\tenviarNoticiaLeida_Site( 6916437,16 ) ;\n//]]>Jamón 5 Jotas, champagne Bollinger y King Alexander III\n\n\n\tPor J.M.
How can I extract the heading as a string? The same code has worked on other web pages.
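I think you just need one more / in your query down to h1, as in //h1 instead of /h1. A single / selects only direct children, and here the h1 sits inside a nested div, so you need // to match it at any depth below the cuerpo-not div.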
library(XML)
x <- '<div class="cuerpo-not"><div mod="2323">
<h1>Jamón 5 Jotas, champagne Bollinger y King Alexander III</h1>'
xpathSApply(htmlParse(x), "//div[contains(@class, 'cuerpo-not')]//h1", xmlValue)
# [1] "Jamón 5 Jotas, champagne Bollinger y King Alexander III"
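To see the difference between the two steps side by side, here is a minimal sketch. It assumes the real page nests the h1 exactly one div deep, as in your snippet; the closing tags and the variable names (doc, direct, nested, explicit) are my own additions for illustration.

```r
library(XML)

# Same fragment as above, with closing tags added for readability (an assumption;
# htmlParse() also copes with the unclosed version).
x <- '<div class="cuerpo-not"><div mod="2323">
<h1>Jamón 5 Jotas, champagne Bollinger y King Alexander III</h1>
</div></div>'
doc <- htmlParse(x, asText = TRUE)

# /h1 looks only at direct children of the matched div, so nothing is found.
direct <- xpathSApply(doc, "//div[contains(@class, 'cuerpo-not')]/h1", xmlValue)
length(direct)

# //h1 looks at every descendant, so the nested heading is found.
nested <- xpathSApply(doc, "//div[contains(@class, 'cuerpo-not')]//h1", xmlValue)

# Spelling out the intermediate div also works, but only if the nesting
# depth really is what you expect.
explicit <- xpathSApply(doc, "//div[contains(@class, 'cuerpo-not')]/div/h1", xmlValue)
identical(nested, explicit)
```

The //h1 form is probably the safer choice if the site's markup varies between pages, since it does not depend on how many wrapper divs sit between the container and the heading.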