I have started using Jython, as it seems to be an excellent language, and it has proved to be so far.
I am using dom4j to manipulate and retrieve data from the DOM of a bunch of HTML files I have on disk. I have written the script below to check through the DOM using XPath for h1 tags and grab their text; if an h1 tag is not present in the DOM, it then searches for the title tag and grabs the text from that.
I am very new to Jython, but I am sure there is a way to perform this task a lot more gracefully than the method below. If I am right in thinking this, can someone show me a better way to do it?
elemHolder = dom.createXPath('//xhtml:h1')
elemHolder.setNamespaceURIs(map)
elem = elemHolder.selectSingleNode(dom)
if elem != None:
    h1 = elem.getText()
else:
    elemHolder = dom.createXPath('//xhtml:title')
    elemHolder.setNamespaceURIs(map)
    elem = elemHolder.selectSingleNode(dom)
    if elem != None:
        title = elem.getText()
    else:
        title = "Page does not contain a H1 or title tag"
If anyone could help it would be great. Cheers
How about this (I don't claim to know much about Python, by the way, but this looks like an obvious first step):
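# The returns need an enclosing function; findHeading is just an illustrative name
def findHeading(dom, map):
    # Try each XPath in order and return on the first match
    for path in ('//xhtml:h1', '//xhtml:title'):
        elemHolder = dom.createXPath(path)
        elemHolder.namespaceURIs = map
        elem = elemHolder.selectSingleNode(dom)
        if elem is not None:
            # dom4j's getName() gives the local name of the node
            return (elem.name, elem.text)
    return (None, "Page does not contain h1 or title tag")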
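If you want to try the same "first match wins" loop without a dom4j setup, here is a minimal sketch that swaps in the standard library's xml.etree.ElementTree as a stand-in. It is not dom4j: the function name first_text, the XHTML_NS mapping and the sample page are all made up for illustration, and ElementTree's find() only supports a limited XPath subset (hence './/' instead of '//').

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-namespace mapping, playing the role of `map` in the question
XHTML_NS = {'xhtml': 'http://www.w3.org/1999/xhtml'}

def first_text(root, paths, namespaces=XHTML_NS):
    """Return (local tag name, text) for the first path that matches, else (None, None)."""
    for path in paths:
        elem = root.find(path, namespaces)
        if elem is not None:
            # ElementTree stores tags as '{namespace-uri}local'; keep the local part
            local_name = elem.tag.rsplit('}', 1)[-1]
            return (local_name, elem.text)
    return (None, None)

# Hypothetical sample page with a title but no h1
page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Fallback title</title></head>'
    '<body><p>No heading here</p></body></html>'
)
result = first_text(page, ('.//xhtml:h1', './/xhtml:title'))
```

Returning (None, None) rather than an error string leaves it to the caller to decide what to show when neither tag exists. Adding another fallback is then just one more entry in the tuple of paths.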