
Need help making Jython (dom4j) script more graceful :)


I have started using Jython as it seems to be an excellent language, and it has proved to be so far.

I am using dom4j to manipulate and retrieve data from the DOM of a bunch of HTML files I have on disk. I have written the script below to check the DOM with XPath for an h1 tag and grab its text; if no h1 tag is present, it searches for the title tag and grabs the text from that instead.

I am very new to Jython, but I am sure there is a more graceful way to perform this task than the method below. If I am right, can someone show me a better way to do it?

elemHolder = dom.createXPath('//xhtml:h1')
elemHolder.setNamespaceURIs(map)
elem = elemHolder.selectSingleNode(dom)
if elem != None:
    h1 = elem.getText()
else:
    elemHolder = dom.createXPath('//xhtml:title')
    elemHolder.setNamespaceURIs(map)
    elem = elemHolder.selectSingleNode(dom)
    if elem != None:
        title = elem.getText()
    else:
        title = "Page does not contain a H1 or title tag"

If anyone could help it would be great. Cheers


Solution

  • How about this (I don't claim to know much about Python, by the way, but this looks like an obvious first step):

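    # find_heading is an illustrative name; the returns need an enclosing function.
    def find_heading(dom, map):
        for path in ('//xhtml:h1', '//xhtml:title'):
            elemHolder = dom.createXPath(path)
            elemHolder.namespaceURIs = map
            elem = elemHolder.selectSingleNode(dom)
            if elem is not None:
                # dom4j exposes the local name through getName(), i.e. elem.name
                return (elem.name, elem.text)

        return (None, "Page does not contain h1 or title tag")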
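
    The same loop can be made slightly more general by passing in the list of XPath expressions and the fallback message, so the caller can also see which expression matched. This is only a sketch, assuming `dom` is a dom4j `Document` and `namespaces` is the same prefix-to-URI map the question calls `map`; the name `first_text` and its parameters are made up for illustration:

```python
def first_text(dom, namespaces, paths=('//xhtml:h1', '//xhtml:title'),
               default="Page does not contain a h1 or title tag"):
    """Return (matched_path, text) for the first XPath in `paths` that matches.

    Hypothetical helper: `dom` is assumed to be a dom4j Document and
    `namespaces` the prefix -> URI map used in the question.
    """
    for path in paths:
        xpath = dom.createXPath(path)        # build the XPath for this candidate tag
        xpath.setNamespaceURIs(namespaces)   # so the 'xhtml:' prefix resolves
        node = xpath.selectSingleNode(dom)   # None when nothing matches
        if node is not None:
            return (path, node.getText())
    # No candidate matched: report that with a None path and the fallback text.
    return (None, default)
```

    It would be called as `path, text = first_text(dom, map)`; `path` is `None` when neither tag is present, which lets the caller distinguish a real heading from the fallback message.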