
Getting contents of an lxml.objectify comment


I have an XML file that I'm reading with Python's lxml.objectify library.

I can't find a way to get the contents of an XML comment:

<data>
  <!--Contents 1-->
  <some_empty_tag/>
  <!--Contents 2-->
</data>

I'm able to retrieve the comment (is there a better way? xml.comment[1] does not seem to work):

from lxml import etree, objectify

xml = objectify.parse(the_xml_file).getroot()
for c in xml.iterchildren(tag=etree.Comment):
    print(c.????)  # how do I print the contents of the comment?
    # print(c.text)  # does not work
    # print(str(c))  # also does not work

What is the correct way?


Solution

  • You can serialize each comment node back to a string with etree.tostring to get at its contents, like this:

    In [1]: from lxml import etree, objectify
    
    In [2]: tree = objectify.fromstring("""<data>
       ...:   <!--Contents 1-->
       ...:   <some_empty_tag/>
       ...:   <!--Contents 2-->
       ...: </data>""")
    
    In [3]: for node in tree.iterchildren(etree.Comment):
       ...:     print(etree.tostring(node))
       ...:
    b'<!--Contents 1-->'
    b'<!--Contents 2-->'
    

    Of course, you may want to strip the surrounding <!-- and --> markers from the result.
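
Stripping the wrapping could look something like the sketch below. It reuses the same sample document and the same `etree.tostring` approach. The helper name `comment_texts` is made up for illustration, and `encoding="unicode"` with `with_tail=False` is just one reasonable choice, so that you get a `str` instead of `bytes` and no trailing whitespace after the comment.

```python
from lxml import etree, objectify

XML = """<data>
  <!--Contents 1-->
  <some_empty_tag/>
  <!--Contents 2-->
</data>"""

PREFIX, SUFFIX = "<!--", "-->"


def comment_texts(root):
    """Return the text of each comment that is a direct child of root.

    Hypothetical helper: serializes every comment node and removes
    the <!-- ... --> markers around it.
    """
    texts = []
    for node in root.iterchildren(etree.Comment):
        # encoding="unicode" returns str rather than bytes;
        # with_tail=False leaves out any text following the comment.
        raw = etree.tostring(node, encoding="unicode", with_tail=False)
        if raw.startswith(PREFIX) and raw.endswith(SUFFIX):
            raw = raw[len(PREFIX):-len(SUFFIX)]
        texts.append(raw)
    return texts


root = objectify.fromstring(XML)
print(comment_texts(root))
```

For the sample document this should print `['Contents 1', 'Contents 2']`.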