I am currently trying to extract the tag element < dc:title >
from an epub in Java. However, i tried using
doc.getDocumentElement().getElementsByTagName("dc:title"));
and it only showed 2nd element :com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl
. I would like to know how can I extract < dc:tittle >
?
Here is my code:
File fXmlFile = new File("file directory");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("1st element :" + doc.getElementsByTagName("dc");
System.out.println("2nd element :" + doc.getDocumentElement().getElementsByTagName("dc:title"));
System output:
1st element : com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl@4f53e9be
2nd element :com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl@e16e1a2
Added Sample Data
<dc:title>
<![CDATA[someData]]>
</dc:title>
<dc:creator>
<![CDATA[someData]>
</dc:creator>
<dc:language>someData</dc:language>
The method getElementsByTagName(String)
is return a List of matching elements (note plural 's'). You then need to specify which element (such as by using .item(index)
to access a Node instance) you want to use. Therewith, you can using getNodeValue()
on that Node
object.
EDITED: because of the CDATA element, rather use Node.getTextContent()
:
NodeList elems = doc.getElementsByTagName("dc:title");
Node item = elems.item(0);
System.out.println(item.getTextContent());