How to preserve character references (&lt;) while reading content from an XML file


I used the code below to read the content of an XML file:

public static void toXSD() {
    SAXBuilder saxBuilder = new SAXBuilder();
    try {
        // Note: "\\test_files", not "\\\test_files" (the latter embeds a tab character in the path)
        Document document = saxBuilder.build(new File("D:\\Users\\schintha\\Desktop\\Work\\test_files\\SUMMARY_11.xml"));
        for (Element element : document.getRootElement().getChildren()) {
            System.out.println("Name = " + element.getName());
            System.out.println("Value = " + element.getValue());
            System.out.println("Text = " + element.getText());
        }
    } catch (JDOMException | IOException e) {
        e.printStackTrace();
    }
}

My input file is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<temp>
   <position>&lt;</position>   
</temp>

The output is:

Name = position
Value = <
Text = <

How can I retrieve &lt; as is, instead of "<"? Here it is not the start of a tag but the value of the "position" element.


Solution

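  • The parser resolves &lt; to "<" while reading the file, so the original reference is no longer available from the Element. You can re-escape the value with the escapeXml10 method of org.apache.commons.text.StringEscapeUtils (Apache Commons Text): StringEscapeUtils.escapeXml10(element.getValue()). Note that this re-encodes the parsed text rather than preserving the source exactly: a numeric reference such as &#60; also comes back as &lt;, and the characters >, &, " and ' are escaped as well.

    A full example: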

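    // Imports assume JDOM 2 (org.jdom2) and Apache Commons Text
    import java.io.File;
    import java.io.IOException;
    import org.apache.commons.text.StringEscapeUtils;
    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.JDOMException;
    import org.jdom2.input.SAXBuilder;

    public static void toXSD() {
        SAXBuilder saxBuilder = new SAXBuilder();
        try {
            // Note: "\\test_files", not "\\\test_files" (the latter embeds a tab character in the path)
            Document document = saxBuilder.build(new File("D:\\Users\\schintha\\Desktop\\Work\\test_files\\SUMMARY_11.xml"));
            for (Element element : document.getRootElement().getChildren()) {
                System.out.println("Name = " + element.getName());
                // Re-escape the parsed text so that "<" is printed as "&lt;"
                System.out.println("Value = " + StringEscapeUtils.escapeXml10(element.getValue()));
            }
        } catch (JDOMException | IOException e) {
            e.printStackTrace();
        }
    }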
    

    The same input file as in the question:

    <?xml version="1.0" encoding="UTF-8"?>
    <temp>
       <position>&lt;</position>   
    </temp>
    

    This gives the expected output (the value of the position element in escaped form):

    Name = position
    Value = &lt;
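
Why the escaping step is needed can be seen without JDOM. The following is a minimal sketch using only the JDK's DOM parser: it parses the same input, shows that the parser has already resolved &lt; to "<", and then re-escapes the text. The class EntityRoundTrip and the helpers escapeXml and parsedPositionText are hypothetical names introduced for illustration; escapeXml is a hand-written stand-in that covers only the five predefined XML entities and is not the Commons Text implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class EntityRoundTrip {

    // Hypothetical helper: re-escapes the five predefined XML entities.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses the XML string and returns the text of the first <position> element.
    // By this point the parser has already replaced entity references with characters.
    static String parsedPositionText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("position").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<temp><position>&lt;</position></temp>";
        String parsed = parsedPositionText(xml);
        System.out.println("Parsed  = " + parsed);            // the parser's view of the value
        System.out.println("Escaped = " + escapeXml(parsed)); // re-escaped for display
    }
}
```

Running main prints "Parsed  = <" followed by "Escaped = &lt;". The parsed value carries no trace of how the character was written in the file, which is why the answer re-escapes the text instead of trying to read the original reference back from the document.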