OracleWebRowSet has a writeXml(FileWriter) method to convert a result set to an XML file.
When used, it fails to escape special characters such as the ampersand, so the generated XML file does not conform to the XML 1.0 standard.
The default WebRowSet from rt.jar works just fine, but I have specific reasons for using OracleWebRowSet.
I tried StringEscapeUtils.ESCAPE_XML10.translate(), but it cannot be registered as a rule that is applied while the XML is written; it only translates a given string immediately.
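To illustrate why an immediate, whole-string translation is not enough, here is a minimal sketch. It uses a hand-written `escapeAll` helper as a hypothetical stand-in for a whole-string XML escaper (it is not the Commons Lang implementation), and the `columnValue` element is only an illustrative example of row-set markup.

```java
class EscapeAllDemo {
    // Hypothetical stand-in for a whole-string XML escaper (assumption, not Commons Lang code).
    static String escapeAll(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escaping only the cell value keeps the surrounding markup intact.
        System.out.println("<columnValue>" + escapeAll("AT&T") + "</columnValue>");
        // Escaping an already assembled element escapes the tags as well.
        System.out.println(escapeAll("<columnValue>AT&T</columnValue>"));
    }
}
```

The first call yields a well-formed element, while the second turns the tags themselves into text. Escaping therefore has to happen per value, which is not possible from outside once writeXml() has already assembled the markup.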
My current code:
OracleWebRowSet owrs = new OracleWebRowSet();
FileWriter fWriter = new FileWriter("file1.xml");
owrs.setEscapeProcessing(true);
//this is where resultset is converted to XML but not escaped properly
owrs.writeXml(fWriter);
fWriter.flush();
I'm in a bind. I could read the generated XML as a text file, escape the contents, and write it back to the file, but that doesn't sound efficient when processing 700 XML files at a stretch.
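For reference, the read-and-fix idea could be sketched roughly as below. This is only an assumption about what such a post-processing step might look like: the class name `BareAmpersandFixer` and the method `fix` are hypothetical, and the sketch handles only stray ampersands, not `<` or `>` inside data.

```java
import java.util.regex.Pattern;

class BareAmpersandFixer {
    // Matches '&' that does not start a predefined entity or a numeric character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare ampersand with "&amp;" and leaves existing references untouched.
    static String fix(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(fix("<columnValue>AT&T</columnValue>"));
        System.out.println(fix("a &amp; b &#38; c & d"));
    }
}
```

Applying this to each file means reading it, calling `fix`, and writing it back, so every file is processed twice. It also cannot tell whether a value that literally contains text such as `&amp;` was meant as data, so it is a heuristic rather than a real fix.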
Any solutions?
I found a workaround for this, but I'm not sure it's the right way.
Here it goes:
UPDATED:
Extend java.io.FileWriter and override the write(String) method:
package customizations.java.io;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringEscapeUtils;

public class XMLFileWriter extends java.io.FileWriter {
    private Pattern html_prefix_pattern;
    private Pattern html_suffix_pattern;
    private Pattern common_tags_pattern1;
    private Pattern common_tags_pattern2;
    private Pattern common_tags_pattern3;

    public XMLFileWriter(String fileName) throws IOException {
        super(fileName);
        html_prefix_pattern = Pattern.compile("(?i)(.*)<[\\s]*html(.*)>(.*)", Pattern.DOTALL);
        html_suffix_pattern = Pattern.compile("(?i)(.*)<[\\s]*/html[\\s]*>(.*)", Pattern.DOTALL);
        common_tags_pattern1 = Pattern.compile("(.+)<[^/?](\"[^\"]*\"|'[^']*'|[^'\">])*[^?]>(.+)", Pattern.DOTALL);
        common_tags_pattern2 = Pattern.compile("^<[^/?](\"[^\"]*\"|'[^']*'|[^'\">])*[^?]>(.+)", Pattern.DOTALL);
        common_tags_pattern3 = Pattern.compile("(.+)<[^/?](\"[^\"]*\"|'[^']*'|[^'\">])*[^?]>$", Pattern.DOTALL);
    }

    @Override
    public void write(String str) throws IOException {
        Matcher html_prefixMatcher = html_prefix_pattern.matcher(str);
        Matcher html_suffixMatcher = html_suffix_pattern.matcher(str);
        boolean cdata_proc = false;
        //if(str.matches("(?i)(.*)[\\s]*<[\\s]*/html[\\s]*>[\\s]*(.*)")) {
        // For CLOB data in an Oracle table, HTML tags in the content would violate
        // the WebRowSet XML schema structure, so enclose them in CDATA.
        if (html_prefixMatcher.find()) {
            str = "<![CDATA[" + str;
            cdata_proc = true;
        }
        if (html_suffixMatcher.find()) {
            str = str + "]]>";
            cdata_proc = true;
        }
        if (!cdata_proc) {
            Matcher common_tagsMatcher1 = common_tags_pattern1.matcher(str);
            Matcher common_tagsMatcher2 = common_tags_pattern2.matcher(str);
            Matcher common_tagsMatcher3 = common_tags_pattern3.matcher(str);
            // Escape only strings that contain an ampersand or tag-like text mixed with other content.
            if (str.matches("(.*)&(.*)") || common_tagsMatcher1.find() || common_tagsMatcher2.find() || common_tagsMatcher3.find()) {
                str = StringEscapeUtils.ESCAPE_XML10.translate(str);
            }
        }
        super.write(str);
    }
}
So whenever OracleWebRowSet calls the write() method, our code kicks in and checks whether the text needs to be escaped. We need to limit where StringEscapeUtils is applied, or else the XML tags themselves will also be escaped, resulting in a malformed XML file.
The modified code would look like:
OracleWebRowSet owrs = new OracleWebRowSet();
XMLFileWriter fWriter = new XMLFileWriter("file1.xml");
owrs.setEscapeProcessing(true);
// the result set is converted to XML here; XMLFileWriter escapes the text as it is written
owrs.writeXml(fWriter);
fWriter.flush();
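One caveat about the override kicking in "whenever write() is called": overriding write(String) only intercepts calls to that exact overload. The sketch below demonstrates this with a hypothetical `TaggingWriter` built on `StringWriter` (not on the real XMLFileWriter), so it runs without a file or an Oracle driver. Which overloads OracleWebRowSet actually calls is an assumption that would need to be checked.

```java
import java.io.StringWriter;

class InterceptDemo {
    // Hypothetical writer that wraps every string passing through write(String) in brackets.
    static class TaggingWriter extends StringWriter {
        @Override
        public void write(String str) {
            super.write("[" + str + "]");
        }
    }

    // Writes the same kind of content through three overloads and returns the result.
    static String demo() {
        TaggingWriter w = new TaggingWriter();
        w.write("a&b");                      // goes through the overridden write(String)
        w.write("c&d".toCharArray(), 0, 3);  // write(char[], int, int) bypasses the override
        w.write("e&f", 0, 3);                // write(String, int, int) bypasses it as well
        return w.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a&b]c&de&f
    }
}
```

If the row set ever writes through one of the other overloads, or through a wrapping BufferedWriter or PrintWriter, that text would reach the file unescaped, so overriding the remaining write methods as well may be worth considering.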
Hope this helps anyone who stumbles across this issue. If this code can be improved, please post your suggestions.