I am trying to parse an HL7 message definition from an XSD. My schema definition is split across two files: the first contains the actual message definition, and the second contains the segment definitions used within the message.
I am trying to adapt the example XML-parsing code from https://gist.github.com/helderdarocha/8791651. I don't understand why the SAX parser doesn't follow references.
Here are two examples of my XSD definitions.
The first file has the following definition:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           targetNamespace="http://www.xsd_porcessor.org/parser"
           xmlns="http://www.xsd_porcessor.org/parser"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
    <xs:include schemaLocation="segments.xsd"/>
    <xs:complexType name="ADT.01.MESSAGE">
        <xs:sequence>
            <xs:element maxOccurs="1" minOccurs="1" ref="MSH"/>
            <xs:element maxOccurs="1" minOccurs="1" ref="EVN"/>
            <xs:element maxOccurs="1" minOccurs="1" ref="PID"/>
            <xs:element maxOccurs="1" minOccurs="1" ref="PV1"/>
            <xs:element maxOccurs="1" minOccurs="1" ref="IN1"/>
            <xs:element maxOccurs="1" minOccurs="1" ref="IN2"/>
        </xs:sequence>
    </xs:complexType>
    <xs:element name="ADT.A01" type="ADT.01.MESSAGE"/>
</xs:schema>
The second file has the following header:
<?xml version="1.1" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           targetNamespace="http://www.xsd_porcessor.org/parser"
           xmlns="http://www.xsd_porcessor.org/parser"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
...and a multitude of segment definitions represented as complexTypes. Below is an example of one:
<xs:complexType name="MSH.SEGMENT">
    <xs:sequence>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.1.FieldSeparator"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.2.ServiceString"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.3.SendingApplication"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.4.SendingFacility"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.5.ReceivingApplication"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.6.ReceivingFacility"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.7.DateTimeOfMessage"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.8.Security"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.9.MessageType"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.10.MessageControlID"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.11.ProcessingID"/>
        <xs:element maxOccurs="1" minOccurs="1" ref="MSH.12.VersionID"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.13.SequenceNumber"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.14.ContinuationPointer"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.15.AcceptAcknowledgmentType"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.16.ApplicationAcknowledgmentType"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.17.CountryCode"/>
        <xs:element maxOccurs="unbounded" minOccurs="0" ref="MSH.18.CharacterSet"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.19.PrincipalLanguageOfMessage"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.20.AlternateCharacterSetHandlingScheme"/>
        <xs:element maxOccurs="unbounded" minOccurs="0" ref="MSH.21.MessageProfileIdentifier"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.22.SendingResponsibleOrganization"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.23.ReceivingResponsibleOrganization"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.24.SendingNetworkAddress"/>
        <xs:element maxOccurs="1" minOccurs="0" ref="MSH.25.ReceivingNetworkAddress"/>
    </xs:sequence>
</xs:complexType>
<xs:element name="MSH" type="MSH.SEGMENT"/>
Here is the tweaked parser itself:
package ca.parser.xml;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SAXReaderExample {
    public static final String PATH = "resources";

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader reader = sp.getXMLReader();
        reader.setContentHandler(new SchemaSaxHandler());
        reader.parse(new InputSource(new FileInputStream(new File(PATH, "messages.xsd"))));
    }
}

class SchemaSaxHandler extends DefaultHandler {
    // temporary - always null when tag closes
    private String currentSimpleTypeName;
    private String currentSimpleTypeBaseType;
    private SchemaElement currentElement;
    private SchemaComplexType currentComplexType;
    private List<SchemaElement> currentSequence;
    // cumulative - will use the data when XML finishes
    private Map<String, String> simpleTypes = new HashMap<>();
    private Map<String, SchemaComplexType> complexTypes = new HashMap<>();
    private SchemaElement rootElement;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        if (qName.equals("xs:simpleType")) {
            currentSimpleTypeName = atts.getValue("name");
        }
        if (qName.equals("xs:restriction")) {
            currentSimpleTypeBaseType = atts.getValue("base");
        }
        if (qName.equals("xs:complexType")) {
            currentComplexType = new SchemaComplexType();
            currentComplexType.setName(atts.getValue("name"));
        }
        if (qName.equals("xs:sequence")) {
            currentSequence = new ArrayList<>();
        }
        if (qName.equals("xs:element")) {
            currentElement = new SchemaElement();
            // An element that only carries ref="..." has no name, so use the ref as its name
            if (atts.getValue("name") == null) {
                currentElement.setName(atts.getValue("ref"));
            } else {
                currentElement.setName(atts.getValue("name"));
            }
            currentElement.setType(atts.getValue("type"));
            currentElement.setReference(atts.getValue("ref"));
            if (currentSequence != null) {
                currentSequence.add(currentElement);
            } else {
                rootElement = currentElement;
            }
        }
        if (qName.equals("xs:attribute")) {
            currentComplexType.addAttribute(atts.getValue("name"), atts.getValue("type"));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("xs:simpleType")) {
            simpleTypes.put(currentSimpleTypeName, currentSimpleTypeBaseType);
            currentSimpleTypeName = null;
            currentSimpleTypeBaseType = null;
        }
        if (qName.equals("xs:complexType")) {
            complexTypes.put(currentComplexType.getName(), currentComplexType);
            currentComplexType = null;
        }
        if (qName.equals("xs:sequence")) {
            if (currentComplexType != null) {
                currentComplexType.setChildren(currentSequence);
            }
            currentSequence = null;
        }
    }

    @Override
    public void endDocument() throws SAXException {
        makeTree(rootElement);
        printTree(rootElement, "");
    }

    public void makeTree(SchemaElement element) {
        SchemaComplexType type = complexTypes.get(element.getType());
        if (type != null) {
            List<SchemaElement> children = type.getChildren();
            element.setChildren(children);
            for (SchemaElement child : children) {
                makeTree(child);
            }
            element.setAttributes(type.getAttributes());
        } else {
            element.setType(simpleTypes.get(element.getType()));
        }
    }

    private void printTree(SchemaElement element, String indent) {
        System.out.println(indent + element.getName() + " : " + element.getType());
        Map<String, String> attributes = element.getAttributes();
        if (attributes != null) {
            for (Map.Entry<String, String> entry : attributes.entrySet()) {
                System.out.println(" @" + entry.getKey() + " : " + simpleTypes.get(entry.getValue()));
            }
        }
        List<SchemaElement> children = element.getChildren();
        if (children != null) {
            for (SchemaElement child : children) {
                printTree(child, indent + " ");
            }
        }
    }
}

class SchemaElement {
    private String name;
    private String type;
    private String reference;
    private List<SchemaElement> children;
    private Map<String, String> attributes;

    public String getReference() {
        return reference;
    }
    public void setReference(String reference) {
        this.reference = reference;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getType() {
        return type;
    }
    public void setType(String type) {
        this.type = type;
    }
    public List<SchemaElement> getChildren() {
        return children;
    }
    public void setChildren(List<SchemaElement> children) {
        this.children = children;
    }
    public Map<String, String> getAttributes() {
        return attributes;
    }
    public void setAttributes(Map<String, String> attributes) {
        this.attributes = attributes;
    }
}

class SchemaComplexType {
    private String name;
    private String reference;
    private List<SchemaElement> children;
    private Map<String, String> attributes = new HashMap<>();

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public List<SchemaElement> getChildren() {
        return children;
    }
    public void setChildren(List<SchemaElement> children) {
        this.children = children;
    }
    public Map<String, String> getAttributes() {
        return attributes;
    }
    public void setAttributes(Map<String, String> attributes) {
        this.attributes = attributes;
    }
    public String getReference() {
        return reference;
    }
    public void setReference(String reference) {
        this.reference = reference;
    }
    public void addAttribute(String name, String type) {
        attributes.put(name, type);
    }
}
Any ideas what is going on? Your help is appreciated.
Thank you.
It sounds like there are two separate concepts at work here.
If a validating SAX parser is being used to parse a piece of XML, and validate it against its schema:
<xmlRootElement
xmlns="http://www.xsd_porcessor.org/parser"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.xsd_porcessor.org/parser messages.xsd">
... etc., then when that schema is resolved behind the scenes, the parser clearly needs to follow any references, includes, and imports in it.
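To see that behaviour in isolation, here is a minimal sketch. It uses the JDK's javax.xml.validation API rather than an xsi:schemaLocation hint, and the urn:example:parser namespace, the cut-down schemas, and the class name are all made up for illustration. The only point is that loading messages.xsd pulls in segments.xsd through xs:include, because MSH is declared only in the included file:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class IncludeFollowingSketch {
    // Hypothetical, cut-down stand-ins for messages.xsd and segments.xsd
    static final String HEADER = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:parser' xmlns='urn:example:parser'"
            + " elementFormDefault='qualified'>";
    static final String MESSAGES = HEADER
            + "<xs:include schemaLocation='segments.xsd'/>"
            + "<xs:complexType name='ADT.01.MESSAGE'><xs:sequence>"
            + "<xs:element ref='MSH'/>"
            + "</xs:sequence></xs:complexType>"
            + "<xs:element name='ADT.A01' type='ADT.01.MESSAGE'/>"
            + "</xs:schema>";
    static final String SEGMENTS = HEADER
            + "<xs:element name='MSH' type='xs:string'/>"
            + "</xs:schema>";

    /** Writes both schemas to a temp directory and compiles the main one. */
    static Schema loadSchema() {
        try {
            Path dir = Files.createTempDirectory("xsd-sketch");
            Files.write(dir.resolve("segments.xsd"), SEGMENTS.getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("messages.xsd");
            Files.write(main, MESSAGES.getBytes(StandardCharsets.UTF_8));
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The include is resolved relative to messages.xsd while compiling
            return sf.newSchema(main.toFile());
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("schema could not be loaded", e);
        }
    }

    /** Returns true if the instance document validates against the schema. */
    static boolean isValid(Schema schema, String instanceXml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error reported by the validator
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = loadSchema();
        // Contains the MSH child required by the sequence
        System.out.println(isValid(schema,
                "<ADT.A01 xmlns='urn:example:parser'><MSH>|</MSH></ADT.A01>"));
        // Missing the required MSH child
        System.out.println(isValid(schema, "<ADT.A01 xmlns='urn:example:parser'/>"));
    }
}
```

If the include were not followed, compiling messages.xsd would fail on the unresolved ref to MSH, so a successful loadSchema() already shows the schema processor did the resolution itself.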
However, if the .xsd itself is the XML being parsed, then as you've already found, its elements will be passed directly into the ContentHandler. To the SAX parser, xs:include and ref are just ordinary elements and attributes, so segments.xsd is never read unless the handler opens it itself. The SchemaSaxHandler above will need to do some more work to record each global xs:element (like you're already doing for the simpleTypes and complexTypes Maps) so they can later be resolved from a ref.
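As a rough sketch of that extra work (not a drop-in replacement for the handler above): the class below records global elements by name, records the refs inside each named complexType, notes every xs:include, and resolves the refs once all files have been read. The class and method names are hypothetical, the two schemas are cut-down in-memory stand-ins for messages.xsd and segments.xsd, and it assumes ref values carry no namespace prefix, as in your schema. There is also no guard against recursive type definitions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RefResolvingSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";
    // Hypothetical, cut-down stand-ins for the two schema files
    static final String MESSAGES = "<xs:schema xmlns:xs='" + XS + "'>"
            + "<xs:include schemaLocation='segments.xsd'/>"
            + "<xs:complexType name='ADT.01.MESSAGE'><xs:sequence>"
            + "<xs:element ref='MSH'/></xs:sequence></xs:complexType>"
            + "<xs:element name='ADT.A01' type='ADT.01.MESSAGE'/></xs:schema>";
    static final String SEGMENTS = "<xs:schema xmlns:xs='" + XS + "'>"
            + "<xs:complexType name='MSH.SEGMENT'><xs:sequence>"
            + "<xs:element ref='MSH.1.FieldSeparator'/></xs:sequence></xs:complexType>"
            + "<xs:element name='MSH' type='MSH.SEGMENT'/></xs:schema>";

    /** Collects global elements, refs per named complexType, and include locations. */
    static class Collector extends DefaultHandler {
        final Map<String, String> globalElements = new LinkedHashMap<>();   // name -> type
        final Map<String, List<String>> refsByType = new LinkedHashMap<>(); // type -> refs
        final List<String> includes = new ArrayList<>();
        private String currentType;
        private int depth; // 1 = xs:schema, 2 = top-level declarations

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            depth++;
            if (!XS.equals(uri)) return;
            if (local.equals("include")) {
                includes.add(atts.getValue("schemaLocation"));
            } else if (local.equals("complexType") && depth == 2) {
                currentType = atts.getValue("name");
                refsByType.put(currentType, new ArrayList<>());
            } else if (local.equals("element") && depth == 2) {
                globalElements.put(atts.getValue("name"), atts.getValue("type"));
            } else if (local.equals("element") && currentType != null
                    && atts.getValue("ref") != null) {
                refsByType.get(currentType).add(atts.getValue("ref"));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (depth == 2 && XS.equals(uri) && local.equals("complexType")) currentType = null;
            depth--;
        }
    }

    /** Parses the entry file plus everything it includes into one Collector. */
    static Collector load(Map<String, String> files, String entry) {
        Collector c = new Collector();
        List<String> queue = new ArrayList<>(Collections.singletonList(entry));
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // match on namespace + local name, not the xs: prefix
            for (int i = 0; i < queue.size(); i++) {
                String xsd = files.get(queue.get(i));
                spf.newSAXParser().parse(new InputSource(new StringReader(xsd)), c);
                for (String inc : c.includes) {
                    if (!queue.contains(inc)) queue.add(inc);
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return c;
    }

    /** Appends the element and, recursively, the elements its type refers to. */
    static void render(Collector c, String element, String indent, StringBuilder out) {
        String type = c.globalElements.get(element);
        out.append(indent).append(element).append(" : ")
           .append(type == null ? "(unresolved)" : type).append('\n');
        for (String ref : c.refsByType.getOrDefault(type, Collections.emptyList())) {
            render(c, ref, indent + "  ", out);
        }
    }

    static String describe(Map<String, String> files, String entry, String root) {
        StringBuilder out = new StringBuilder();
        render(load(files, entry), root, "", out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("messages.xsd", MESSAGES);
        files.put("segments.xsd", SEGMENTS);
        System.out.print(describe(files, "messages.xsd", "ADT.A01"));
    }
}
```

With real files, the Map lookup would become something like opening new File(PATH, schemaLocation). Resolution is deferred until every file has been read because a ref can point at an element declared later, or in another file.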
If what you need is a model of the resolved elements and types in the XML schema, though, it would be worth exploring the schema model that an XML parser such as Xerces builds behind the scenes. As a starting point, this uses XNI, the Xerces Native Interface:
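// Imports for the snippet below (Xerces-J 2.x package names)
import java.io.File;
import java.io.IOException;
import org.apache.xerces.impl.xs.XMLSchemaLoader;
import org.apache.xerces.xni.XMLResourceIdentifier;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLEntityResolver;
import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xs.XSConstants;
import org.apache.xerces.xs.XSModel;

File baseDir = new File("/myschemas");
// Resolve included schemas (e.g. segments.xsd) relative to baseDir
XMLEntityResolver entityResolver = new XMLEntityResolver() {
    @Override
    public XMLInputSource resolveEntity(
            XMLResourceIdentifier resourceIdentifier)
            throws XNIException, IOException {
        // E.g. resourceIdentifier.getLiteralSystemId() will be segments.xsd
        String uri = new File(baseDir,
                resourceIdentifier.getLiteralSystemId()).toURI()
                .toString();
        return new XMLInputSource(null, uri, null);
    }
};
XMLSchemaLoader loader = new XMLSchemaLoader();
loader.setEntityResolver(entityResolver);
// Load the top-level schema; the loader follows xs:include via the resolver
XSModel model = loader
        .loadURI(new File(baseDir, "messages.xsd").toURI()
                .toString());
System.out.println(model.getComponents(XSConstants.ELEMENT_DECLARATION));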
This produces output such as:
{http://www.xsd_porcessor.org/parser}ADT.A01="http://www.xsd_porcessor.org/parser":ADT.A01