up to a previous question I asked here WebResponse posting a null string
while the answer works for the question a new problem happened. When parsing the below xml
<?xml version="1.0" encoding="UTF-8"?>
<hml xmlns="http://schemas.nmdp.org/spec/hml/1.0.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.nmdp.org/spec/hml/1.0.1 http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd"
version="1.0.1" >
<!--
MIRING Element 1.1 requires the inclusion of an hmlid.
hmlid can be reported in the form of an ISO Object Identifier (OID)
"root" represents a unique publically registered organization
"extension" is a unique document id managed by the reporting organization.
-->
<hmlid root="2.34.48.32" extension="HML.3245662"/>
<!--
MIRING Element 1.2 requires the inclusion of a reporting-center.
reporting-center identifies the organization sending the HML message.
"reporting-center-id" is a unique identifier of the sender.
"reporting-center-context" reports the context/naming authority of the identifier.
-->
<reporting-center reporting-center-id="567"/>
<sample id="4555-6677-8">
<typing gene-family="HLA" date="2015-01-13">
<!--
MIRING Element 3 requires the inclusion of Genotyping information.
The Genotype should include all pertinent Loci, as well as a Genotype in a standard format.
GLStrings can be included either as plain text, or as a reference to a publicly
available service, such as GL Service (gl.nmdp.org)
-->
<allele-assignment date="2015-07-28" allele-db="IMGT/HLA" allele-version="3.17.0">
<haploid locus="HLA-A" method="DNA" type="02:20:01"/>
<glstring>
HLA-A*02:20:01
</glstring>
</allele-assignment>
<typing-method>
<!--
MIRING Element 6 requires platform documentation. This could be a peer-reviewed publication,
or an identifier of a procedure on a publicly available resource, such as NCBI GTR
-->
<sbt-ngs locus="HLA-A"
test-id="HLA-A.Test.1234"
test-id-source="AcmeGenLabs">
<raw-reads uri="rawreads/read1.fastq.gz"
availability="public"
format="fastq"
paired="1"
pooled="1"
adapter-trimmed="1"
quality-trimmed="0"/>
</sbt-ngs>
</typing-method>
<consensus-sequence date="2015-01-13">
<!--
MIRING Element 2 requires the inclusion of Reference Context.
The location and identifiers of the reference sequence should be specified.
start and end attributes are 0-based, and refer to positions on the reference sequence.
-->
<reference-database availability="public" curated="true">
<reference-sequence
name="HLA-A reference"
id="Ref111"
start="945000"
end="946000"
accession="GL000123.4"
uri="http://AcmeGenReference/RefDB/GL000123.4"/>
</reference-database>
<!--
MIRING Element 4 requires the inclusion of a consensus sequence.
The start and end positions are 0-based, and refer to positions on the reference sequence (reference-sequence-id)
Multiple consensus-sequence-block elements can be included sequentially.
-->
<consensus-sequence-block reference-sequence-id="Ref111"
start="945532"
end="945832"
strand="+"
phase-set="1"
expected-copy-number="1"
continuity="true"
description="HLA-A Consensus Sequence 4.5.67">
<!--
A sequence can be reported as plain text, or as a pointer to an external reference,
or as variants from a reference sequence.
-->
<sequence>
CCCAGTTCTCACTCCCATTGGGTGTCGGGTTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGCCCGCACGCACCCACCGGGACTCAGATTCTCCCCAGACGCCGAGGATGGCCGTCATGGCGCCCCGAACCCTCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACCCAGACCTGGGCGGGTGAGTGCGGGGTCGGGAGGGAAACCGCCTCTGCGGGGAGAAGCAAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGACGAGGGTCGGGCAGGT
</sequence>
<!--
MIRING Element 5 requires the inclusion of any relevant sequence polymorphisms.
These represent variants from the reference sequence.
start and end attributes are 0-based, and refer to positions on the reference sequence.
You can see this variant at positions 10 - 15 on the sequence. (945542 - 945532 = 10)
-->
<variant id="0"
reference-bases="GTCATG"
alternate-bases="ACTCCC"
start="945542"
end="945548"
filter="pass"
quality-score="95">
<!--
The functional effects of variants can be reported using variant-effect.
They should use Sequence Ontology (SO) variant effect terms.
-->
<variant-effect term="missense_variant"/>
</variant>
</consensus-sequence-block>
</consensus-sequence>
</typing>
</sample>
<!--
Multiple samples can be included in a single message.
Each sample should have it's own reference-database(s) even if they are identical to other samples' references.
-->
<sample id="4555-6677-9">
<typing gene-family="HLA" date="2015-01-13">
<allele-assignment date="2015-07-28" allele-db="IMGT/HLA" allele-version="3.17.0">
<haploid locus="HLA-A" method="DNA" type="02:20:01"/>
<glstring>
HLA-A*02:01:01:01
</glstring>
</allele-assignment>
<typing-method>
<sbt-ngs locus="HLA-A"
test-id="HLA-A.Test.1234"
test-id-source="AcmeGenLabs">
<raw-reads uri="rawreads/read2.fastq.gz"
availability="public"
format="fastq"
paired="1"
pooled="1"
adapter-trimmed="1"
quality-trimmed="0"/>
</sbt-ngs>
</typing-method>
<consensus-sequence date="2015-01-13">
<reference-database availability="public" curated="true">
<reference-sequence
name="HLA-A reference"
id="Ref112"
start="945000"
end="946000"
accession="GL000123.4"
uri="http://AcmeGenReference/RefDB/GL000123.4"/>
</reference-database>
<consensus-sequence-block
reference-sequence-id="Ref112"
start="945532"
end="945832"
strand="+"
phase-set="1"
expected-copy-number="1"
continuity="true"
description="HLA-A Consensus Sequence 4.5.89">
<sequence>
CCCAGTTCTCGTCATGATTGGGTGTCGGGTTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGCCCGCACGCACCCACCGGGACTCAGATTCTCCCCAGACGCCGAGGATGGCCGTCATGGCGCCCCGAACCCTCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACCCAGACCTGGGCGGGTGAGTGCGGGGTCGGGAGGGAAACCGCCTCTGCGGGGAGAAGCAAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGACGAGGGTCGGGCAGGT
</sequence>
</consensus-sequence-block>
</consensus-sequence>
</typing>
</sample>
</hml>
Which is the sample given for the validator so I know it works. However when I pass it through my restful POST code:
@POST
@Path("/Validate")
@Produces("application/xml")
public String validate(@FormParam("xml") String xml)
{
System.out.println(xml);
try {
Client client = Client.create();
WebResource webResource = client.resource("http://miring.b12x.org/validator/ValidateMiring/");
// POST method
ClientResponse response = webResource.accept("application/xml").post(ClientResponse.class,"xml="+xml);
// check response status code
if (response.getStatus() != 200) {
throw new RuntimeException("Failed : HTTP error code : " + response.getStatus());
}
// display response
String output = response.getEntity(String.class);
System.out.println("Output from Server .... ");
System.out.println(output + "\n");
return output;
} catch (Exception e) {
e.printStackTrace();
}
return "Oops";
}
Everything passes through perfectly fine except for Strand="+" which for some reason drops the + and gets the error message of The value '' of attribute 'strand' on element 'consensus-sequence-block' is not valid with respect to its...'
I tried it with all of strands enumerations +,-,-1,1 and all of them work except for +.
Using the WEB UI (miring.b12x.org) it works perfectly.
Is there something with parsing with SAX that could cause a + to be dropped or any reason a certain enumeration would be dropped?
Thank you
EDIT: Here is the output received:
Output from Server ....
<?xml version="1.0" encoding="UTF-8"?>
<miring-report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
timestamp="07/19/2016 15:07:31"
xsi:noNamespaceSchemaLocation="http://schemas.nmdp.org/spec/miringreport/1.0/miringreport.xsd">
<hml-compliant>reject</hml-compliant>
<miring-compliant>reject</miring-compliant>
<hmlid extension="HML.3245662" root="2.34.48.32"/>
<samples compliant-sample-count="4"
noncompliant-sample-count="0"
sample-count="2">
<sample hml-compliant="true" id="4555-6677-8" miring-compliant="true"/>
<sample hml-compliant="true" id="4555-6677-9" miring-compliant="true"/>
</samples>
<fatal-validation-errors>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-attribute.3:, The, value, ', ', of, attribute, 'strand', on, element, 'consensus-sequence-block', is, not, valid, with, respect, to, its, type,, 'null'.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-attribute.3:, The, value, ', ', of, attribute, 'strand', on, element, 'consensus-sequence-block', is, not, valid, with, respect, to, its, type,, 'null'.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-enumeration-valid:, Value, ', ', is, not, facet-valid, with, respect, to, enumeration, '[-1,, 1,, +,, -]'., It, must, be, a, value, from, the, enumeration.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-enumeration-valid:, Value, ', ', is, not, facet-valid, with, respect, to, enumeration, '[-1,, 1,, +,, -]'., It, must, be, a, value, from, the, enumeration.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
</fatal-validation-errors>
<validation-warnings>
<miring-result miring-rule-id="1.2.b" severity="warning">
<description>The node reporting-center is missing a reporting-center-context attribute.</description>
<solution>Please add a reporting-center-context attribute to the reporting-center node. You can use reporting-center-context to specify the naming authority of the reporting center identifier. Reporting-center-context is not explicitly required.</solution>
<xpath>/hml[1]/reporting-center[1]</xpath>
</miring-result>
</validation-warnings>
</miring-report>
You don’t set the type of your WebResource, and I don’t know what the default Content-Type of the request is, but I suspect it is application/x-www-form-urlencoded, which means +
is being treated as a space. If that is the case, changing "xml="+xml to "xml=" + URLEncoder.encode(xml, "UTF-8")
may address the problem.
The application/x-www-form-urlencoded format is the default format for HTML form submissions, as described in the HTML 4.01 specification. The the documentation for the URLEncoder class also describes this format.
In that format, a +
character represents a space, so the strand
attribute contains a single space. Except, the Attribute-Value Normalization section of the XML 1.0 specification states:
If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters …
So, that single space is then normalized into the empty string (when all leading and trailing space is removed). The empty string, strand=''
, does not conform to the XML schema you are referencing, http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd .
URLEncoder.encode escapes all “reserved” characters, including +
, as percent-escapes, and then escapes spaces as +
. The server expects this format (almost certainly because a Content-Type: application/x-www-form-urlencoded
header is present in the HTTP request), and decodes the +
and percent-escapes back to the original XML.