
Blacklist words in XML


My requirement is: "Don't allow blacklisted words to appear in a specific XML tag".

I am trying to do this with xs:restriction and an XML Schema regex pattern.

I used the following question as a reference: Restrict word list in XML schema.

Example blacklisted words: byte, bing, ding

The problem: if two words start with the same letter (b), byte passes the pattern that is meant to exclude bing, and vice versa.

Is there an AND operator I can use? Is there a simpler way?

Thanks in advance!


Solution

  • Following Michael Kay's answer, I implemented the check with an XSD 1.1 assertion. (I had to change $value to @name, since the assertion here tests the name attribute.)

    Steps: 1. Used the following schema with the latest Xerces jar files that implement XSD 1.1.

    <xs:element name="random-element">
        <xs:complexType>
            <xs:attribute name="name" use="required" type="xs:string" />
            <xs:attribute name="value" use="optional" type="xs:string" />
            <xs:assert test="not(tokenize(@name, '\s+') = ('byte', 'bing', 'ding'))"/>
        </xs:complexType>
    </xs:element>
    

    2. Validated using the following code:

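     // Constants is presumably org.apache.xerces.impl.Constants from the Xerces XSD 1.1 jars.
     final SchemaFactory schemaFactory = SchemaFactory.newInstance(Constants.W3C_XML_SCHEMA11_NS_URI);
     // schemaFile points to the .xsd that contains the xs:assert above.
     final Schema schema = schemaFactory.newSchema(schemaFile);
     final Validator validator = schema.newValidator();
     // xmlFile must be a javax.xml.transform.Source, e.g. a StreamSource; throws SAXException if invalid.
     validator.validate(xmlFile);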
    

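    The constant W3C_XML_SCHEMA11_NS_URI is essential: with the standard XSD 1.0 schema language URI, the factory creates an XSD 1.0 validator, which does not support xs:assert, so loading the schema fails.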
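
To make it easier to see what the xs:assert in step 1 checks, here is a rough plain-Java sketch of the same logic. It is only an illustration, not part of the validation setup: the class name BlacklistCheck and the method isAllowed are made up for this example. It splits the attribute value on whitespace and rejects it if any whole token equals a blacklisted word, which is why words sharing a prefix no longer interfere with each other.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mimics the XPath test
// not(tokenize(@name, '\s+') = ('byte', 'bing', 'ding')).
final class BlacklistCheck {

    // Same example words as in the xs:assert.
    private static final List<String> BLACKLIST = Arrays.asList("byte", "bing", "ding");

    // Returns false as soon as one whitespace-separated token
    // is exactly equal to a blacklisted word.
    static boolean isAllowed(String name) {
        for (String token : name.split("\\s+")) {
            if (BLACKLIST.contains(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("bingo"));    // true: "bingo" is not the token "bing"
        System.out.println(isAllowed("my byte"));  // false: the token "byte" is blacklisted
        System.out.println(isAllowed("Byte"));     // true: the comparison is case-sensitive
    }
}
```

Like the assertion, this compares whole tokens and is case-sensitive, so "Byte" or "bingo" would still be accepted. If that is not wanted, the value would need to be normalized first (for example with lower-case() on the XPath side).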