
Inserting special characters like "#" in enumerated attribute values in XML DTD


I have the following xml.dtd file

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT aliens (alien+,alienTesting)>    
<!ELEMENT alien (name,from,middleName?)>  
<!ELEMENT name (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>

<!--defining element attributes -->


<!ATTLIST alien aid ID #REQUIRED>
<!ATTLIST alien bioType CDATA #IMPLIED>

<!ATTLIST alien lang (Java|C|Python) "Java">

<!ELEMENT alienTesting (alienT*)>
<!ELEMENT alienT (#PCDATA)>

And here is the XML file:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE aliens SYSTEM "AleinDTD.dtd">

<aliens>

    <alien aid="a01">
        <name>Kasun </name>
        <from>Northwest</from>
    </alien>

    <alien aid="a02">
        <name>Madu</name>
        <from>south</from>

    </alien>

    <alienTesting>
        <alienT></alienT>

    </alienTesting>

</aliens>

I want the enumerated attribute to allow Java, C#, and Python. But when I change the declaration as below:

<!ATTLIST alien lang (Java|C#|Python) "Java">

I get this error:

The enumerated type list must end with ')' in the "lang" attribute declaration

How can I fix this? Thanks in advance.


Solution

  • I'm afraid this isn't possible. According to the XML specification, §3.3.1 Attribute Types, each enumerated value must be an Nmtoken, that is, one or more name characters. The specification (§2.3) describes the allowed characters as follows:

    The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.

    [4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

    [4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

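    Roughly speaking, you may use letters (from any language), digits, hyphens, dots, underscores, and colons, but not spaces, # ( ) [ ] |, or other punctuation marks. That is why the parser rejects C#: it reads the token C, then meets #, where only | or ) is allowed, hence the "must end with ')'" message.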
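
To check candidate values before putting them in the DTD, the two productions quoted above can be transcribed into a small Java check. This is only a sketch: the class name `NmtokenCheck` and its methods are hypothetical names made up for illustration, not part of any XML library, and it tests only the Nmtoken rule (one or more NameChar), nothing else about the DTD.

```java
// Hypothetical helper (not a library API): tests whether a string is a valid
// XML 1.0 Nmtoken, i.e. one or more characters matching NameChar.
class NmtokenCheck {

    // Inclusive code point ranges copied from productions [4] and [4a].
    private static final int[][] NAME_CHAR_RANGES = {
        // [4] NameStartChar
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
        {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
        {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
        {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
        // [4a] additional NameChar
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
        {0x0300, 0x036F}, {0x203F, 0x2040}
    };

    // True if the code point falls inside any NameChar range.
    static boolean isNameChar(int codePoint) {
        for (int[] range : NAME_CHAR_RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // An Nmtoken is a non-empty sequence of NameChar code points.
    static boolean isNmtoken(String value) {
        if (value.isEmpty()) {
            return false;
        }
        return value.codePoints().allMatch(NmtokenCheck::isNameChar);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"Java", "C#", "Python", "CSharp"}) {
            System.out.println(value + " -> " + isNmtoken(value));
        }
        // Prints:
        // Java -> true
        // C# -> false
        // Python -> true
        // CSharp -> true
    }
}
```

Since `CSharp` passes the check, one possible workaround (an assumption about what the application can tolerate, not something the specification prescribes) is to declare the attribute as `(Java|CSharp|Python)` and map `CSharp` back to "C#" in the code that reads the document. If the literal text "C#" must appear in the XML, the attribute would have to be declared as CDATA instead, which gives up the enumeration check.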