xml xsd tei

How to apply a regex to an element whose complexType is mixed


I have generated a TEI XSD that I need to modify. It has a "w" element, and I need to apply a regex to its text content; let's say I want the text to match [0-9].

Here is my XSD element:

  <xs:element name="w">
    <xs:annotation>
      <xs:documentation>(word) represents a grammatical (not necessarily orthographic) word. [17.1. Linguistic Segment Categories 17.4.2. Lightweight Linguistic Annotation]</xs:documentation>
    </xs:annotation>
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="tei:w"/>
        <xs:element ref="tei:pc"/>
      </xs:choice>
      <xs:attributeGroup ref="tei:att.global.attributes"/>
      <xs:attributeGroup ref="tei:att.segLike.attributes"/>
      <xs:attributeGroup ref="tei:att.typed.attributes"/>
      <xs:attributeGroup ref="tei:att.linguistic.attributes"/>
      <xs:attributeGroup ref="tei:att.notated.attributes"/>
    </xs:complexType>
  </xs:element>

In the example below, the first element should be valid and the second should not.

<w lemma="ttt" type="PRP">5</w>
<w lemma = "pied" type="NOM">pieds</w>

Things I have tried that didn't work:

<xs:assert test="matches($value,'[0-9]')"/>
<xs:assert test="matches(w/text(),'[0-9]')"/>
<xs:assert test="matches($w,'[0-9]')"/>

Thanks for helping.


Solution

  • Adding an assertion such as <xs:assert test="matches(., '^[0-9]$')"/> to the complex type, e.g.

    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="tei:w"/>
        <xs:element ref="tei:pc"/>
      </xs:choice>
      <xs:attributeGroup ref="tei:att.global.attributes"/>
      <xs:attributeGroup ref="tei:att.segLike.attributes"/>
      <xs:attributeGroup ref="tei:att.typed.attributes"/>
      <xs:attributeGroup ref="tei:att.linguistic.attributes"/>
      <xs:attributeGroup ref="tei:att.notated.attributes"/>
      <xs:assert test="matches(., '^[0-9]$')"/>
    </xs:complexType>
    

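    should suffice to check that the element contains only a single digit. Note that xs:assert is an XSD 1.1 feature, so the schema has to be validated with an XSD 1.1 processor. Inside the test, . is the w element being validated, and ^ and $ anchor the pattern to its whole string value (matches() is not implicitly anchored).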
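
    For experimenting outside the full TEI schema, the assertion can be tried in a small standalone schema. The following is only a minimal sketch under some assumptions: it uses no target namespace, it declares lemma and type directly instead of the TEI attribute groups, and it drops the tei:pc child. None of that comes from the generated TEI XSD; only the xs:assert line is taken from the answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch, not the generated TEI schema.
     vc:minVersion="1.1" signals that the schema relies on XSD 1.1 features. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           vc:minVersion="1.1">

  <xs:element name="w">
    <xs:complexType mixed="true">
      <!-- Simplified content model: only nested w elements are allowed here -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="w"/>
      </xs:choice>
      <!-- Assumed stand-ins for the TEI attribute groups -->
      <xs:attribute name="lemma" type="xs:string"/>
      <xs:attribute name="type" type="xs:string"/>
      <!-- "." is the w element itself; its string value must be exactly one digit -->
      <xs:assert test="matches(., '^[0-9]$')"/>
    </xs:complexType>
  </xs:element>

</xs:schema>

<!-- Instances from the question, without a namespace to match this sketch:
     <w lemma="ttt" type="PRP">5</w>          string value "5" satisfies the assertion
     <w lemma="pied" type="NOM">pieds</w>     string value "pieds" does not -->
```

    If several digits should be allowed, the pattern could be changed to '^[0-9]+$'. Keep in mind that the string value of w also includes the text of any nested elements, so the assertion tests all descendant text taken together.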