pmml

What is the official specification for PMML substring handling of long strings?


Given a substring definition of:

<Apply function="substring">
  <FieldRef field="Input"/>
  <Constant>1</Constant>
  <Constant>2</Constant>
</Apply>

What does the official specification say will happen if the string "helloworld" is the input?

Is it not allowed, or should something else occur?


Solution

  • Please refer to the specification of the PMML built-in function "substring", which is based on the XQuery built-in function "substring". In Java, your expression translates to input.substring((1 - 1), (1 - 1) + 2), which for "helloworld" returns "he".

    The important thing to notice is that in PMML and XQuery string positions start at 1 (not 0). Also, the function does not raise anything like Java's StringIndexOutOfBoundsException: a length that runs past the end of the string simply yields the characters up to the end. So if you want the remainder of a string, you can pass an arbitrarily large number as the length argument.
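
    To make the 1-based, non-throwing behaviour concrete, here is a minimal Java sketch. The class PmmlSubstringSketch and its method are hypothetical names, not part of any PMML library. It assumes integer arguments and follows the XQuery selection rule (keep the characters at 1-based positions p with startPos <= p < startPos + length); an actual PMML engine may instead reject non-positive arguments.

```java
// Hypothetical helper for illustration only; not taken from any PMML engine.
final class PmmlSubstringSketch {

    private PmmlSubstringSketch() {}

    /**
     * Returns the characters of input at 1-based positions p where
     * startPos <= p < startPos + length. Positions outside the string are
     * ignored, so this method never throws StringIndexOutOfBoundsException.
     * Positions count Java chars (UTF-16 code units), which is an assumption.
     */
    static String substring(String input, int startPos, int length) {
        // long arithmetic avoids overflow when length is very large
        long first = Math.max((long) startPos, 1L);                         // inclusive, 1-based
        long end = Math.min((long) startPos + length, input.length() + 1L); // exclusive, 1-based
        if (first >= end) {
            return ""; // the requested range does not overlap the string
        }
        // Shift from 1-based positions to Java's 0-based indices
        return input.substring((int) (first - 1), (int) (end - 1));
    }
}
```

    With this sketch, substring("helloworld", 1, 2) gives "he", and substring("helloworld", 6, Integer.MAX_VALUE) gives the remainder "world".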