Search code examples
operatorsinterpreterpostscript

PostScript mark token


In PostScript if you have

[4 5 6]

you have the following tokens:

mark integer integer integer mark

The stack goes like this:

| mark |
| mark | integer |
| mark | integer | integer |
| mark | integer | integer | integer |
| array |

Now my question: Is the ]-mark operator a literal object or an executable object?

Am I correct that the [-mark is a literal object (just data) and that the ]-mark is an executable object (because you always need to create an array when you see this ]-mark operator) ?

PostScript Language Reference Manual section 3.3.2 gives me:

The [ and ] operators, when executed, produce a literal array object with the en-closed objects as elements. Likewise, << and >> (LanguageLevel 2) produce a literal dictionary object.

That is not clear for me if both [ ] operators are executable or only the ] operator.


Solution

  • Summary.

    All of these special tokens, [, ], <<, >>, come out of the scanner as executable names. [ and << are defined to yield a marktype object (so they are not operators per se, but they are executable names defined in systemdict where all the operators live). ] and >> are defined as procedures or operators which are executed just like any other procedure or operator. These use the counttomark operator to find the opening bracket. But all of these tokens are treated specially by the scanner, which recognizes them without surrounding whitespace since they are part of its delimiter set.

    Details.

    It all depends on when you look at it. Let's trace through what the interpreter does with these tokens. I'm going to illustrate this with a string, but it works just the same with a file.

    So if you have an input string

    ([4 5 6]) cvx exec
    

    cvx makes a literal object executable. The program stream is a file object also labeled executable. exec pushes an object on the Execution Stack, where it is encountered by the interpreter on the next iteration of the inner interpreter processing loop. When executing the program stream, the executable file object is topmost on the Execution Stack.

    The interpreter uses token to call the scanner. The scanner skips initial whitespace, then reads all non-whitespace characters up to the next delimiter, then attempts to interpret the string as a number, and failing that it becomes an executable name. The brackets are part of the set of delimiters, and so are termed 'self-delimiting'. So the scanner reads the one bracket character, stops reading because it's a delimiter, discovers it cannot be a number, so it yields an executable name.

    Top of Exec Stack | Operand Stack
    (4 5 6]) [        | 
    

    Next, the interpreter loop executes anything executable (unless it's an array). Executing a name means loading it from the dictionary, and then executing the definition if it's executable. [ is defined as a -mark- object, same as the name mark is defined. It's not technically an operator or a procedure, it's just a definition. Automatic loading happens because the name comes out of the scanner with the executable flag set.

    (4 5 6])  | -mark-
    

    The scanner then yields 4, 5, and 6 which are numbers and get pushed straight to the operand stack. 6 is delimited by the ] which is pushed back on the stream.

    (])  | -mark- 4 5 6
    

    The interpreter doesn't execute the numbers since they are not executable, but it would be just the same if it did. The action for executing a number is simply to push it on the stack.

    Then, finally the scanner encounters the right bracket ]. And that's where the magic happens. Self-delimited, it doesn't need to be followed by any whitespace. The scanner yields the executable name ] and the interpreter executes it by loading and it finds ...

    { counttomark array astore exch pop }
    

    Or maybe an actual operator that does this. But, yeah. counttomark yields the number of elements. array creates an array of that size. astore fills an array with elements from the stack. And exch pop to discard that pesky mark once and for all.

    For dictionaries, << is exactly the same as [. It drops a mark. Then you line up some key-value pairs, and >> is procedure that does something to effect of ...

    { counttomark dup dict begin 2 idiv { def } repeat pop currentdict end }
    

    Make a dictionary. Define all the pairs. Pop the mark. Yield the dictionary. This version of the procedure tries to create a fast dictionary by making it double-sized. Move the 2 idiv to before dup to make a small dictionary.

    So, to get philosophical, counttomark is the operator you're using. And it requires a special object-type that isn't used for anything else, the marktype object, -mark-. The rest is just syntactical sugar to let you access this stack-counting ability to create linear data-structures.

    Appendix

    Here's a procedure that models the interpreter loop reading from currentfile.

    {currentfile token not {exit} if dup type /arraytype ne {exec} if }loop
    

    exec is responsible for loading (and further executing) any executable names. You can see from this that token really is the name of the scanner; and that procedures (arrays) directly encountered by the interpreter loop are not executed (type /arraytype ne {exec} if).

    Using token on strings lets you do really cool stuff, however. For example, you can dynamically construct procedure bodies with substituted names. This is very much like a lisp macro.

    /makeadder { % n  .  { n add }
        1 dict begin
        /n exch def
        ({//n add}) token % () {n add} true
        pop exch pop % {n add}
        end
    } def
    

    token reads the entire procedure from the string, substituting the immediately-evaluated name //n with its currently defined value. Notice here that the scanner reads an executable array all at once, effectively executing [ ... ] cvx internally before returning (In certain interpreters, like my own xpost, this allows you to bypass the stack-size limits to build an array, because the array is built in separate memory. But Level 2 garbage collection makes this largely irrelevant).

    There is also the bind operator which modifies a procedure by replacing operator names with the operator objects themselves. These tricks help you to factor-out name lookups in speed-critical procedures (like inner loops).