Tags: scheme, lisp, symbols, identifier, mit-scheme

Difference between an identifier and symbol in scheme?


I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data.

The accepted answer to the Stack Overflow question "What exactly is a symbol in lisp/scheme?" defines the "symbol" data object in Scheme:

In Scheme and Racket, a symbol is like an immutable string that happens to be interned

The accepted answer writes that in Scheme, there is a built-in correspondence between identifiers and symbols:

To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language).
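
A minimal sketch of that correspondence, assuming any standard Scheme REPL (the variable names here are illustrative and do not come from the quoted answer). It shows that two spellings of the same name yield one interned symbol, and that identifiers in quoted source text appear as symbol values:

```scheme
;; Symbols are interned: the same name always denotes the same object.
(define s1 'method-name)
(define s2 (string->symbol "method-name"))
(define same-object? (eq? s1 s2))        ; #t

;; Quoting a piece of source text yields a list whose identifiers
;; have become symbol values that a program can inspect.
(define expr '(define a (lambda (i) (+ i 1))))
(define head (car expr))                 ; the symbol define
(define bound-name (cadr expr))          ; the symbol a
(define both-symbols? (and (symbol? head) (symbol? bound-name)))  ; #t
```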

To understand the correspondence, I read the page "A Note on Identifiers" in An Introduction to Scheme and Its Implementation, which says:

Scheme identifiers (variable names and special form names and keywords) have almost the same restrictions as Scheme symbol object character sequences, and it's no coincidence. Most implementations of Scheme happen to be written in Scheme, and symbol objects are used in the interpreter or compiler to represent variable names.

Based on the above, I'm wondering if my understanding of what is happening in the following session is correct:

user@host:/home/user $ scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.

Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Image saved on Sunday February 7, 2016 at 10:35:34 AM
  Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116

1 ]=> (define a (lambda (i) (+ i 1)))

;Value: a

1 ]=> a

;Value 13: #[compound-procedure 13 a]

1 ]=> (quote a)

;Value: a

1 ]=> (eval a (the-environment))

;Value 13: #[compound-procedure 13 a]

1 ]=> (eval (quote a) (the-environment))

;Value 13: #[compound-procedure 13 a]

1 ]=>

  1. The first define statement is a special form captured by the evaluator, which creates a binding for the symbol a to a compound procedure object in the global environment.

  2. Writing a in the top-level causes the evaluator to receive the symbol object 'a, which evaluates to the compound-procedure object that 'a points to in the global environment.

  3. Writing (quote a) in the top-level causes the evaluator to receive a list of symbols ('quote 'a); this expression is a special form captured by the evaluator, which evaluates to the quoted expression, namely the symbol object 'a.

  4. Writing (eval a (the-environment)) causes the evaluator to receive a list of symbols ('eval 'a ...) (ignoring the environment). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object, and a lookup for 'a, which yields the compound-procedure. Finally, the top-level evaluator applies the eval procedure to its arguments. Since a compound-procedure is self-evaluating (not true in Scheme48), the final value of the expression is the compound-procedure itself.

  5. Writing (eval (quote a) (the-environment)) causes the evaluator to receive a list of symbols ('eval ('quote 'a) ...). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object. It evaluates the expression ('quote 'a) which yields the symbol object 'a. Finally, the top-level evaluator applies the eval procedure to 'a, which is a symbol object and therefore invokes an environment lookup that yields the compound procedure.
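
The steps above can be sketched with a toy evaluator. This is only an illustration under stated assumptions, not MIT Scheme's actual implementation: mini-eval, lookup, and toy-env are hypothetical names, environments are plain association lists, and only variable references, quote, self-evaluating objects, and procedure application are handled.

```scheme
;; Look up a symbol in an association list of (symbol . value) pairs.
(define (lookup sym env)
  (let ((binding (assq sym env)))
    (if binding
        (cdr binding)
        (error "Unbound variable:" sym))))

;; Toy evaluator: dispatches on the shape of the expression object.
(define (mini-eval expr env)
  (cond ((symbol? expr)                  ; variable reference: look it up
         (lookup expr env))
        ((not (pair? expr))              ; numbers, procedures, etc. are
         expr)                           ; treated as self-evaluating here
        ((eq? (car expr) 'quote)         ; special form: return datum as is
         (cadr expr))
        (else                            ; application: evaluate, then apply
         (let ((proc (mini-eval (car expr) env))
               (args (map (lambda (e) (mini-eval e env)) (cdr expr))))
           (apply proc args)))))

(define toy-env
  (list (cons 'a (lambda (i) (+ i 1)))
        (cons '+ +)))

(define r1 (mini-eval 'a toy-env))          ; step 2: symbol -> procedure
(define r2 (mini-eval '(quote a) toy-env))  ; step 3: quote form -> symbol a
(define r3 (mini-eval r1 toy-env))          ; step 4: procedure -> itself
(define r4 (mini-eval r2 toy-env))          ; step 5: symbol a -> procedure
(define r5 (mini-eval '(a 41) toy-env))     ; application -> 42
```

In this sketch the evaluator never sees an "identifier" at all. By the time mini-eval runs, the reader has already turned the text into symbols and lists, and the only question left is whether a symbol is looked up (a variable reference) or returned untouched (inside quote).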

Does this explanation correctly describe (at a high level) how a Scheme interpreter might differentiate between symbol objects and identifiers in the language? Are there fundamental misunderstandings in these descriptions?


Solution

  • The R6RS Scheme report, in 4.2 Lexical Syntax, uses the term identifier to refer to the character-level syntax. That is to say, roughly, identifier means something like the lexical token from which a symbol is constructed when the expression becomes an object. However, elsewhere in the text, identifier seems to be freely used as a synonym for symbol. E.g. "Scheme allows identifiers to stand for locations containing values. These identifiers are called variables." (1.3 Variables and Binding). Basically, the spec seems to be loose with regard to this terminology. Depending on context, an identifier is either the same thing as a symbol (an object), or else <identifier>: the grammar category from the lexical syntax.

    In a sentence that says, for example, that a certain character may or may not appear in an identifier, the context is clearly lexical syntax, because a symbol object is an atom and not a character string; it doesn't contain anything. But when we talk about an identifier denoting a memory location (being a variable), that's the symbol; we're past the issue of what kinds of tokens can produce the symbol in the textual source code.

    The tutorial An Introduction to Scheme and Its Implementation, linked to in the question, uses its own peculiar definition of identifier, which is at odds with the Scheme language. It implies that identifiers are "variable names, and special form names and keywords" (so that symbols which are not variable names are not identifiers, which is not supported by the specification).
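
As a small illustration of the lexical-syntax versus symbol-object distinction discussed above, here is a sketch assuming a standard Scheme (the names odd-symbol and friends are made up for the example). string->symbol can build a symbol whose name could never be written as a bare identifier token, which suggests that restrictions on characters belong to the token grammar rather than to the symbol object:

```scheme
;; "hello world" contains a space, so it cannot be typed as one bare
;; identifier token, yet string->symbol still produces a symbol from it.
(define odd-symbol (string->symbol "hello world"))
(define odd-is-symbol? (symbol? odd-symbol))     ; #t
(define odd-is-string? (string? odd-symbol))     ; #f: an atom, not a string
(define odd-name (symbol->string odd-symbol))    ; "hello world"

;; Interning still applies: the same name gives the same object.
(define odd-again (string->symbol "hello world"))
(define odd-same? (eq? odd-symbol odd-again))    ; #t
```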