Search code examples
typescripttypeslisps-expression

Is it possible to implement a simple typed lisp in the TypeScript type system?


I'm attempting to implement a simple typed lisp in the TypeScript type system, drawing inspiration from JSON Logic, but with types. Here is an example of what that looks like:

/**
 * Given a context object C, this type will allow you to construct
 * expressions that can reference that context and perform operations
 * on the data therein.
 *
 * Not shown below are the definitions for 
 *   JSONData: A simple typed representation of JSON non-object) in a 
 *   JSONPath: A union type of a terminal paths (i.e. 
 *             given JSON object.
 *   JSONPathValueType: The value type for a given path in a JSONObject
 */

type Expr<A extends JSONData, C extends JSONData> = A extends number
  ?
      | number
      | Ref<number, C>
      | ['+', ...Expr<number, C>[]]
      | ['first', Expr<number[], C>]
  : A extends JSONObject
  ? JSONObject | Ref<JSONObject, C>
  : A extends JSONArray<infer T>
  ?
      | T[]
      | Ref<T[], C>
      | ['map', Expr<JSONArray, C>, Expr<T, C>]
  : never;


type Ref<T, C extends JSONData> = {
  [P in JSONPath<C>]: JSONPathValueType<P, C> extends T ? `$.${P}` : never;
}[JSONPath<C>];

Given the above definition, the following are all valid typed expressions:

type Context = { foo: 1; bar: { baz: 2; bin: [5]; buz: [{ fiz: 3 }] } };
type SampleNumberExpr = Expr<number, Context>;

const two: SampleNumberExpr = ['+', 1, 1];
const three: SampleNumberExpr = ['+', '$.foo', '$.bar.baz'];
const four: SampleNumberExpr = ['first', [4, 5, 6]];
const five: SampleNumberExpr = ['+', 1, ['first', [4, 5, 6]]];
const six: SampleNumberExpr = ['+', '$.foo', ['first', '$.bar.bin']];

However I start to get into trouble when I try to implement and use the map functionality. Ideally I would be able to use the same Ref $.* notation in the context of an expression to access each item within the provided list:

const doesNotWork: SampleNumberExpr = 
  ['first', ['map', '$.bar.buz', ['+', '$.item.fiz', 1]]]

Yet every attempt I have made to implement this functionality is met with either type errors or the Type instantiation is excessively deep and possibly infinite. compiler error.

Specifically, trying to create a new context for the map functionality to be able to Ref, where the $.item key matches the type of each element of the inputted Array Expr is where I'm stuck. Obviously this is all beyond where the TS type system is really intended to go, but I have this intuition that it should be possible – I would love any help in finding out if that is, in fact, the case.

EDIT For those who want to play around with this, here is a full playground example


Solution

  • Answering this question and providing more context in the hopes that this helps people who have also been struggling with this type of issue. I will try to also address the critiques raised by @jcalz in the comments section of the original question.

    First, some context

    I'm currently working on a rules engine for automating the execution of tasks in an event-driven AWS cloud backend. The goal is for a user to be able to specify an abstract definition of what they would like to be done in response to a certain event taking place, storing that definition in a database, and then executing a concrete action when an instance of the event(s) in question take place.

    EXAMPLE: A user registers on the website and we send them a series of emails at intervals defined by the user (e.g. 3, 5, and 7 days after – stopping the automation if and when they respond).

    Given these constraints, I needed to design a rules DSL for defining these automations in the abstract. Furthermore, given the fact that the automation definition is created by the user in a UI and stored in a database, it must be encodable using pure JSON. Finally, to protect against programmer error when constructing these expressions programmatically, I would like this DSL to be strictly typed for arbitrarily complex expression structures.

    A Naive Approach

    Having had some experience working in Clojure/Scheme in the past, and given my experiences designing a number of DSLs / rules engines over the years, I generally have found that the S-expressions used in Lisps are the most flexible and easy way to implement such a language.

    The end result might have the following appearance:

    type Event = {
      timestampMillis: number;
      data: JSONObject;
    };
    
    type Context = Readonly<{
      priorEvents: Event[];
      currentEvent: Event
    }>
    
    type Expr<ReturnType extends JSONData, Context extends JSONObject> = ???
    
    const DAY_MILLIS = 86400000;
    
    const isThreeDaysAfterRegistration: Expr<boolean, Context> = 
      { ">=": 
        { "ref": "$.currentEvent.timestampMillis" }, 
        { "+": [
          { "*": [ 3, DAY_MILLIS ] },
          { "first", { "map": {
            "$collection": { "ref": "$.priorEvents" }, 
            "$fn": { "ref": "$.item.timestampMillis" }
        ]}
      } as const;
    

    For simple constructions, this approach works well-enough. Basic boolean algebra, arithmetic, and string manipulation are all straightforward and easy enough to implement:

    type Expr<ReturnType extends JSONData, Context extends JSONObject> = 
      ReturnType extends boolean ?
        | { "or": readonly Expr<boolean, Context>[] }
        | { "and": readonly Expr<boolean, Context>[] }
        // ...
      : ReturnType extends number ?
        | { "+": readonly Expr<number, Context>[] }
        | { "-": readonly [Expr<number, Context>, ...Expr<number, Context>[]] }
        // ...
      : never;
    

    However, we run into issues when we start dealing with higher-kinded types and recursion – namely with JSONData[] and { readonly [S in string]: JSONData }, i.e. JSON arrays and nested objects.

    Big Trouble in (Little) Generics

    Consider the following simple formulation: I would like to access and return the timestampMillis value from the first Event in the priorEvents array.

    We can structure a small set of expression structures to attempt to encode this:

    // Reference a value from the context
    type Ref<T, C extends JSONData> = {
      [P in JSONPath<C>]: JSONPathValueType<P, C> extends T ? `$.${P}` : never;
    }[JSONPath<C>];
    
    type Expr<ReturnType extends JSONData, Context extends JSONObject> = 
      ReturnType extends number ?
        { "get": any extends Expr<infer B, Context> ? B extends JSONObject ? {
            "obj": Expr<B, Context>;
            "path": Ref<ReturnType, B>;
          } : never : never;
        }
        // ...
      : ReturnType extends JSONData[] ? 
        { "ref": Ref<ReturnType, Context> }
      : ReturnType extends JSONObject ? 
        { "first": Expr<ReturnType[], Context> }
      : never;
    
    // Compilation Error: 
    // Type 'string' is not assignable to type 'never'.
    //
    // The expected type comes from property 'path' which is declared here 
    // on type '{ obj: { first: { ref: "$.priorEvents"; }; }; path: never; }'
    const firstTimestamp: Expr<number, Context> = {
      get: {
        obj: { first: { ref: '$.priorEvents' } },
        path: '$.timestampMillis',
      },
    };
    

    It turns out that the best the Typescript compiler can do with respect to the path variable is to tell you that it is a string. Since the type of path is a function of the inferred generic type B, and the type of B can only be narrowed down to the concrete type JSONObject, we cannot deduce anything about the concrete path strings which would return the subset of nested values of type number (in this case). Moreover, the use of infer above is entirely unnecessary since its use is merely a failed attempt to conjure a Rank-2 generic in a solely Rank-1 type system.

    A Rant About Ranks

    So that last paragraph was unhelpfully dense and technical, so let's take a break and discuss what is actually attempting to be accomplished and then work into a discussion about Rank-N Polymorphism and how we can use an understanding of its mechanics to hopefully get us most of the way to where we want to go.

    Looking at our failing expression from before, the culprit behind our compilation error is the serialization of the get expression. At a high level, this expression represents a trivially simple operation: namely, that of accessing a value on an object given its path. As an added bonus, this get function also allows us to limit ourselves to

    1. The finite set of strings which represent valid paths in a given object.
    2. The subset of the set of valid paths which contain a value matching the given ReturnType.

    The second constraint is of particular importance, since it allows us to actually type the get expression in a concrete way; e.g. if I want to say that firstTimestamp is an expression of type number

    const firstTimestamp: Expr<number, Context> = {
      get: {
        obj: { first: { ref: '$.priorEvents' } },
        path: '$.timestampMillis',
      },
    };
    

    I want to be sure that I constrain the values of path to only those paths within obj which return a value of type number.

    Unfortunately, the two constraints listed above are exactly where we get into trouble. At their core, they are making the statement that some unknown and arbitrary type B, a subtype of JSONObject, can be derived from the value of obj at the site of instantiation for a concrete Expr and further used to derive the set of valid path strings. This dependent formulation of generic typing is called Rank-2 Polymorphism (see also this helpful explainer from the Swift community).

    I will skip the highly technical explanations here, as there are others who have done it far more justice than I could ever hope to. Suffice it to say that the TypeScript type system does not fully support beyond Rank-1 polymorphism within type definitions. To illustrate what this means, consider the following practical example:

    You wish to write a function which takes two generic types and produces two arrays containing those types. More importantly, you want to allow the caller to specify the function that produces the arrays for the two inputted generic types. A first attempt in TypeScript might look something like this:

    function generateArrays<A, B, C>(
      arrayBuilder: (a: A) => A[],
      first: B,
      second: C,
    ): [B[], C[]] {
      return [arrayBuilder(first), arrayBuilder(second)];
    }
    

    However running this through the TS compiler will yield a warning that type A in your arrayBuilder function does not match types B or C. Thankfully, TypeScript (unlike many other languages), provides us with a mechanism to get around this: namely a polymorphic function type:

    function generateArrays<B, C>(
      arrayBuilder: <A>(a: A) => A[], // Conjuring a rank-2 generic type 
      first: B,
      second: C,
    ): [B[], C[]] {
      return [arrayBuilder(first), arrayBuilder(second)];
    }
    

    "What wonderful news!" you say, "now all I need to do is use this to help me define something equivalent for the get expression type." Sadly this sorcery is not available to us in with the strict realm of types. The attempt to use infer B in the previous formulation of is insufficient to derive any useful information about the shape of the inputted data at the instantiation site for a concrete Expr instance.

    Conclusion: A Dream of S-Expressions

    Thankfully all is not lost! We can use what we have just learned about the special functions to define and use generics to define a set of functions (@caTS smartly intuited this in their comment) which will help us build this data DSL while still maintaining all of the type safety we want to keep ourselves honest while building complex rule sets. With allowances made for some unsafe type-casting, this solution should accomplish all of our stated goals with a clean and readable syntax using function composition (squint and it'll look like S-expressions):

    /*
     * Working implementation of prior example
     */
    const { get, ref, first } = new Expression<Context>();
    
    const firstTimestamp: Expr<number, Context> = get(
      'timestampMillis',
      first(ref<Event[]>('$.priorEvents'))
    );
    
    
    /*
     * Encoding the DSL types and factory functions
     */
    type Expr<
      Type extends Readonly<JSONData>,
      Context extends Readonly<JSONObject> = Readonly<Record<string, never>>
    > = Type | Ref<Type, Context>;
    
    // Utility namespace for all Expr factory functions
    class Expression<C extends Readonly<JSONObject>> {
      get<
        O extends JSONObject,
        K extends JSONPath<O>,
        V extends JSONPathValueType<K, O>
      >(key: K, obj: Expr<O, C>): Expr<V, C> {
        return {
          '(get)': {
            '(obj)': obj,
            '(key)': key,
          },
        } as const as unknown as Expr<V, C>;
      }
    
      first<T extends JSONData>(list: Expr<T[], C>): Expr<T, C> {
        return {
          '(first)': list,
        } as const as unknown as Expr<T, C>;
      }
    
      ref<T extends JSONData>(ref: Ref<T, C>): Expr<T, C> {
        return ref;
      }
    }