Search code examples
data-structuresprogramming-languagesdeclarative

Generic Data Structure Description Language


I am wondering whether there exists any declarative language for arbitrarily describing the format and semantics of a data structure, that can be compiled to a specific implementation of that structure in any of a set of target languages. That is, something like a generic data definition language but geared toward describing arbitrary data structures such as vectors, lists, trees, etc., and the semantics of operations on those structures. I ask because I had an idea for a feasible implementation of this concept, and I'm just wondering whether it's worth it, and, consequently, whether it's been done before.

Another, slightly more abstract question: is there any real difference between the normative specification of a data structure (what it does) and its implementation (how it does it)? More specifically, should separate implementations of the same requirements be considered different structures?


Solution

  • Cozy is “a tool that synthesizes data structure implementations from very high-level specifications” and seems to be essentially the language I was actually looking for (or considering writing) when I asked this question.

    It can automatically generate an implementation (in Java or C++, at the time of this writing) from a data structure specification written in its proprietary language. A specification describes the abstract state, update operations, and query operations of a data structure, as well as invariants that must be maintained and assumptions that can be used by the solver to optimise the implementation. For example, here is a partial specification for a graph data structure:

    Graph:
        handletype Node = { id : Int }
        handletype Edge = { src : Int, dst : Int }
    
        state nodes : Bag<Node>
        state edges : Bag<Edge>
    
        // Invariant: disallow self-edges.
        invariant (sum [ 1 | e <- edges, e.val.src == e.val.dst ]) == 0;
    
        op addNode(n : Node)
            nodes.add(n);
    
        op addEdge(e : Edge)
            assume e.val.src != e.val.dst;
            edges.add(e);
    
        query out_degree(nodeId : Int)
            sum [ 1 | e <- edges, e.val.src == nodeId ]
    
        // …
    

    Its implementation is described in Fast Synthesis of Fast Collections and Generalized Data Structure Synthesis by Calvin Loncaric, Emina Torlak, and Michael D. Ernst.