Entity-Relation data model in Prolog


I have a question about how to model an entity-relation data model in Prolog.

If I have two entities like these (expressed in PlantUML notation, see https://plantuml.com/ie-diagram):

entity A {
  * a_id : number
  --
  a_data : text
}

entity B {
  * b_id : number
  --
  * a_id : number
  b_data : text
}

B }o--|| A

For example:

a(123, 'data1').       % a(a_id, a_data).
b(456, 123, 'data2').  % b(b_id, a_id, b_data)

This approach forces me to generate an identifier for every new fact.

Another way:

a(123, 'data1').
b(456, a(123, 'data1'), 'data2').

With this approach, if I change the a_data value of a(123, _), the integrity of the b relation is lost.

A third way:

a(123, 'data1').
b(456, a(123, _), 'data2').

What is the correct way to represent these facts?

Regards.


Solution

  • I would certainly use

    a(123, 'data1').       % a(a_id, a_data).
    b(456, 123, 'data2').  % b(b_id, a_id, b_data)
    

    There is no shame in having to name things ("generate new identifiers"), and it is easy to do, for example by generating a random UUID. I hear that the modern practice in relational database modelling is to always add identifiers, even where they are not strictly needed, to avoid practical problems later when performing extensions and migrations. So we are in good company with the naming approach.
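
    As a minimal sketch of how the identifier-based layout is used, the two relations can be joined through the shared a_id, and references that point nowhere can be detected with negation as failure. The predicate names b_with_a/3 and dangling_b/1 and the extra fact b(457, ...) are made up for illustration; they are not part of the question.

    ```prolog
    % Facts in the recommended shape (data from the question, plus one extra b).
    a(123, 'data1').            % a(A_id, A_data)
    b(456, 123, 'data2').       % b(B_id, A_id, B_data)
    b(457, 123, 'data3').       % hypothetical second b referring to the same a

    % b_with_a(?B_id, ?B_data, ?A_data)
    % Join b/3 to a/2 through the shared key A_id.
    b_with_a(BId, BData, AData) :-
        b(BId, AId, BData),
        a(AId, AData).

    % dangling_b(?B_id)
    % Succeeds for every b/3 fact whose A_id matches no a/2 fact.
    dangling_b(BId) :-
        b(BId, AId, _),
        \+ a(AId, _).
    ```

    Because b/3 stores only the key, replacing the a_data of a(123, _) leaves every b/3 fact untouched, and the query ?- b_with_a(456, BData, AData). picks up whatever a/2 currently holds for that key.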

    a(123, 'data1').
    b(456, a(123, 'data1'), 'data2').
    

    is not recommended because it copies the data for no gain in either speed or clarity. If you drop the fact a(123, 'data1'). you have a tree structure rooted in b/3, which may be adequate if the a/2 subtree is small or varies a lot (for example, one would not store 2x2 matrices as separate facts in a relation a and then reference them by name from a relation b; one would put each matrix directly into the relation b).
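
    The matrix case could look roughly like the following sketch, where the small term is embedded directly and there is no separate a/2 relation. The m2x2/4 term (entries listed row by row), the sample data, and b_determinant/2 are assumptions made up for this illustration.

    ```prolog
    % b(B_id, Matrix, B_data): the 2x2 matrix is stored inline as a term.
    b(456, m2x2(1, 0, 0, 1), 'identity').
    b(457, m2x2(2, 0, 0, 2), 'scaling').

    % b_determinant(?B_id, -Det)
    % Unpack the embedded matrix by unification and compute its determinant.
    b_determinant(BId, Det) :-
        b(BId, m2x2(A, B, C, D), _),
        Det is A*D - B*C.
    ```

    The embedded term is reached by plain unification in the clause head or body, so no key lookup is involved.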

    a(123, 'data1').
    b(456, a(123, _), 'data2').
    

    makes little sense: you just have a mysterious unbound variable in the a/2 tree inside the b/3 tree, which occupies memory and which you can't even use.

    P.S.

    One of the peculiarities of the relational model is the assumption that entities can "change over time" while still retaining some kind of "identity" (i.e. there is that suspect UPDATE operation). The relational database extant at time t is the top layer of the database log. This idea seems rooted in the lack of sufficiently large disks back in the 70s. You may want to consider a model where you don't do destructive updates but instead add new time-stamped facts that deprecate old ones (ideally, transactionally). Datomic (JVM/Clojure only) is a realization of that idea. Prolog doesn't seem to have much out-of-the-box support for this idea.
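
    A minimal sketch of the append-only idea in plain Prolog, assuming integer timestamps that grow over time and at most one version per identifier and timestamp. The names a_version/3, record_a/3, current_a/2 and a_as_of/3 are hypothetical, and nothing here is transactional.

    ```prolog
    :- dynamic(a_version/3).

    % a_version(A_id, Timestamp, A_data): history of a; facts are only ever added.
    a_version(123, 1, 'data1').
    a_version(123, 2, 'data1-revised').
    a_version(124, 1, 'other').

    % b/3 still refers to a by identifier only.
    b(456, 123, 'data2').

    % record_a(+A_id, +Timestamp, +A_data)
    % "Update" by adding a newer version instead of retracting the old one.
    record_a(Id, T, Data) :-
        assertz(a_version(Id, T, Data)).

    % current_a(?A_id, ?A_data)
    % The version for which no later version of the same A_id exists.
    current_a(Id, Data) :-
        a_version(Id, T, Data),
        \+ ( a_version(Id, T2, _), T2 > T ).

    % a_as_of(?A_id, +Time, ?A_data)
    % The version that was current at the given time.
    a_as_of(Id, Time, Data) :-
        a_version(Id, T, Data),
        T =< Time,
        \+ ( a_version(Id, T2, _), T2 > T, T2 =< Time ).
    ```

    With these facts, current_a(123, D) yields the revised data, while a_as_of(123, 1, D) still recovers the original value, so b/3 facts that reference identifier 123 remain valid across changes.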