sql database database-design foreign-keys database-normalization

Cycle in ER diagram: Bad design? BCNF?

I want to model linguistic syntax trees in my database. In my design, a syntax tree consists of inner nodes and terminal nodes; both types of nodes are weak entity types that are identified by the syntax tree they belong to. Syntax trees themselves are also weak entities; they are identified by the sentence whose syntactic structure they are to represent. A given sentence may have more than one syntax tree. A terminal node can be associated (1:m-relation) with a token from the sentence its syntax tree is identified by. Tokens, too, are weak entities that are identified by the sentence they belong to.

So overall, after translating my ER diagram into relation schemas some of my relation schemas look like this (simplified):

sentence(sentenceID)

syntaxTree(sentenceID, treeDiscr)

token(sentenceID, tokenDiscr)

terminalNode(sentenceIDFromTree, treeDiscr, nodeID, sentenceIDFromToken, tokenDiscr)

As you can see, the relation schema of terminalNode contains two foreign keys that each reference a sentenceID: One holds the ID of the sentence the syntax tree represents, and one holds the ID of the sentence the token belongs to. Of course, the values of these sentenceIDs always need to be the same. Thus, for terminalNode the following two functional dependencies hold:

sentenceIDFromTree -> sentenceIDFromToken

sentenceIDFromToken-> sentenceIDFromTree

For neither of these dependencies does the right side correspond to a subset of the left side, i.e. they are not trivial; nor does their left side correspond to a superkey -- i.e. the relation schema is not in BCNF. In order to make the relation schema be in BCNF, I thought that I could merge sentenceIDFromTree and sentenceIDFromToken into one attribute; after all, the two attributes always need to have the same values anyway. As far as I understand, in SQL I cannot make one foreign key reference several tables at the same time directly, so I would probably have to work with some additional trigger(s) to make sure that both the syntaxTree and the token table get referenced by the single sentenceID. As I do not have much experience with database design and have admittedly no idea what is considered 'good DB design', I wonder now whether I should really merge the two foreign keys, or leave my design as it is (or maybe there are further ways to deal with this?). Additionally, I fear that the whole problem might indicate that there is something really wrong about my conceptual design, but I have no idea how I could model this scenario differently. Any help will be greatly appreciated.

Solution

You wrote:

in SQL I cannot make one foreign key reference several tables at the same time directly

There is some imprecision here which is the source of your problem. A foreign key can only reference one table. However a single table column can be part of one or more foreign keys.

Therefore, the correct design is:

only have one sentenceID column in terminalNode
have two foreign keys, both of which include the sentenceID

This overlapping ensures that the two parents are correct in relation to each other.