database ms-access database-design relationship database-normalization

Denormalization and reducing redundancy in multiple relationships among four database tables

I'm helping a social scientist analyze the results of his research. He's recorded hundreds of interviews and other meetings and collected related documents. He has entered data on this work into four tables in MS Access:

+---------+-----------+-----------------------------------------------------+
|  Table  |    Key    |                  Example contents                   |
+---------+-----------+-----------------------------------------------------+
| cases   | ID_Case   | Description, case data                              |
| persons | ID_Person | Name, contact data                                  |
| events  | ID_Event  | Date, time, location of interview, meeting, etc.    |
| files   | ID_File   | Path, filename, date of recordings, documents, etc. |
+---------+-----------+-----------------------------------------------------+

We can call these the "base" tables.

To represent relationships among these tables, he first tried a set of six tables linked to the original four on their keys:

+--------------+---------------------+
|    Table     |         Key         |
+--------------+---------------------+
| CasePersons  | ID_Case, ID_Person  |
| CaseEvents   | ID_Case, ID_Event   |
| CaseFiles    | ID_Case, ID_File    |
| EventPersons | ID_Event, ID_Person |
| EventFiles   | ID_Event, ID_File   |
| FilePersons  | ID_File, ID_Person  |
+--------------+---------------------+

We can call these the "binary relationship" tables.

Example Scenario (Imaginary)

Here is an imaginary scenario to demonstrate an example of his data structure. The scenario involves three cases, seven people, three events, and five files.

Let's say he conducted two interviews with people about their love lives. One was an interview with four people: George, Sally, Henry, and Elliot. The second interview was with two people, John and Liz. The first interview was recorded on video and the second just in audio. A portion of the first interview was conducted and separately recorded with just Sally and Henry. Although Liz was at the second interview, she didn't speak and so is not in the recording. In the first interview, George shared a copy of a love letter to Sally, which was scanned into a pdf file. Finally, the third event was a phone meeting with a colleague, which was about technique, and no specific cases were discussed.

Data from this imaginary scenario are shown in the tables below. Next to the name of each table, I give the number of records in the actual table in the database.

Four Base Tables

Table [cases] (386 records)

+---------+---------------------------------+
| ID_Case |           Description           |
+---------+---------------------------------+
|       1 | A husband and wife are in love. |
|       2 | A man and woman are in love.    |
|       3 | A man has never been in love.   |
+---------+---------------------------------+

Table [persons] (1,472 records)

+-----------+-----------+----------+
| ID_Person | NameFirst | NameLast |
+-----------+-----------+----------+
|         1 | George    | Brown    |
|         2 | Sally     | White    |
|         3 | Henry     | Green    |
|         4 | John      | Baker    |
|         5 | Liz       | Jones    |
|         6 | Elliot    | Brooks   |
|         7 | Catherine | Drake    |
+-----------+-----------+----------+

Table [events] (526 records)

+----------+------------+---------------+
| ID_Event |    Date    |   Location    |
+----------+------------+---------------+
|        1 | 2016 06 01 | 123 Main St.  |
|        2 | 2016 07 02 | 456 Block Rd. |
|        3 | 2016 08 03 | Phone         |
+----------+------------+---------------+

Table [files] (1,748 records)

+---------+---------------------------+------+
| ID_File |         Filename          | Type |
+---------+---------------------------+------+
|       1 | Brown, Brooks interview   | avi  |
|       2 | White, Green subinterview | avi  |
|       3 | Brown letter              | pdf  |
|       4 | Baker interview           | wav  |
|       5 | Drake meeting             | wav  |
+---------+---------------------------+------+

Six Binary Relationship Tables

In the listings of binary relationship tables, I replace the keys for persons, events, and files with related text to make them easier to read. MS Access allows the same behavior by displaying a table field according to a query. Similarly, the roles shown in some of the tables are text displays of numeric foreign keys to separate tables of allowed roles.

Table [CasePersons] (720 records)

+---------+-----------+---------------+
| ID_Case | ID_Person |     Role      |
+---------+-----------+---------------+
|       1 | George    | Husband       |
|       1 | Sally     | Wife          |
|       1 | Henry     | Wife's friend |
|       2 | John      | Boyfriend     |
|       2 | Liz       | Girlfriend    |
|       3 | Elliot    | Individual    |
+---------+-----------+---------------+

Table [CaseEvents] (299 records)

+---------+------------+
| ID_Case |  ID_Event  |
+---------+------------+
|       1 | 2016 06 01 |
|       2 | 2016 07 02 |
|       3 | 2016 06 01 |
+---------+------------+

Table [CaseFiles] (301 records)

+---------+---------------------------+
| ID_Case |          ID_File          |
+---------+---------------------------+
|       1 | Brown, Brooks interview   |
|       1 | White, Green subinterview |
|       1 | Brown letter              |
|       2 | Baker interview           |
|       3 | Brown, Brooks interview   |
+---------+---------------------------+

Table [EventPersons] (700 records)

+------------+-----------+-------------+
|  ID_Event  | ID_Person |    Role     |
+------------+-----------+-------------+
| 2016 06 01 | George    | Interviewed |
| 2016 06 01 | Sally     | Interviewed |
| 2016 06 01 | Henry     | Interviewed |
| 2016 06 01 | Elliot    | Interviewed |
| 2016 07 02 | John      | Interviewed |
| 2016 07 02 | Liz       | Present     |
| 2016 08 03 | Catherine | Present     |
+------------+-----------+-------------+

Table [EventFiles] (1,490 records)

+------------+---------------------------+-----------+
|  ID_Event  |          ID_File          |   Role    |
+------------+---------------------------+-----------+
| 2016 06 01 | Brown, Brooks interview   | Recording |
| 2016 06 01 | White, Green subinterview | Recording |
| 2016 06 01 | Brown letter              | Received  |
| 2016 07 02 | Baker interview           | Recording |
| 2016 08 03 | Drake meeting             | Recording |
+------------+---------------------------+-----------+

Table [FilePersons] (2,392 records)

+---------------------------+-----------+-------------+
|          ID_File          | ID_Person |    Role     |
+---------------------------+-----------+-------------+
| Brown, Brooks interview   | George    | Interviewed |
| Brown, Brooks interview   | Sally     | Interviewed |
| Brown, Brooks interview   | Henry     | Interviewed |
| Brown, Brooks interview   | Elliot    | Interviewed |
| Brown letter              | George    | Writer      |
| Brown letter              | Sally     | Subject     |
| White, Green subinterview | Sally     | Interviewed |
| White, Green subinterview | Henry     | Interviewed |
| Baker interview           | John      | Interviewed |
| Drake meeting             | Catherine | Present     |
+---------------------------+-----------+-------------+

The problem is that there's a lot of redundancy in the data entered in all these tables. With hundreds of interviews, it's easy to make mistakes, either by entering some data wrong or forgetting to enter all the data in all six binary relationship tables. Once the data are entered, it's hard to check them for errors. Further, redundancies make it harder to find meaningful patterns in the data.

Single Four-Way Relationship Table

I've been trying to help him figure out how to represent these data in a more manageable fashion. One idea I've been exploring is to combine the six binary relationship tables into a single table with the four key fields: ID_Case, ID_Person, ID_Event, and ID_File. The above data then become:

Table [CasePersonEventFiles] (??? records)

+---------+-----------+------------+---------------------------+-----------------+---------------+-------------+-------------+
| ID_Case | ID_Person |  ID_Event  |          ID_File          |     CP_Role     |    EP_Role    |   EF_Role   |   FP_Role   |
+---------+-----------+------------+---------------------------+-----------------+---------------+-------------+-------------+
|       1 | George    | 2016 06 01 | Brown, Brooks interview   | Husband         | Interviewed   | Recording   | Interviewed |
|       1 | Sally     | 2016 06 01 | Brown, Brooks interview   | Wife            | Interviewed   | * Recording | Interviewed |
|       1 | Henry     | 2016 06 01 | Brown, Brooks interview   | Wife's friend   | Interviewed   | * Recording | Interviewed |
|       1 | Sally     | 2016 06 01 | White, Green subinterview | * Wife          | * Interviewed | Recording   | Interviewed |
|       1 | Henry     | 2016 06 01 | White, Green subinterview | * Wife's friend | * Interviewed | * Recording | Interviewed |
|       1 | George    | 2016 06 01 | Brown letter              | * Husband       | * Interviewed | Received    | Writer      |
|       1 | Sally     | 2016 06 01 | Brown letter              | * Wife          | * Interviewed | * Received  | Subject     |
|       2 | John      | 2016 07 02 | Baker interview           | Boyfriend       | Interviewed   | Recording   | Interviewed |
|       2 | Liz       | 2016 07 02 | 0                         | Girlfriend      | Present       |             |             |
|       3 | Elliot    | 2016 06 01 | Brown, Brooks interview   | Individual      | Interviewed   | * Recording | Interviewed |
|       0 | Catherine | 2016 08 03 | Drake meeting             |                 | Present       | Recording   | Present     |
+---------+-----------+------------+---------------------------+-----------------+---------------+-------------+-------------+

Obviously, this is much cleaner and more compact. The number of data rows have been reduced from 36 in six tables to eleven in one. Reduction in the number of data elements (non-null cells) is less dramatic, from 100 to 85. Related data are in close proximity that makes it easier to avoid errors, or to see any that exist.

In this table, zeros are used instead of nulls in order to allow the first four fields to form a primary key, ensuring uniqueness across those four fields. In the field names, "CP_Role", for example, means "Case-person role", i.e., the role of the person in the case.

Redundancy is reduced, but not eliminated. Thirteen data elements that are redundant with others are marked in this example with asterisks. Such redundancies are a pernicious invitation to data-entry error. However, it is fairly straightforward to automatically check for such errors in this table.

The big problem in implementing this four-way relationship table at this stage is in collapsing the data into it from the six binary tables. It's easy to create a table with all the necessary fields and then append all the rows of the six tables into it with zero values for absent keys. This makes a combined table of 5,902 records. If the numbers in the example scenario above can be extrapolated to the real data, then the number of records can be reduced to about 1,800, meaning over 4,000 records can be removed! I'm currently studying patterns in the combined table, seeking ways to automate the merging and removal of groups of superfluous rows. It's very slow-going.

Worse than the amount of time it's taking to make this new table is that I'm not fully confident that the result will be the optimal representation of the data.

The Question

Am I missing something important in this approach? Is there a more intelligent way to manage this collection of data? Does database theory provide a better way to represent these relationships?

Solution

You can find how to structure the database in any database book. for instance I came up with:

Next we need some data entry forms. the data entry forms are designed to take the case data and sort it into the appropriate database tables. So the data enterer does not need to know about the structure of the database and only has to enter the actual data. Information on making data entry forms for 1 to many and many to many forms is hard to find. Here is a link:

create form to add records in multiple tables

here is a data entry form and eventual subform I made for this data:

the 1 side of a 1 to m relationship is the form and the m side is the subform. Here I dragged a file/persons form created with the wizard onto a files form that was also created using the wizard. to be able to represent the second 1 to m relationship we replace the textbox on the form which holds the foreign key with a combo box. Here ID_Person has been replaced with a combo box so the data enterer doesn't have to know anything about ID_person but just selects the persons name instead. The result is the data enterer doesn't need to know anything about the keys underlying relationships in the database and can just enter the case data.