Search code examples
data-modelingdata-warehousedimensional-modeling

Use of primary key in the DWH / Dimensional Model beyond to search functionality


I work currently in a DWH where the OLTP Database deliver the Data, and I ve noticed that from the input layer to the core of the DWH.

we make the following transformation

  • we add a Pk and we normalize the Data in 1NF
  • We add a historic Column.

For example

Table client:

    id_client is the primary key

    id_client |client | Date     |Subsidary ..
  -------------------------------------------------
     000001   |vha57  | 19-08-18 | Detutsche Bank

The interesting thing is that we dont use the primary key(Client ID) to retrun a value, in fact i never see it used.

If you want to search a row you use the client and Date

     select  colum1,column2 from client where client = 'vha57' and Date = '19-08-18'

I wonder why we have to add a primary key if they dont have a search use. Does PK have another use apart from to localizated a specific row?


Solution

  • A primary key is a data modeling concept that enables you to uniquely identify a specific row in a table, e.g. setting the "userId" column as a primary key on the "users" table guarantees that there is only one row/user with userId = X.

    In practical terms, in most database systems a primary key is a referential integrity constraint that enforces unique values for the column that is set as a key, and hence guarantees that the each value of the key column uniquely identifies one specific row of that table.

    So, in that case, it seems that the primary key is used exactly for its purpose: to guarantee that each user has a unique id, and can thus be identified by it.

    Also, a primary key is only "useful for search" because in most database systems it is implemented by using a (unique) index over the column that is set as primary key. Whenever you use that column in a where clause, or use the column in a join, the index is used to locate the specific rows that match your query, instead of reading through the whole table. This makes your search/join faster than if you use a column that is not indexed.

    Edit: If you are only considering the "useful for search" scenario, then the benefit of having a primary key defined (e.g. on dimensions) is that its index, which is created automatically in most RDBMS, will be used to optimize and accelerate joins between fact and dimension tables (just like in a relational model). Even if you don't use it explicitly in queries, the primary key's index still improves the speed of searches/queries when joining with other tables.