Search code examples
databasedatabase-designdbtlooker

Whats the best way to design an event table where different event items are provided each time the event updated


I have a raw event table that I'm working with. It has two columns date and metadata. Metadata has raw json dump of all the event attributes. But I'd like to make it clear that each time different attributes are sent. And then I need to ingest this data into looker.

Raw table:

timestamp metadata
2024-04-1 {"type":"created","title":"test1","due":"2024-04-02","id":12345}
2024-04-1 {"type":"confirmed","id":12345}
2024-04-1 {"type":"completed","id":12345, "completedby":"johndoe"}

Now I need to normalize it

OPTION A :

timestamp type title due_date id completed_by
2024-04-1 created test1 2024-04-02 12345
2024-04-1 confirmed 12345
2024-04-1 completed johndoe

OPTION B:

timestamp type title due_date id completed_by
2024-04-1 created test1 2024-04-02 12345 johndoe
2024-04-1 confirmed test1 2024-04-02 12345 johndoe
2024-04-1 completed test1 2024-04-02 12345 johndoe

How should I design the table, should I fill all the rows with their respective information (Option B) or should I leave them as nulls (Option A)?


Solution

  • Option B is what I would aim for, specially because it holds relevant information for all rows, which would make it easier for reporting purposes.

    Also, given that you want to treat the different actions (created, confirmed, completed) as "sub-events", one thing you could do on top of option B is to also add a surrogate key based on the id, the type and the timestamp. See an example below using the dbt_utils package:

    {{ dbt_utils.generate_surrogate_key(['id', 'type', 'timestamp']) }} as sub_event

    This way, you can have a primary key in your model to define the granularity, as well as e.g. easily count "sub-events" for each user.