Search code examples
sqlsqlitedatabase-designmany-to-manydatabase-normalization

Normalising a tags table in a database for SQLite


So I'm learning how to create a database using SQLite, and I'm struggling in trying to separate the string with comma separated values in vendor tag and vendor tagname and then align the tag numbers with the tag names.

where lets say the vendorid is incremented from 1,2,3... etc for each row. Here is a schema

CREATE TABLE unnormalized(
  vendor_tag varchar(200),
  vendor_tag_name varchar(200),
  vendor_id int
  );
  
 INSERT INTO unnormalized
VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);

enter image description here


Solution

  • The column names in your picture aren't descriptive of the column contents.

    First, I would define a Vendor table:

    Vendor
    ------
    VendorID
    VendorName
    ...
    

    where VendorID is an auto-incrementing integer blind primary (clustering) key and VendorName is the name of the vendor. Additional Vendor attributes would belong in this table.

    Edited to add: You don't have any vendors in your picture, so let's make a couple of vendors up.

    Vendor table
    
    VendorID | VendorName
    -------- | ----------
           1 | McDonald's
           2 | IHOP
    

    Next, I would define a Tag table.

    Tag
    ---
    TagID
    TagName
    

    where TagID is an auto-incrementing integer blind primary (clustering) key and TagName is the name of the tag.

    Edited to add: Here's what the contents of the Tag table look like.

    Tag table
    
    TagID | TagName
    ----- | -------
        1 | American
        2 | Arabic
        3 | ?
        4 | Breakfast
        5 | Burgers
      ...
    

    You have one row for each tag.

    Finally, since the relationship between Vendor and Tag is many to many, I would define a VendorTag junction table.

    VendorTag
    ---------
    VendorTagID
    VendorID
    TagID
    Created Timestamp
    

    where VendorTagID is an auto-incrementing integer blind primary (clustering) key, VendorID is a foreign key back to the Vendor table, and TagID is a foreign key back to the Tag table.

    You would also define a unique index on (VendorID, TagID) so you can retrieve the tags for a vendor fairly easily. If you also want to retrieve the vendors for a tag, you can create another unique index on (TagID, VendorID).

    Edited to add: Here's what some rows of the VendorTag table might look like.

    VendorTag table
    
    VendorTagID | VendorID | TagID | Created Timestamp
    ----------- | -------- | ----- | -------------------
              1 |        1 |     4 | 2020-12-27 16:44:10
              2 |        1 |     8 | 2020-12-27 16:44:10
              3 |        1 |    10 | 2020-12-27 16:44:11
              4 |        1 |    91 | 2020-12-27 16:44:11
    

    Every column of every table contains one and only one value. That's one of the points of database normalization. No duplicate information in the database is another point of database normalization.