Search code examples
mysqlnormalization

Normalizing MySQL data


I'm new to MySQL, and just learned about the importance of data normalization. My database has a simple structure:

I have 1 table called users with fields:

userName (string)
userEmail (string)
password (string)
requests (an array of dictionaries in JSON string format)
data (another array of dictionaries in JSON string format)
deviceID (string)

Right now, this is my structure. Being very new to MySQL, I'm really not seeing why my above structure is a bad idea? Why would I need to normalize this and make separate tables? That's the first question-why? (Some have also said not to put JSON in my table. Why or why not?)

The second question is how? With the above structure, how many tables should I have, and what would be in each table?

Edit: So maybe normalization is not absolutely necessary here, but maybe there's a better way to implement my data field? The data field is an array of dictionaries: each dictionary is just a note item with a few keys (title, author, date, body). So what I do now is, which I think might be inefficient, every time a user composes a new note, I send that note from my app to PHP to handle. I get the JSON array of dictionaries already part of that user's data, I convert it to a PHP array, I then add to the end of this array the new note, convert the whole thing back to JSON, and put it back in the table as an array of dictionaries. And this process is repeated every time a new note is composed. Is there a better way to do this? Maybe a user's data should be a table, with each row being a note-but I'm not really sure how this would work?


Solution

  • The answer to all your questions really depends on what the JSON data is for, and whether you'll ever need to use some property of that data to determine which rows are returned.

    If your data truly has no schema, and you're really just using it to store data that will be used by an application that knows how to retrieve the correct row by some other criteria (such as one of the other fields) every time, there's no reason to store it as anything other than exactly as that application expects it (in this case, JSON).

    If the JSON data DOES contain some structure that is the same for all entries, and if it's useful to query this data directly from the database, you would want to create one or more tables (or maybe just some more fields) to hold this data.

    As a practical example of this, if the data fields contains JSON enumerating services for that user in an array, and each service has a unique id, type, and price, you might want a separate table with the following fields (using your own naming conventions):

    serviceId (integer)
    userName (string)
    serviceType (string)
    servicePrice (float)
    

    And each service for that user would get it's own entry. You could then query for users than have a particular service, which depending on your needs, could be very useful. In addition to easy querying, indexing certain fields of the separate tables can also make for very QUICK queries.

    Update: Based on your explanation of the data stored, and the way you use it, you probably do want it normalized. Something like the following:

    # user table
    userId (integer, auto-incrementing)
    userName (string)
    userEmail (string)
    password (string)
    deviceID (string)
    
    # note table
    noteId (integer, auto-incrementing)
    userId (integer, matches user.userId)
    noteTime (datetime)
    noteData (string, possibly split into separate fields depending on content, such as subject, etC)
    
    # request table
    requestId (integer, auto-incrementing)
    userId (integer, matches user.userId)
    requestTime (datetime)
    requestData (string, again split as needed)
    

    You could then query like so:

    # Get a user
    SELECT * FROM user WHERE userId = '123';
    SELECT * FROM user WHERE userNAme = 'foo';
    
    # Get all requests for a user
    SELECT * FROM request WHERE userId = '123';
    # Get a single request
    SELECT * FROM request WHERE requestId = '325325';
    
    # Get all notes for a user
    SELECT * FROM note WHERE userId = '123';
    # Get all notes from last week
    SELECT * FROM note WHERE userId = '123' AND noteTime > CURDATE() - INTERVAL 1 WEEK;
    
    # Add a note to user 123
    INSERT INTO note (noteId, userId, noteData) VALUES (null, 123, 'This is a note');
    

    Notice how much more you can do with normalized data, and how easy it is? It's trivial to locate, update, append, or delete any specific component.