Search code examples
mysqldatabasedatabase-designnormalization

Database Design: User Profiles like in Meetup.com


In Meetup.com, when you join a meetup group, you are usually required to complete a profile for that particular group. For example, if you join a movie meetup group, you may need to list the genres of movies you enjoy, etc.

I'm building a similar application, wherein users can join various groups and complete different profile details for each group. Assume the 2 possibilities:

  1. Users can create their own groups and define what details to ask users that join that group (so, something a bit dynamic -- perhaps suggesting that at least an EAV design is required)
  2. The developer decides now which groups to create and specify what details to ask users who join that group (meaning that the profile details will be predefined and "hard coded" into the system)

What's the best way to model such data?

More elaborate example:

The "Movie Goers" group request their members to specify the following:

  • Name
  • Birthdate (to be used to compute member's age)
  • Gender (must select from "male" or "female")
  • Favorite Genres (must select 1 or more from a list of specified genres)

The "Extreme Sports" group request their member to specify the following:

  • Name
  • Description of Activities Enjoyed (narrative form)
  • Postal Code

The bottom line is that each group may require different details from members joining their group. Ideally, I would like anyone to create a group (ala MeetUp.com). However, I also need the ability to query for members fairly well (e.g. find all women movie goers between the ages of 25 and 30).


Solution

  • For something like this....you'd want maximum normalization, so you wouldn't have duplicate data anywhere. Because your user-defined tables could possibly contain the same type of record, I think that you might have to go above 3NF for this.

    My suggestion would be this - explode your tables so that you have something close to 6NF with EAV, so that each question that users must answer will have its own table. Then, your user-created tables will all reference one of your question tables. This avoids the duplication of data issue. (For instance, you don't want an entry in the "MovieGoers" group with the name "John Brown" and one in the "Extreme Sports" group with the name "Johnny B." for the same user; you also don't want his "what is your favorite color" answer to be "Blue" in one group and "Red" in another. Any data that can span across groups, like common questions, would be normalized in this form.)

    The main drawback to this is that you'd end up with a lot of tables, and you'd probably want to create views for your statistical queries. However, in terms of pure data integrity, this would work well.

    Note that you could probably get away with only factoring out the common fields, if you really wanted to. Examples of common fields would include Name, Location, Gender, and others; you could also do the same for common questions, like "what is your favorite color" or "do you have pets" or something to that extent. Group-specific questions that don't span across groups could be stored in a separate table for that group, un-exploded. I wouldn't advise this because it wouldn't be as flexible as the pure 6NF option and you run the risk of duplication (how do you predetermine which questions won't be common questions?) but if you really wanted to, you could do this.

    There's a good question about 6NF here: Would like to Understand 6NF with an Example

    I hope that made some sense and I hope it helps. If you have any questions, leave a comment.