database-design database-table

Should I have fewer tables and use complex queries to fetch data, or have more tables to simplify queries?


Let's take a scenario where users are tracking traffic for certain cities. The traffic is updated every two hours, and we have to keep the previous data to plot a graph. So I have a traffic_stats table which looks like this -

traffic_stats(id,city_id,user_id,traffic,created_at)

(given traffic is a number)

There is a stats refresher daemon which takes the unique city_ids, gets the current traffic stats for these cities, and adds a new entry for each to this same table. The daemon uses this query to fetch the city_ids -

SELECT * FROM traffic_stats GROUP BY city_id

and adds a new entry for each city_id in the same table. The user_id attribute for each new entry is 0, since it doesn't matter which user has subscribed to the city. If a city_id is in the table, its traffic stats are refreshed.

On the front end, the following query is run to fetch data for a user -

SELECT * FROM 
(SELECT * FROM traffic_stats WHERE user_id = #{session[:user_id]} ORDER BY created_at DESC)
as traffic_for_user_in_descending_order 
GROUP BY city_id

This gives the single latest entry for each city_id.

This should work fine, except that if 100 users are tracking 200 unique cities, there will be 200 new entries in the traffic_stats table every two hours. That's 2400 entries a day, and the table will keep growing.

Now, I could have one table which holds the cities that users are tracking and another table that the refresher daemon adds entries to. But I'm not sure if there's any performance advantage to this approach.


Solution

  • It might be better to create a separate City table. That way you can query the distinct city ids from it rather than scanning the whole traffic_stats table, as the first select statement does. It would also make the database a bit easier to read. If you'd rather not do this, I would suggest using SELECT DISTINCT city_id FROM traffic_stats, which retrieves only the city ids instead of every column.

    Having a single table seems reasonable in this case, as the application using the information is simple. As for historical data, it might be nice to create a separate table to store aggregated information. You could prune the primary table by selecting and storing averages for a particular length of time (day, week, month, etc.), and then filter further by user id. This would cut down on database disk usage and query time.
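As a rough sketch of that pruning idea, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The traffic_stats_daily table, its columns, the roll_up_before helper, and the sample rows are all hypothetical names and data made up for illustration, and the daily bucket is just one of the possible lengths of time. It averages raw rows older than a cutoff into one row per city per day, then deletes the raw rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE traffic_stats (
    id         INTEGER PRIMARY KEY,
    city_id    INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    traffic    INTEGER NOT NULL,
    created_at TEXT    NOT NULL
);
-- Hypothetical rollup table: one averaged row per city per day.
CREATE TABLE traffic_stats_daily (
    city_id     INTEGER NOT NULL,
    day         TEXT    NOT NULL,
    avg_traffic REAL    NOT NULL,
    PRIMARY KEY (city_id, day)
);
""")

# Made-up sample rows; user_id 0 marks entries written by the daemon.
conn.executemany(
    "INSERT INTO traffic_stats (city_id, user_id, traffic, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 0, 100, "2024-01-01 00:00:00"),
        (1, 0, 200, "2024-01-01 02:00:00"),
        (2, 0, 50, "2024-01-01 00:00:00"),
        (1, 0, 300, "2024-01-02 00:00:00"),
    ],
)


def roll_up_before(conn, cutoff):
    """Average raw rows older than `cutoff` into daily rows, then prune them.

    `cutoff` is assumed to fall on a day boundary, so that a day is never
    rolled up from only part of its rows.
    """
    with conn:  # both statements commit together, or roll back together
        conn.execute(
            """
            INSERT OR REPLACE INTO traffic_stats_daily (city_id, day, avg_traffic)
            SELECT city_id, date(created_at), AVG(traffic)
            FROM traffic_stats
            WHERE created_at < ?
            GROUP BY city_id, date(created_at)
            """,
            (cutoff,),
        )
        conn.execute("DELETE FROM traffic_stats WHERE created_at < ?", (cutoff,))


roll_up_before(conn, "2024-01-02 00:00:00")

daily = conn.execute(
    "SELECT city_id, day, avg_traffic FROM traffic_stats_daily "
    "ORDER BY city_id, day"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM traffic_stats").fetchone()[0]
```

With the sample rows, the three entries from the first day collapse into two daily averages and only the newest raw row is left in traffic_stats. How long to keep raw rows before rolling them up depends on how fine-grained the graph needs to be.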

    Personally I like to break things out as much as possible. It does make for more complicated queries, but it makes using and reading information from a database much easier in my opinion.
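To make the split concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is one possible reading of the two-table idea, not a definitive schema: tracked_cities, the index name, the helper functions, and the sample data are all hypothetical. The daemon reads the distinct city ids from the small tracking table, and the per-user query joins back to the stats table and picks the newest row for each city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: which user tracks which city.
CREATE TABLE tracked_cities (
    user_id INTEGER NOT NULL,
    city_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, city_id)
);
-- Stats no longer carry a user_id; only the daemon writes here.
CREATE TABLE traffic_stats (
    id         INTEGER PRIMARY KEY,
    city_id    INTEGER NOT NULL,
    traffic    INTEGER NOT NULL,
    created_at TEXT    NOT NULL
);
CREATE INDEX idx_stats_city_time ON traffic_stats (city_id, created_at);
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO tracked_cities (user_id, city_id) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10)],
)
conn.executemany(
    "INSERT INTO traffic_stats (city_id, traffic, created_at) VALUES (?, ?, ?)",
    [
        (10, 100, "2024-01-01 00:00:00"),
        (10, 120, "2024-01-01 02:00:00"),
        (20, 70, "2024-01-01 00:00:00"),
        (20, 90, "2024-01-01 02:00:00"),
    ],
)


def cities_to_refresh(conn):
    """City ids the daemon should refresh, read from the small table."""
    rows = conn.execute(
        "SELECT DISTINCT city_id FROM tracked_cities ORDER BY city_id"
    )
    return [row[0] for row in rows]


def latest_for_user(conn, user_id):
    """Newest stats row for each city the given user tracks."""
    return conn.execute(
        """
        SELECT s.city_id, s.traffic, s.created_at
        FROM tracked_cities AS t
        JOIN traffic_stats AS s ON s.city_id = t.city_id
        WHERE t.user_id = ?
          AND s.created_at = (
              SELECT MAX(created_at)
              FROM traffic_stats
              WHERE city_id = s.city_id
          )
        ORDER BY s.city_id
        """,
        (user_id,),  # bound parameter instead of string interpolation
    ).fetchall()


refresh_ids = cities_to_refresh(conn)
latest = latest_for_user(conn, 1)
```

The per-user query is longer than the single-table version, which is the trade-off mentioned above, but the daemon's lookup no longer touches the growing stats table, and the "latest row" is chosen explicitly with MAX(created_at) rather than depending on how GROUP BY happens to pick a row.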