Running an analytics website: 1 heavy insert vs. several light selects and inserts


Currently my database design is not normalised.

When a user visits the website, all the data, such as the full browser name, OS name, etc., is stored as-is. This is causing the database to grow very fast, at a rate of about 10GB a day with just 2 million visitors a day (2 million inserts per day). On the other hand, it means there is only one insert per visit. I am using MySQL and PHP.

Now what I was wondering is whether it would be faster if I normalised the data. What I mean is: create separate tables mapping OS names to OS ids and browser names to browser ids, then store only the ids in the traffic log table. In this situation there would be two lookups (one in the browser table, one in the OS table) for the ids, two inserts if the names don't exist yet, and then one more insert of the browser id and OS id into the traffic log table. So 2 selects, 2 potential inserts, and 1 log insert, roughly as sketched below.
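Roughly, I imagine the per-visit logic would look something like this. This is only a minimal sketch: the table names (browsers, os, traffic_log), their columns, and the $pdo connection and $browserName/$osName variables are placeholders of mine, not my real schema:

    <?php
    // Sketch of the normalized insert path. Assumes each lookup table has
    // columns (id, name) with a UNIQUE index on name, and $pdo is an open
    // PDO connection to MySQL.

    function getOrCreateId(PDO $pdo, $table, $name)
    {
        // $table is only ever one of our own hard-coded table names,
        // never user input, so interpolating it here is safe.
        $select = $pdo->prepare("SELECT id FROM `$table` WHERE name = ?");
        $select->execute([$name]);
        $id = $select->fetchColumn();
        if ($id !== false) {
            return (int) $id;          // common case: name already known
        }
        // First time we see this name: insert it. INSERT IGNORE plus the
        // UNIQUE index on name tolerates a race with a concurrent visit.
        $pdo->prepare("INSERT IGNORE INTO `$table` (name) VALUES (?)")
            ->execute([$name]);
        $id = $pdo->lastInsertId();
        if (!$id) {                    // lost the race: row exists now, re-read
            $select->execute([$name]);
            $id = $select->fetchColumn();
        }
        return (int) $id;
    }

    // Per visit: 2 selects, 2 rare inserts, 1 log insert.
    $browserId = getOrCreateId($pdo, 'browsers', $browserName);
    $osId      = getOrCreateId($pdo, 'os', $osName);
    $pdo->prepare("INSERT INTO traffic_log (browser_id, os_id, visited_at)
                   VALUES (?, ?, NOW())")
        ->execute([$browserId, $osId]);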

Which one is likely to be faster? Also, are there any other options, such as a file-based approach for storing the OS names and so on, or storing all the expected browsers and OSes in PHP arrays and only recording a specific value when the visitor matches one of them, logging anything else as "none"? Something like the sketch below.
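For the array idea, I picture something like this (again just a sketch; the name-to-id maps are illustrative, not a real catalogue, and traffic_log is the same placeholder table as above):

    <?php
    // In-memory whitelist: a fixed map of expected browsers/OSes to ids,
    // so the hot path needs no lookup SELECTs at all.
    $browserIds = ['Chrome' => 1, 'Firefox' => 2, 'Safari' => 3, 'IE' => 4];
    $osIds      = ['Windows' => 1, 'Mac OS X' => 2, 'Linux' => 3];

    // Anything unrecognised is logged under a reserved id 0 ("none"),
    // so unknown user agents still cost only the single log insert.
    $browserId = isset($browserIds[$browserName]) ? $browserIds[$browserName] : 0;
    $osId      = isset($osIds[$osName]) ? $osIds[$osName] : 0;

    $pdo->prepare("INSERT INTO traffic_log (browser_id, os_id, visited_at)
                   VALUES (?, ?, NOW())")
        ->execute([$browserId, $osId]);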


Solution

  • You're really looking at a question of speed versus space; the more complex table structure will likely be somewhat slower than bulk-inserting everything into a single table. In these sorts of situations, sacrificing a little speed to slow the growth of the database is usually considered a good trade, and normalizing the tables in the way you describe should provide exactly that compromise.

    Additionally, splitting the data into multiple cross-referenced tables will make later queries on the data much easier, say if you decide that you want to run analytics on your website's visitors, as in the example below.
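    For example, a per-browser visit count becomes a cheap join against the small lookup table. A rough sketch, reusing the placeholder table and column names from the question:

        <?php
        // Visits per browser for today: join the big log against the small
        // lookup table instead of grouping on repeated name strings.
        $stmt = $pdo->query(
            "SELECT b.name, COUNT(*) AS visits
               FROM traffic_log t
               JOIN browsers b ON b.id = t.browser_id
              WHERE t.visited_at >= CURDATE()
              GROUP BY b.name
              ORDER BY visits DESC"
        );
        foreach ($stmt as $row) {
            printf("%s: %d\n", $row['name'], $row['visits']);
        }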