I'm working on a site which stores individual page views in a 'views' table:
CREATE TABLE `views` (
`view_id` bigint(16) NOT NULL auto_increment,
`user_id` int(10) NOT NULL,
`user_ip` varchar(15) NOT NULL,
`view_url` varchar(255) NOT NULL,
`view_referrer` varchar(255) NOT NULL,
`view_date` date NOT NULL,
`view_created` int(10) NOT NULL,
PRIMARY KEY (`view_id`),
KEY `view_url` (`view_url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
It's pretty basic, stores user_id (the user's id on the site), their IP address, the url (without the domain to reduce the size of the table a little), the referral url (not really using that right now and might get rid of it), the date (YYYY-MM-DD format of course), and the unix timestamp of when the view occurred.
The table, of course, is getting rather big (4 million rows at the moment, and it's a rather young site) and running queries on it is slow.
For some basic optimization I've now created a 'views_archive' table:
CREATE TABLE `views_archive` (
`archive_id` bigint(16) NOT NULL auto_increment,
`view_url` varchar(255) NOT NULL,
`view_count` smallint(5) NOT NULL,
`view_date` date NOT NULL,
PRIMARY KEY (`archive_id`),
KEY `view_url` (`view_url`),
KEY `view_date` (`view_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
This ignores the user info (and referral url) and stores how many times a url was viewed per day. This is probably how we'll generally want to use the data (how many times a page was viewed on a per-day basis), so it should make querying pretty quick. I imagine I could show page views by hour for the last week/month or so and daily views beyond that, so the 'views' table would only need to contain data from the last week/month. Even then, it's still a large table.
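For reference, a rough sketch of how the daily roll-up described above could be run, using the two tables as defined earlier. The once-a-day schedule and the 30-day retention window are assumptions for illustration, not something already in place on the site:

```sql
-- Roll up yesterday's raw views into one row per url per day.
-- Assumes this runs once a day, after midnight, so the day is complete.
INSERT INTO views_archive (view_url, view_count, view_date)
SELECT view_url, COUNT(*), view_date
FROM views
WHERE view_date = CURDATE() - INTERVAL 1 DAY
GROUP BY view_url, view_date;

-- Trim raw rows that are older than the window still needed for
-- hourly stats (30 days here is an assumed value).
DELETE FROM views
WHERE view_date < CURDATE() - INTERVAL 30 DAY;
```

Two things to watch with this sketch: 'views' has no index on view_date, so both statements would scan the whole table unless one is added, and view_count is a signed smallint, which tops out at 32767 views per url per day.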
Anyway, long story short, I'm wondering if you can give me any tips on how to best handle the storage of stats/page views in a MySQL site, the goal being to both keep the size of the table(s) in the db as small as possible and still be able to easily (and at least relatively quickly) query the info. I've looked at partitioned tables a little, but the site doesn't have MySQL 5.1 installed. Any other tips or thoughts you could offer would be much appreciated.
You probably want a separate table just for pages, with each row in the views table holding a reference to it rather than the full url. Another possible optimization would be to move the user IP into a different table, perhaps a session table. That should reduce your query times somewhat. You're on the right track with the archive table; the same optimizations should help there as well.
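A minimal sketch of what that layout might look like. The table and column names here ('pages', 'sessions', 'views_normalized', page_id, session_id, session_ip) are made up for illustration, and storing the IP as an unsigned integer via INET_ATON() is an extra assumption that only covers IPv4:

```sql
-- One row per distinct url; views refer to it by a 4-byte id.
CREATE TABLE `pages` (
`page_id` int(10) unsigned NOT NULL auto_increment,
`page_url` varchar(255) NOT NULL,
PRIMARY KEY (`page_id`),
UNIQUE KEY `page_url` (`page_url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

-- One row per visit session; the IP is stored once here instead of on every view.
CREATE TABLE `sessions` (
`session_id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) NOT NULL,
`session_ip` int(10) unsigned NOT NULL, -- filled with INET_ATON('1.2.3.4')
PRIMARY KEY (`session_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

-- Slimmed-down views table: fixed-width columns only.
CREATE TABLE `views_normalized` (
`view_id` bigint(16) NOT NULL auto_increment,
`session_id` int(10) unsigned NOT NULL,
`page_id` int(10) unsigned NOT NULL,
`view_date` date NOT NULL,
`view_created` int(10) NOT NULL,
PRIMARY KEY (`view_id`),
KEY `page_date` (`page_id`, `view_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

-- Example: views per page over the last 7 days.
SELECT p.page_url, COUNT(*) AS view_count
FROM views_normalized v
JOIN pages p ON p.page_id = v.page_id
WHERE v.view_date >= CURDATE() - INTERVAL 7 DAY
GROUP BY p.page_id, p.page_url;
```

With no varchar columns left, the views rows become fixed-length and much smaller, and the index on (page_id, view_date) is far more compact than one on a 255-character url. The archive table could use page_id the same way.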