PHP MySQL visitor tracking scalability


I have a web application that runs multiple websites from one codebase. It is set up with a table that contains the sites and domains that run on the application. The application tracks visitors so we can tell how much traffic we are getting per site and globally across the application.

The problem I am running into is that the visitor tracking is super slow now that there are 2.5 million records in there. Running a query to get the number of visitors this month is taking multiple minutes, making our data not so accessible.

The system records the tracking directly from the base PHP file that includes all the other files. It creates a record in the visitors table when it doesn't find an existing identifying cookie. When it creates the record, it assigns a cookie to the user so that a returning visitor updates the existing record instead of creating a new one. That visitor record stores how many pages they viewed, which page they came in on (entry page), and the last page they looked at (exit page).

We get a fair amount of traffic and I'd like to make this report of monthly visitors accessible by speeding up the results.

I have tried adding an index to the site_id and dates before, but it didn't seem to help speed up things much...

We decided to track analytics ourselves instead of using a tool like Google Analytics so that we would be able to create some more meaningful data with it later. For example, when a user who is viewing the site submits a contact form and becomes a contact in the CRM, we like to see the history of that contact to see which pages they viewed before asking for support, etc.

Any suggestions? The table schema is below. Thanks much in advance, I've been banging my head against the wall trying to come up with solutions.

CREATE TABLE `analytics_track_visits` (
    `id` bigint unsigned NOT NULL AUTO_INCREMENT
    ,`site_id` int(4) unsigned default NULL

    ,`inc` bigint unsigned default NULL
    ,`referer` text NOT NULL
    ,`refer_host` text NOT NULL
    ,`user_agent` text NOT NULL
    ,`browser` text NOT NULL
    ,`os` text NOT NULL
    ,`search_term` text NOT NULL

    ,`entry_page` int(4) unsigned default NULL
    ,`entry_page_url` text default NULL
    ,`exit_page` int(4) unsigned default NULL
    ,`exit_page_url` text default NULL

    ,`created` datetime NOT NULL
    ,`created_ip` varchar(200) NOT NULL default ''
    ,`created_user_id` int(4) unsigned default NULL
    ,`modified` datetime NOT NULL default '0000-00-00'
    ,`modified_user_id` int(4) unsigned default NULL

    ,PRIMARY KEY(`id`)
    ,CONSTRAINT `analytics_track_visits__site` FOREIGN KEY (`site_id`) 
        REFERENCES `site` (`id`) ON DELETE CASCADE
    ,CONSTRAINT `analytics_track_visits__entry_page` FOREIGN KEY (`entry_page`) 
        REFERENCES `page` (`id`) ON DELETE CASCADE
    ,CONSTRAINT `analytics_track_visits__exit_page` FOREIGN KEY (`exit_page`) 
        REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=INNODB;

inc stores the number of pages viewed by that specific visitor. entry_page is a foreign key to our CMS page table (same with exit_page). browser and os hold values interpreted from the user_agent. search_term stores any keyword that was used to find the entry page. site_id relates to a table containing the list of site settings with domain names.

I have a suspicion that part of the problem is that the table never really gets a break, so when we run a report there are active queries inserting and updating this table at the same time.


Solution

  • Without knowing what kind of queries you're running on it, there are a few things you might want to consider:

    • Create a separate table for each site; I know that doesn't seem like a wonderful solution, but it removes the need for another expensive index in your table.
    • Set up a read-only slave to do your reporting queries on; this reduces the stress on your main database.
    • I believe that InnoDB creates an index for each of your foreign keys as well; those indexes add to the size of your table and slow down inserts. Unless you remove pages regularly, you could do without the foreign keys on the page columns.
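
As a rough sketch of the first and third points above, something like the following could work. The `_site_1` table suffix and the site id `1` are made up for illustration, and the index names in the last statement are an assumption: MySQL normally names an implicitly created foreign key index after the constraint, but check the `SHOW INDEX` output before dropping anything.

```sql
-- Per-site table (hypothetical name and site id).
-- CREATE TABLE ... LIKE copies the columns and indexes of the original
-- table, but not its foreign key definitions.
CREATE TABLE analytics_track_visits_site_1 LIKE analytics_track_visits;

INSERT INTO analytics_track_visits_site_1
SELECT * FROM analytics_track_visits WHERE site_id = 1;

-- Foreign keys on the page columns: list the existing indexes first
-- to confirm what InnoDB actually created and how it named them.
SHOW INDEX FROM analytics_track_visits;

-- Drop the two page constraints (names taken from the schema above).
ALTER TABLE analytics_track_visits
    DROP FOREIGN KEY analytics_track_visits__entry_page,
    DROP FOREIGN KEY analytics_track_visits__exit_page;

-- Then drop the indexes that backed them. The names here are assumed
-- to match the constraint names; adjust to what SHOW INDEX reported.
ALTER TABLE analytics_track_visits
    DROP INDEX analytics_track_visits__entry_page,
    DROP INDEX analytics_track_visits__exit_page;
```

The constraints and their indexes are dropped in separate statements so that each step can be checked on its own. Try this on a copy of the table first, since rebuilding 2.5 million rows can take a while.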

    I'll add more hints if I can think of more.
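
One more thing that may be worth checking, whichever server the report ends up running on: the question doesn't show the actual query, so this is only a guess at its shape. The site id, the dates, and the index name `idx_site_created` are all hypothetical. The idea is to filter on a plain date range rather than wrapping `created` in a function such as `MONTH()`, so that an index starting with `site_id` and `created` can be used, and to confirm the plan with `EXPLAIN`.

```sql
-- Hypothetical composite index covering the report's filter columns.
-- Skip this if an equivalent index already exists.
ALTER TABLE analytics_track_visits
    ADD INDEX idx_site_created (site_id, created);

-- Hypothetical monthly visitor count for site 1, January 2024.
-- The half-open range on `created` keeps the predicate index-friendly.
EXPLAIN
SELECT COUNT(*) AS visitors
FROM analytics_track_visits
WHERE site_id = 1
  AND created >= '2024-01-01'
  AND created <  '2024-02-01';
```

If the `key` column of the `EXPLAIN` output is empty, or `rows` is close to the full table size, the query is still scanning the table and the index is not being used.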