Search code examples
mysqlsessionweb-analyticsmatomologfile

Piwik MySQL DB: calculate length of user actions


I'm trying to calculate the length of an user action on a website that was tracked by Piwik. The session data is stored in the MySQL table 'log_visit_action'.

In the column 'time_spent_action' I want to calculate the time the user spent till taking the next action or till leaving the website (no further action assigned to the 'idvisit').

In Excel I would know how to solve the problem. Per se it's not difficult. But since I'm new to MySQL I have no idea how to implement it on this MySQL table with 50 million entries. Perhaps you can provide some code examples?

This is a screenshot of the table structure I exported to Excel. Here you can find some example data on sqlfiddle.

Many thanks in advance, Lars

Table Structure


Solution

  • I came over the same problem, recently. Therefore, here's a solution, working with temporary tables. Actually, I offer two solutions.

    Solution (A) identifies the subsequent log_action_link using a subquery (very time-consuming)

    # Very time-consuming
    CREATE TEMPORARY TABLE tmp
        (INDEX idlink_follow (idlink_follow))
        SELECT idlink_va, time_spent_ref_action AS time_spent_action, idvisit, (
            SELECT idlink_va FROM piwik_log_link_visit_action b
            WHERE (a.idvisit = b.idvisit) AND (b.idlink_va < a.idlink_va)
            ORDER BY idlink_va
            LIMIT 1
        ) AS idlink_follow
        FROM piwik_log_link_visit_action a
        ORDER BY idvisit DESC, idlink_va DESC;
    

    Solution (B) depends on a clean and complete piwik_log_link_visit_action table. This is usually the case, but you can never be perfectly sure.

    # Create table containing the time per action
    # (requires a clean and complete piwik_log_link_visit_action)
    CREATE TEMPORARY TABLE tmpA
    SELECT idlink_va, time_spent_ref_action, idvisit
        FROM piwik_log_link_visit_action
        ORDER BY idvisit, idlink_va;
    SELECT * FROM tmpA;
    
    SET @lagID = NULL;
    CREATE TEMPORARY TABLE tmp
        (INDEX idlink_follow (idlink_follow))
        SELECT @lagID AS idlink_follow, time_spent_ref_action AS time_spent_action, @lagID:=idlink_va AS idlink_va
        FROM tmpA
        ORDER BY idvisit, idlink_va;
    
    DROP TEMPORARY TABLE tmpA;
    

    Based on the temporary table, the page times can be accessed:

    SELECT l.idvisit, t.time_spent_action, l.*
    FROM piwik_log_link_visit_action l
    LEFT JOIN tmp t ON (t.idlink_follow = l.idlink_va)
    ORDER BY l.idvisit, l.server_time
    

    Mind to delete the temporary table.

    DROP TEMPORARY TABLE tmp;