
What is the Optimal Way for a Web Metrics App to Calculate a Visitor's Time On Site?


I am developing an internal web analytics system like Google Analytics, and I am not clear about the concept of page stay time. The typical explanation of this measure found on the web is:

  1. user accessed page A at timestamp: t1
  2. user accessed page B at timestamp: t2, (t2 > t1)

then the page stay time for A is t2 - t1, and for B it is 0.

My question is: in this scenario, when calculating the page stay time for B, do we need to check whether the user clicked through to page B from page A, i.e., that B's referrer is A?


Solution

  • There are two techniques to measure Time on Page, and its aggregated counterpart Time on Site, distinguished by the markers used to record time-event pairs:

    • timestamp

    • ping-based

    Google Analytics, for instance, uses the former: GA records a timestamp for each pageview, event, and transaction that occurs in the user's session.

    So exactly as you indicated in your question, Google Analytics calculates Time on Site by summing the timestamp deltas for that user's entire session history. There is no subsequent timestamp after the last page in the user's session, so the final time delta cannot be calculated.
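    To make the arithmetic concrete, here is a minimal sketch of that bookkeeping (the session array and its shape are my own illustration, not GA's actual data model):

        // Hypothetical session: ordered pageview timestamps in milliseconds.
        const pageviews = [
          { page: '/home',    ts: 0 },
          { page: '/pricing', ts: 45000 },
          { page: '/signup',  ts: 80000 },  // exit page: no later timestamp
        ];

        // Time on Page = next pageview's timestamp minus this one's.
        // The exit page gets no delta, so Time on Site is undercounted.
        function timeOnSite(views) {
          let total = 0;
          for (let i = 0; i < views.length - 1; i++) {
            total += views[i + 1].ts - views[i].ts;
          }
          return total; // 80000 ms here; time on '/signup' is never counted
        }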

    This missing final delta introduces error into the Time on Site metric, but I still think it's the best available choice of measurement technique. The technique is simple to explain, and therefore it's simple to understand precisely where the error occurs and in which direction it influences the reported metric. In other words, you know that Time on Site is always undercounted.

    Second, this error can be estimated (i.e., the true Time on Site can be estimated) because you have reliable Time on Page for every other page in the user's visit. Even better, from your population of site visitors, you have data on the mean Time on Page for the particular page the user visited last in their session; a sketch of that estimation follows.
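    As a hedged sketch of that estimation (the meanTimeOnPage lookup table and its values are hypothetical): take the measured deltas and add the population mean for whatever page the session ended on.

        // Hypothetical lookup: population mean Time on Page, in ms, keyed by URL.
        const meanTimeOnPage = { '/home': 30000, '/pricing': 42000, '/signup': 25000 };

        // Estimate true Time on Site: measured deltas plus the population
        // mean for the exit page, whose own delta is unknown.
        function estimatedTimeOnSite(views) {
          let measured = 0;
          for (let i = 0; i < views.length - 1; i++) {
            measured += views[i + 1].ts - views[i].ts;
          }
          const exitPage = views[views.length - 1].page;
          return measured + (meanTimeOnPage[exitPage] || 0);
        }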

    The other group of techniques for measuring Time on Page is ping-based. Here, javascript in the page repeatedly calls, at a pre-determined time interval, a function that pings the server. The javascript snippet on the page calls this pinging function for as long as that page is open in the client browser.
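    A minimal version of such a snippet might look like the following (the /ping endpoint and the 10-second interval are illustrative assumptions, not any particular product's API):

        // Ping the server every 10 seconds while the page stays open.
        // '/ping' is a hypothetical collection endpoint.
        const PING_INTERVAL_MS = 10000;

        setInterval(function () {
          const payload = JSON.stringify({ page: location.pathname, ts: Date.now() });
          // navigator.sendBeacon queues a small POST without blocking the page;
          // fall back to a fire-and-forget fetch where it is unavailable.
          if (navigator.sendBeacon) {
            navigator.sendBeacon('/ping', payload);
          } else {
            fetch('/ping', { method: 'POST', body: payload, keepalive: true });
          }
        }, PING_INTERVAL_MS);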

    Perhaps the key advantage of these techniques is that they solve the problem of not counting the time the user spent on the final page of their session. I suppose the primary disadvantage of ping-based techniques is a higher implementation cost. The accuracy of this technique depends, of course, on the ping interval: the average measurement error is roughly half that interval. If you ping every 10 seconds, you can resolve Time on Page to within 5 seconds on average. But any server activity has an associated resource cost, so this parameter, the ping interval, needs to be tuned with care. That's what I mean by "higher implementation cost".

    A recent blog post by Brian Cray discusses such a solution and provides a javascript snippet for this purpose. In addition, Episodes is a javascript library for accurate measurement of javascript (rather than DOM) events, which might be of use to your analytics project.

    So which of these two techniques is better? I suspect a clever combination of the two would give you the highest resolution with the lowest page weight and server load. The only analytics app I am aware of that implements such a hybrid system is W3Counter. [Note: I have no affiliation or agreement of any kind with this project.]
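    One hedged sketch of such a hybrid (all names here are my own illustration, not W3Counter's actual design): record a timestamp per pageview as usual, and add a low-frequency heartbeat whose only job is to bound the time spent on the exit page.

        // Timestamp per pageview: cheap and precise between pages.
        navigator.sendBeacon('/pageview',
          JSON.stringify({ page: location.pathname, ts: Date.now() }));

        // An infrequent heartbeat (every 30 s) lets the server credit the
        // exit page with at least the time up to the last heartbeat received.
        setInterval(function () {
          navigator.sendBeacon('/heartbeat',
            JSON.stringify({ page: location.pathname, ts: Date.now() }));
        }, 30000);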

    I have not used W3Counter, but based on this feature alone, I believe it's worth considering. (I do not, however, like the name "W3Counter", which makes me think it's a validation checker.)