
Calculate the number of visits based on downloaded GB


I have a website hosted on Firebase that totally went viral for a day. Since I wasn't expecting that, I hadn't installed any analytics tool. However, I would like to know the number of visits or downloads. The only metric I have available is the GB downloaded: 686.8 GB. But I am confused because if I open the website with the Chrome console, I get two different metrics for the size of the page: 319 KB transferred and 1.2 MB resources. Furthermore, not all of those things are transferred from Firebase; some come from other CDNs, as you can see in the screenshots. What is the proper way of calculating the visits I had?

chrome browser console firebase screenshot


Solution

    • Transferred metric is how much bandwidth was used after compression was applied.
    • Resources metric is how much disk space those resources use before they are compressed (for transfer).
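For a rough figure, the reported download volume can be divided by the number of bytes a single page load pulls from Firebase. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes decimal units (1 GB = 10^9 bytes), that every page load was uncached, and that no bots were involved. The function name `estimate_page_loads` and the 150 KB figure are made up for illustration; the real per-load share served by Firebase would have to be read from the Network tab.

```php
<?php
// Rough estimate of page loads from total bytes served by the host.
// Assumptions (not measured): decimal units, no browser caching, no bots,
// and a fixed number of bytes served by the host per page load.
// Integer sizes this large need a 64-bit PHP build.
function estimate_page_loads(int $totalBytes, int $bytesPerLoad): int
{
    if ($bytesPerLoad <= 0) {
        throw new InvalidArgumentException('bytesPerLoad must be positive');
    }
    return intdiv($totalBytes, $bytesPerLoad);
}

$totalBytes = 686800000000; // 686.8 GB reported as downloaded

// Case 1: all 319 KB "transferred" per load came from Firebase.
$allFromHost = estimate_page_loads($totalBytes, 319000);

// Case 2: only part of it did. 150 KB is a hypothetical share, not a
// value taken from the question.
$partFromHost = estimate_page_loads($totalBytes, 150000);

echo "All 319 KB from host: $allFromHost page loads\n";
echo "150 KB from host:     $partFromHost page loads\n";
```

Under the first set of assumptions this works out to roughly 2.15 million page loads. The smaller the share actually served by Firebase, the higher the estimate, and returning visitors with a warm cache push it higher still. Either way the result counts page loads, not distinct people.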

    True analytics requires an understanding of what is on the web. There are three classifications:

    • Humans, who are flesh and blood and overwhelmingly (though not exclusively) use web browsers.
    • Spiders (search engine crawlers), which request pages on the understanding that they obey robots.txt and will list your website in their results for relevant search queries.
    • Rejects (basically spammers and the unknowns), which include (though are far from limited to) content/email scrapers, brute-force password guessers, vulnerability scanners and POST spammers.

    With this clarification in place what you're asking in effect is, "How many human visitors am I receiving?" The easiest way to obtain that information is to:

    1. Determine which user agents' requests are human (not easy; this is behavior-based).
    2. Determine the length of time a single visit from a human should count as.
    3. Assign human visitors a session.
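Steps 2 and 3 can be sketched as a small counting function. This is only an illustration under stated assumptions: step 1 (filtering out non-human requests) has already been done, each request carries a visitor id taken from a session cookie, and a visit ends after 30 minutes of inactivity. The 30-minute window, the function name `count_visits`, and the record layout are assumptions, not something the steps above prescribe.

```php
<?php
// Count visits from a request log: consecutive requests from the same visitor
// belong to one visit unless they are more than $timeout seconds apart.
// Each request is ['visitor' => string, 'time' => int (Unix timestamp)].
function count_visits(array $requests, int $timeout = 1800): int
{
    // Sort by time so each visitor's requests are seen in order.
    usort($requests, fn(array $a, array $b): int => $a['time'] <=> $b['time']);

    $lastSeen = []; // visitor id => timestamp of that visitor's latest request
    $visits = 0;
    foreach ($requests as $r) {
        $id = $r['visitor'];
        if (!isset($lastSeen[$id]) || $r['time'] - $lastSeen[$id] > $timeout) {
            $visits++; // first request, or the previous visit has expired
        }
        $lastSeen[$id] = $r['time'];
    }
    return $visits;
}

$log = [
    ['visitor' => 'a', 'time' => 0],
    ['visitor' => 'a', 'time' => 600],   // same visit (10 minutes later)
    ['visitor' => 'b', 'time' => 700],
    ['visitor' => 'a', 'time' => 4000],  // new visit (idle for more than 30 minutes)
];
echo count_visits($log), "\n"; // prints 3
```

A longer timeout merges more requests into one visit and a shorter one splits them, so the chosen window is part of the definition of "a visit" rather than a technical detail.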

    I presume you understand what a cookie is and how it differs from a session cookie. Obviously, when you sign in to a website you are assigned a session. If that session cookie is not sent to the server on a page request, you will in effect be signed out. You can make session cookies last for a long time; how long comes down to factors such as convenience for the visitor and whether you count those sessions directly or use them in conjunction with something else.

    Now your next thought likely is, "But how do I count downloads?" Thankfully you mention PHP in your website, so I can give you some code that should make sense to you. If you just link directly to the file, you'd be stuck with (at best) counting clicks via a click event on the anchor element, and a download that gets canceled because it was a mistake (or for some other reason) makes that count more subjective than my suggestion. Granted, my suggestion can still be subjective (e.g. they decide they actually don't want the download and cancel before it completes), and of course whether they actually use the download is another aspect to consider. That being said, if you want the server to give you a download count, you'd want to do the following:

    1. You may want to use Apache rewrite (or whatever the other HTTP server equivalents are) so that PHP handles the download.
    2. You may need to ensure Apache has the proper handling for PHP (e.g. AddType application/x-httpd-php5 .exe .msi .dmg) so your server knows to let PHP run on the requested file.
    3. You'll want to use PHP's file_exists() with an absolute file path on the server for the sake of security.
    4. You'll want to ensure that you set the correct MIME type for the file via PHP's header(), as you should expect browsers to be horrible at guessing.
    5. You absolutely need to use die() or exit() to avoid Gecko (Firefox) bugs: if your software leaks even whitespace, the browser will interpret it as part of the file, likely causing corruption.

    Here is the code for PHP itself:

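    // Last path segment of the request is the file name; basename() blocks directory traversal.
    $file = basename((string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
    $full = $path_absolute.$file; // $path_absolute: absolute download directory, defined elsewhere
    if (!is_file($full)) { // stricter than file_exists(): also rejects directories
        http_response_code(404);
        die();
    }
    header('Content-Type: '.$mime); // $mime: the file's MIME type, defined elsewhere
    header('Content-Length: '.filesize($full));
    readfile($full); // stream the file instead of loading it all into memory
    die(); // stop here so no stray output is appended to the file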
    

    For counting downloads, if you want to get a little fancy, you could create a couple of database tables: one for the files (download_files) and a second for requests (download_requests). Throw in basic SQL queries and you're collecting data. Record the IPv6 address of each request (Storing IPv6 Addresses in MySQL) and you'll be able to discern from a query how many unique downloads you have.
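To make the "unique downloads" idea concrete, here is a minimal sketch that works on an in-memory array instead of a real database. It assumes request records shaped like rows of the download_requests table mentioned above, with hypothetical `file` and `ip` fields. The helper names `ip_to_binary` and `count_unique_downloads` are invented for this example. Addresses are normalized to 16 bytes (the size a VARBINARY(16) column would hold), so different spellings of the same address compare equal.

```php
<?php
// Convert a textual IP address to a fixed 16-byte binary string.
// IPv4 addresses are stored as IPv4-mapped IPv6 (::ffff:a.b.c.d) so that
// both address families fit the same column.
function ip_to_binary(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return null; // not a valid IPv4 or IPv6 address
    }
    $packed = inet_pton($ip);
    if (strlen($packed) === 4) {
        $packed = str_repeat("\x00", 10) . "\xff\xff" . $packed;
    }
    return $packed;
}

// Count distinct (file, address) pairs; records with invalid addresses are skipped.
function count_unique_downloads(array $requests): int
{
    $seen = [];
    foreach ($requests as $r) {
        $bin = ip_to_binary($r['ip']);
        if ($bin !== null) {
            $seen[$r['file'] . '|' . bin2hex($bin)] = true;
        }
    }
    return count($seen);
}

$requests = [
    ['file' => 'setup.exe', 'ip' => '2001:db8::1'],
    ['file' => 'setup.exe', 'ip' => '2001:0db8:0000:0000:0000:0000:0000:0001'], // same address, long form
    ['file' => 'setup.exe', 'ip' => '192.0.2.7'],
];
echo count_unique_downloads($requests), "\n"; // prints 2
```

With a real database the same figure would normally come from a distinct-count query over the requests table. Keep in mind that one address is not necessarily one person (shared networks, changing addresses), so this is still an approximation.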

    Back to human visitors: it takes a very thorough study to understand the differences between humans and bots. Things like Captcha are garbage and are utterly annoying. You can get a rough start by requiring a cookie to be sent back on requests though not all bots are ludicrously stupid. I hope this at least gets you on the right path.