firebase, firebase-hosting, google-cloud-logging

How to analyze Firebase Hosting downloads from Cloud Logging


I have a small Firebase project that uses Firebase Hosting to host a static site along with a couple of Cloud Functions. Normally it stays well under the free limit, but over the past few days I have seen some very large spikes that go well beyond it. I am trying to determine why this is: for example, which files are being downloaded so much, where the traffic is coming from, etc.

I have linked the project's Firebase Hosting to Google Cloud Logging, so I can go in there and see the web request logs, which gives me the raw details. But I can't find a way to aggregate and analyze this information.

For example I would like to answer questions like:

  • For a given 24 hour period, list each request URL sorted by aggregate request size across all requests.
  • For a given 24 hour period, which requesting IP addresses requested the most data?

And then wherever these queries lead.

Is there anything in Google Cloud Platform that can do this type of analysis? It seems like a pretty standard thing for companies analyzing their web request traffic to understand where it is going, but I can't find anything about how to do it, which makes me think I don't know the right search terms for the best way to do this with GCP.

Any advice?


Solution

  • To analyze traffic from the past few days, load the logs into BigQuery and query them there rather than in Cloud Logging: log-based metrics only capture data from the moment they are defined, so they cannot be used to analyze traffic that has already occurred.

    Steps to load logs to BigQuery:

    1. Export existing log data to a Cloud Storage bucket: copy the log entries that are already stored in Cloud Logging log buckets to Cloud Storage buckets. Logging only copies log entries that are stored in the log bucket when the copy operation starts; entries ingested and stored after the copy operation starts are not copied. See Copying log entries.
    2. Load the exported data into BigQuery. The exported data is newline-delimited JSON, so load the logs.json file into BigQuery using the JSONL file format; see Loading JSON data from Cloud Storage. A sketch of this load is shown after these steps.
    3. Now query the logs using standard SQL, the preferred SQL dialect for querying data stored in BigQuery. See BigQuery schema for logs.
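
    As a rough sketch of step 2 (the dataset, table, and bucket names here are placeholders; adjust them to your project), the exported file can also be loaded with BigQuery's LOAD DATA statement instead of the console or the bq tool:

       -- Load the exported newline-delimited JSON log file into a table.
       -- Dataset, table, and bucket names are placeholders.
       LOAD DATA INTO firebase_logs.logs_table
       FROM FILES (
         format = 'JSON',  -- newline-delimited JSON
         uris = ['gs://my-log-export-bucket/logs.json']
       );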

    Example queries:

    1. For a given 24 hour period, list each request URL sorted by aggregate request size across all requests (add a WHERE clause on timestamp to restrict the results to the 24 hour window of interest, as in the next example):

       SELECT httpRequest.requestUrl, SUM(httpRequest.responseSize) AS totSize
       FROM logs_table
       GROUP BY httpRequest.requestUrl
       ORDER BY totSize DESC
      
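    2. For a given 24 hour period, which requesting IP addresses requested the most data. This is a sketch using the same placeholder table name as above, with placeholder timestamps for the window; if BigQuery's schema auto-detection loaded responseSize as a string, wrap it in SAFE_CAST(... AS INT64):

       SELECT httpRequest.remoteIp, SUM(httpRequest.responseSize) AS totBytes
       FROM logs_table
       WHERE timestamp >= TIMESTAMP('2024-01-01') AND timestamp < TIMESTAMP('2024-01-02')  -- placeholder 24 hour window
       GROUP BY httpRequest.remoteIp
       ORDER BY totBytes DESC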

    Note: things are easier for newly produced logs, as you can route them directly to BigQuery tables by configuring sinks. As Logging receives new log entries, they are compared against each sink; if a log entry matches a sink's filter, a copy of the entry is written to the sink's destination. See Configure sinks for details; a sketch of querying the resulting tables follows below.
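
    For logs routed through a sink, the destination dataset typically contains date-sharded (or, if enabled, partitioned) tables whose names are derived from the log name. As a minimal sketch, assuming date-sharded tables and placeholder project, dataset, and table-prefix names (check the sink's destination dataset for the names it actually creates), a 24 hour window can then be selected with a table wildcard and _TABLE_SUFFIX:

       -- Project, dataset, and table prefix are placeholders.
       SELECT httpRequest.requestUrl, SUM(httpRequest.responseSize) AS totSize
       FROM `my_project.firebase_logs.firebasehosting_googleapis_com_*`
       WHERE _TABLE_SUFFIX = '20240101'  -- the day of interest
       GROUP BY httpRequest.requestUrl
       ORDER BY totSize DESC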