I have created a log sink to capture the logs generated by the components used in our project. The sink details are below:
gcloud logging sinks describe test-project-instance-activity
bigqueryOptions:
  usePartitionedTables: true
  usesTimestampColumnPartitioning: true
createTime: '2021-10-17T05:15:48.434334305Z'
description: test sink to capture the instance activities
destination: bigquery.googleapis.com/projects/test-project/datasets/test_logging
filter: |-
  resource.type = cloud_composer_environment OR
  resource.type = cloud_dataproc_cluster OR
  resource.type = gce_disk OR
  resource.type = gce_vm_instance OR
  resource.type = gke_container OR
  resource.type = k8s_cluster
name: test-project-instance-activity
updateTime: '2021-10-17T05:15:48.434334305Z'
writerIdentity: serviceAccount:p121-639060@gcp-sa-logging.iam.gserviceaccount.com
The sink writes the log details to a BigQuery dataset, which has created the following list of tables:
SELECT table_id FROM `test-project.test_logging`.__TABLES__;
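To see which of these tables are the largest, a query along these lines can help (a sketch using the standard __TABLES__ metadata view):

-- rank the export tables by storage size
SELECT table_id,
       row_count,
       ROUND(size_bytes / POW(1024, 3), 2) AS size_gb
FROM `test-project.test_logging`.__TABLES__
ORDER BY size_bytes DESC;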
I checked and found that most of the tables contain INFO logs, and these are generated in huge numbers for any activity around these Google APIs. Do we really need this many INFO logs? What would be the best way to exclude or filter them?
Exclusion filter(s):
resource.type="container"
severity="INFO"
As per the Google docs: "Logs are excluded after they are received by the Logging API."
Does this mean I can only save space at the destinations where the excluded INFO log entries would have been stored, such as GCS or BigQuery? Or do I need to change my application code to log less, or can something be changed in the airflow.cfg file?
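For example, I wonder whether raising the Airflow log level would be enough; something like this in airflow.cfg (just my guess, and the section name differs between Airflow versions):

# raise the minimum level Airflow loggers emit (Airflow 2.x section shown)
[logging]
logging_level = WARNING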
Any pointers to SQL for analyzing these log tables?
A summary, in case it helps: we run Airflow DAGs to ingest data from GCS buckets into BigQuery and use Spark to do some aggregation on it, ingesting large loads of data every 15 minutes throughout the day.
Kindly suggest how to minimize the logging cost; we are generating a huge volume of logs every month.
Do we also get billed for the _Default log bucket? What am I going to miss if I disable it?
It's hard to answer your questions; a lot depends on what you do and what you need!
Do we really need this many INFO logs?
I don't know. Do you use them? If not, you can skip them.
What would be the best way to exclude or filter them?
In your sink filter you can add severity>"INFO" (to exclude INFO and everything below it) or severity!="INFO" (to exclude only the INFO entries).
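As a sketch, that condition can be applied to your existing sink with gcloud, reusing your current filter (adjust it to keep whatever you still need):

# tighten the sink filter so only entries above INFO are exported
gcloud logging sinks update test-project-instance-activity \
    --log-filter='(resource.type = cloud_composer_environment OR
      resource.type = cloud_dataproc_cluster OR
      resource.type = gce_disk OR
      resource.type = gce_vm_instance OR
      resource.type = gke_container OR
      resource.type = k8s_cluster) AND severity > INFO'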
As per the Google docs: "Logs are excluded after they are received by the Logging API." Does this mean I can only save space at the destinations where the excluded INFO log entries would have been stored?
That means the logs reach Cloud Logging first and are routed afterward. Some sinks' filters can include them, others can exclude them. So if you want to drop them entirely, you must exclude them on every route!
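For example, to keep INFO entries out of the _Default bucket too, an exclusion can be added to the _Default sink (a sketch; the exclusion name drop-info is arbitrary):

# add an exclusion so INFO entries are no longer stored in the _Default bucket
gcloud logging sinks update _Default \
    --add-exclusion='name=drop-info,filter=severity=INFO'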
Do we also get billed for the _Default log bucket?
Yes, all the logs matching the _Default sink's filter will go to the _Default bucket. However, the pricing has changed recently, and you won't pay for the default retention period (30 days).
What am I going to miss if I disable it?
If you have your own filters and buckets, and if they are correct and sufficient, you will miss nothing. Again, it depends on what you do and what you need.
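If you do decide to turn it off, the _Default sink can be disabled rather than deleted, which makes the change reversible (a sketch):

# stop routing anything to the _Default bucket (can be re-enabled later)
gcloud logging sinks update _Default --disabled

Any pointers to SQL for analyzing these log tables?

To see which severities dominate a given export table, something along these lines can help; the table name below is hypothetical, so substitute one returned by your __TABLES__ query:

-- count entries per severity in one exported log table
SELECT severity, COUNT(*) AS entries
FROM `test-project.test_logging.airflow_worker`  -- hypothetical table name
GROUP BY severity
ORDER BY entries DESC;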