Search code examples
google-bigqueryfluentdstackdriver

Parsing fields out of textPayload in stackdriver logging


Here is a line from tomcat access log:

127.0.0.1 - - [24/May/2016:17:53:05 -0700] "POST /users HTTP/1.1" 200 10676

Is there a way to parse out the various fields here like client IP, HTTP request method, request path, response code, etc., and load it into separate columns in a BigQuery table?

The table at the bottom of this page links to the fluent catch-all config, but I believe different logs should be parsed out and loaded differently in a configurable manner for ease of querying?

Or am I missing something fundamental here?


Solution

  • it might be not what you mean - but just guess :

    How about loading log into GBQ table so each log line becomes row in table and than parse it into another table with something like below (code is not pretending to be optimal - just to show idea)

    SELECT 
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){0} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){1} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){2} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){3} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){4} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){5} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){6} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){7} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){8} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){9} (.+?) '),
    FROM (
      SELECT ' ' + REGEXP_REPLACE(log_line, r'[\[\]\"]', '') + ' ' AS log_line 
      FROM 
        (SELECT '127.0.0.1 - - [24/May/2016:17:53:05 -0700] "POST /users HTTP/1.1" 200 10676' AS log_line)
    )