Tags: regex, parsing, logging, fluentd, observability

Parse a string of key=value pairs with escaped characters


Loki outputs the following log in a key-value format with the structure key1=value1 key2=value2:

level=info ts=2023-10-20T14:30:48.716410806Z caller=metrics.go:159 component=frontend org_id=fake traceID=58290ebda8d79180 latency=fast query=\"sum by (level) (count_over_time({k8s_namespace=\\\"ingress-nginx\\\"} |= ``[1s]))\" query_hash=110010092 query_type=metric range_type=range length=15m0.001s start_delta=15m0.833402507s end_delta=832.40267ms step=1s duration=61.999532ms status=200 limit=1000 returned_lines=0 throughput=4.2MB total_bytes=260kB total_bytes_structured_metadata=0B lines_per_second=4209 total_lines=261 post_filter_lines=261 total_entries=1 store_chunks_download_time=0s queue_time=819.962996ms splits=2 shards=32 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s source=logvolhist

In Fluentd, I'm trying to parse this log with the Labeled Tab-separated Values (LTSV) parser, setting delimiter_pattern to /\s+/ and label_delimiter to =, and I get the following result:

{
  "level": "info",
  "caller": "metrics.go:159",
  "component": "frontend",
  "org_id": "fake",
  "traceID": "58290ebda8d79180",
  "latency": "fast",
  "query": "\"sum",
  "(count_over_time({k8s_namespace": "\\\"ingress-nginx\\\"}",
  "|": "",
  "query_hash": "110010092",
  "query_type": "metric",
  "range_type": "range",
  ...
}

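For the key query, the parser captures only the first word of the value, because it treats the space that follows as the delimiter before the next key-value pair. The rest of the query is then split into bogus keys such as "(count_over_time({k8s_namespace" and "|".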
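The behaviour described above can be reproduced outside Fluentd with a minimal Python sketch. It is only an approximation of a whitespace-delimited LTSV parse, not Fluentd's implementation: the function name `naive_ltsv` is hypothetical, the sample is a shortened version of the log line, and skipping tokens that contain no `=` is an assumption chosen to match the output shown in the question.

```python
import re

def naive_ltsv(line):
    """Split on runs of whitespace, then split each token on the first '='."""
    fields = {}
    for token in re.split(r"\s+", line.strip()):
        if "=" not in token:
            # Assumption: tokens without a label delimiter are dropped.
            continue
        key, _, value = token.partition("=")
        fields[key] = value
    return fields

# Shortened sample; the backslashes are literal characters in the log line.
sample = (
    r'latency=fast query=\"sum by (level) '
    r'(count_over_time({k8s_namespace=\\\"ingress-nginx\\\"} |= ``[1s]))\" '
    r'query_hash=110010092'
)

parsed = naive_ltsv(sample)
for key, value in parsed.items():
    print(repr(key), "->", repr(value))
```

The query value is cut off after its first word, and the fragments that happen to contain `=` become keys of their own, which is the same pattern as in the result above.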

I've tried several regular expressions and two parser plugins (https://shihadeh.dev/ruby-gems/Key-ValueParser/ and https://github.com/fluent-plugins-nursery/fluent-plugin-kv-parser), but no luck so far.

Is it a matter of getting the right regex, using a different parser, unescaping the characters, or something else?


Solution

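  • \s(?=\w+=(\\")?.+?(?(1)\1(?=\s)))

    This regex, from the comment by @CAustin, works better as the delimiter_pattern than plain /\s+/. It matches a whitespace character only when that character is immediately followed by a key= token, so the spaces inside the query value of this log are no longer treated as delimiters.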
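To check the delimiter pattern outside Fluentd, here is a minimal Python sketch. Python's `re` module supports the same lookahead and `(?(1)...)` conditional syntax, but Fluentd itself runs on Ruby's regex engine, so treat this as an illustration and not as a test of the Fluentd configuration. The helper name `split_pairs` and the shortened sample line are assumptions made for the example.

```python
import re

# Delimiter pattern from the answer above.
DELIMITER = re.compile(r'\s(?=\w+=(\\")?.+?(?(1)\1(?=\s)))')

def split_pairs(line):
    """Cut the line at every delimiter match and build a dict of pairs."""
    fields = {}
    start = 0
    # finditer is used instead of re.split, because re.split would also
    # return the text of the capturing group in its result list.
    boundaries = [m.start() for m in DELIMITER.finditer(line)] + [len(line)]
    for end in boundaries:
        key, _, value = line[start:end].partition("=")
        fields[key] = value
        start = end + 1  # skip the single whitespace delimiter
    return fields

# Shortened sample; the backslashes are literal characters in the log line.
sample = (
    r'level=info latency=fast query=\"sum by (level) '
    r'(count_over_time({k8s_namespace=\\\"ingress-nginx\\\"} |= ``[1s]))\" '
    r'query_hash=110010092 status=200'
)

fields = split_pairs(sample)
for key, value in fields.items():
    print(key, "->", value)
```

With this sample the query value stays in one piece, still wrapped in its escaped quotes. One caveat: a space inside a value that is itself followed by something shaped like word= would still be treated as a delimiter, so the pattern fits this log format but is not a general quoted-string parser.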

    Extra

    This applies to my specific case of parsing logs with Fluentd. In the meantime, after looking more closely at both Fluentd and Fluent Bit, I'll probably switch to Fluent Bit because of its parsing capabilities.

    To whom it may concern :p