Tags: prometheus, grafana, grafana-loki, promtail

Unable to send long json labels from promtail to loki


Describe the bug

For a few days I've been moving from Google Cloud Logging to the Grafana stack. We have logs that, due to their nature, can occasionally contain long JSON details (which are very important for us to see). We only need to browse through these logs locally, so the logs are collected in log files, zipped, and backed up to cold storage at intervals.

I've configured Promtail, Loki and Grafana locally using Docker Compose and config files.

docker-compose.yaml

```yaml
networks:
  loki:

services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - C:/softwares/grafana/docker/loki/loki-config.yaml:/etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki

  promtail:
    environment:
      - GRAFANA_TOKEN={{GRAFANA_TOKEN}}
    image: grafana/promtail:latest
    volumes:
      - C:/softwares/grafana/docker/loki/promtail-config.yml:/etc/promtail/promtail-config.yml
      - C:/projects/app/logs/:/var/log/
    command: -config.file=/etc/promtail/promtail-config.yml
    networks:
      - loki

  grafana:
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_SERVER_HTTP_TIMEOUT=1200
    entrypoint:
      - sh
      - -euc
      - |
        mkdir -p /etc/grafana/provisioning/datasources
        cat <<EOF > /etc/grafana/provisioning/datasources/ds.yaml
        apiVersion: 1
        datasources:
        - name: Loki
          type: loki
          access: proxy
          orgId: 1
          url: http://loki:3100
          basicAuth: false
          isDefault: true
          version: 1
          editable: false
        EOF
        /run.sh
    image: grafana/grafana:latest
    ports:
      - "3101:3000"
    networks:
      - loki
```

Then the Loki config:

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: debug
  grpc_server_max_concurrent_streams: 9999999

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 1000

limits_config:
  max_label_names_per_series: 80
  max_line_size: 52428800999999
  max_line_size_truncate: false

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

pattern_ingester:
  enabled: true

ruler:
  alertmanager_url: http://localhost:9093

frontend:
  encoding: protobuf
```

And the Promtail config:

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push
    batchwait: 1s
    batchsize: 1048576  # 1MB batch size

limits_config:
  # 10MB is the default line limit. Set to 0 for unlimited line length.
  max_line_size: 0

scrape_configs:
- job_name: generations_job_name
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/**/*.log
  pipeline_stages:
    - drop:
        expression: "(faillog|lastlog|problematic_log)"
    - regex:
        expression: "(?P<full_json>.*)"
    - json:
        expressions:
          # Custom labels
          level: level
          msg: msg
          time: time
          src:
          a_service:
          cron_id:
    - template:
        source: level
        template: '{{ if eq .Value "10" }}trace{{ else if eq .Value "20" }}debug{{ else if eq .Value "30" }}info{{ else if eq .Value "40" }}warn{{ else if eq .Value "50" }}error{{ else if eq .Value "60" }}fatal{{ else }}unknown{{ end }}'
    - output:
        source: msg
    - labels:
        # Custom labels
        level: level
        msg: msg
        time: time
        src:
        a_service:
        cron_id:
        full_json:
```

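As an aside, the numeric levels mapped by the `template` stage above (10 through 60) match Bunyan's level scheme, which the `name`/`hostname`/`pid` fields in the log payloads also suggest. A minimal Python sketch of the same mapping, assuming that scheme (the `level_name` helper is mine, purely for illustration):

```python
# Map Bunyan-style numeric log levels to names, mirroring the
# Promtail template stage above (unmapped values become "unknown").
LEVELS = {10: "trace", 20: "debug", 30: "info",
          40: "warn", 50: "error", 60: "fatal"}

def level_name(value: str) -> str:
    """Return the level name for a numeric level string, or 'unknown'."""
    if not value.isdigit():
        return "unknown"
    return LEVELS.get(int(value), "unknown")
```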
All works well, except that once I add the full_json label (which holds the full log entry), not all of the logs are accepted by Loki from Promtail.

  • When I run the containers, they all start:

(screenshot)

  • But in Grafana, we'll only get 445 logs,

(screenshot)

while it should be about 605:

(screenshot)

  • Which means some of the logs with bigger JSON payloads were rejected (by Loki, I suspect).
  • In Promtail's logs, I see the following:

(screenshot)

Here is the text:

```
2025-01-31 21:18:45 level=info ts=2025-01-31T20:18:45.605055546Z caller=tailer.go:147 component=tailer msg="tail routine: started" path=/var/log/gen-2025-01-31-17-00.log
2025-01-31 21:18:45 ts=2025-01-31T20:18:45.605075546Z caller=log.go:168 level=info msg="Seeked /var/log/gen-2025-01-31-17-00.log - &{Offset:0 Whence:0}"
2025-01-31 21:18:46 level=error ts=2025-01-31T20:18:46.9207078Z caller=client.go:430 component=client host=loki:3100 msg="final error sending batch" status=400 tenant= error="server returned HTTP status 400 Bad Request (400): 1 errors like: stream '{a_service="generate", cron_id="8b690111-adfa-49ea-9e75-1a4a3a7cf864", filename="/var/log/gen-2025-01-31-11-38.log", full_json="{\"name\":\"b-w-g\",\"hostname\":\"DESKTOP-8JQM1T8\",\"pid\":7996,\"a_service\":\"generate\",\"cron_id\":\"8b690111-adfa-49ea-9e75-1a4a3a7cf864\",\"local_generate\":\"8b690111-adfa-49ea-9e75-1a4a3a7cf864\",\"generate_cron_source\":\"browser\",\"s_generationService_startGeneration\":\"1ckeaWuL8vz5KG6A8kf79A\",\"s_generationService_generateTask\":\"mLpwBC4Bcsb8DrtNna8Cxt\",\"s_generationService_generateArticle\":\"7MkWEkVxdF88asJPqG9KrR\",\"s_generationService_getArticleElementJsonFromPresetElements\":\"6ARQahfN2YQmxRfsshMFj1\",\"level\":10,\"my_debug_info\":{\"allArticleElementJson\":[{\"id\":\"sigQXvLHCV44UJVAPCWh8v\",\"element_id\":\"9c53cde5-0d46-48cf-b0c6-a7410415ed81\",\"ai_messages\":[],\"ai_messages_topic_body\":{\"outline\":[],\"batch1\":[],\"batch2\":[],\"batch3\":[]},\"product_list_tables\":{\"payload\":{\"products\":[],\"others\":[]}},\"product_grid_t"
2025-01-31 21:18:48 level=error ts=2025-01-31T20:18:48.131233805Z caller=client.go:430 component=client host=loki:3100 msg="final error sending batch" status=400 tenant= error="server returned HTTP status 400 Bad Request (400): 1 errors like: stream '{a_service="generate", cron_id="8b690111-adfa-49ea-9e75-1a4a3a7cf864", filename="/var/log/gen-2025-01-31-11-38.log", full_json="{\"name\":\"b-w-g\",\"hostname\":\"DESKTOP-8JQM1T8\",\"pid\":7996,\"a_service\":\"generate\",\"cron_id\":\"8b690111-adfa-49ea-9e75-1a4a3a7cf864\",\"local_generate\":\"8b690111-adfa-49ea-9e75-1a4a3a7cf864\",\"generate_cron_source\":\"browser\",\"s_generationService_startGeneration\":\"1ckeaWuL8vz5KG6A8kf79A\",\"s_generationService_generateTask\":\"mLpwBC4Bcsb8DrtNna8Cxt\",\"s_generationService_generateArticle\":\"7MkWEkVxdF88asJPqG9KrR\",\"s_GenerationService_generateArticleElement\":\"cr5oDMSfgRoUpU9JTMEGDy\",\"s_RoundupElementGenerationService_startGeneration\":\"90276ed0-7166-48ef-982a-9adaf94c542c\",\"level\":10,\"my_debug_info\":{\"allProductsToUse\":[{\"type\":\"search_product\",\"title\":\"Amazon Fire TV Stick 4K (newest model) with AI-powered Fire TV Search, Wi-Fi 6, stream over 1.5 million movies and shows, free & live TV\",\"image\":\"https://m.media-a"
```


### What I expect
- Once I remove that `full_json` label, all the logs come in and I am able to browse through them in Grafana.
- We really need to get the full JSON payload into Grafana through Loki as well. We don't need it indexed, just for it to be there.
- I've spent days digging through the docs for how to tweak both Loki and Promtail to make this work, but found nothing.

Is there anything I can set in Loki to get it to accept my JSON data, since I am running it locally?
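For what it's worth, the 400s look more consistent with Loki's per-tenant label limits than with the line-size limits tuned above: besides `max_line_size`, `limits_config` also has `max_label_name_length` and `max_label_value_length`, and the latter defaults to 2048 bytes, which a multi-kilobyte `full_json` label value would exceed. A sketch of raising it, assuming that limit is indeed what rejects the batches (huge label values are still a poor fit for Loki's index, so treat this as a workaround and verify the option names against your Loki version):

```yaml
# loki-config.yaml (sketch; values illustrative)
limits_config:
  max_label_names_per_series: 80
  max_label_name_length: 1024     # default
  max_label_value_length: 65536   # default is 2048; raise to accept long full_json values
```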



Solution

  • For now, what I did was to truncate the value with a template stage:

```yaml
- template:
    source: z_full_json
    # Truncate to 64KB (adjust size as needed)
    template: '{{ if gt (len .Value) 65536 }}{{ slice .Value 0 65536 }}...TRUNCATED{{ else }}{{ .Value }}{{ end }}'
```

    This way, Promtail truncates any huge JSON value that Loki might reject, so I am still able to get most of my JSON label values for each log and only miss a few.
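A cleaner alternative may be structured metadata, given the v13/tsdb schema already configured above: it attaches values like `full_json` to each log entry without turning them into indexed labels, which matches the "not indexed, just there" requirement. A sketch, assuming a recent Loki/Promtail with the `structured_metadata` pipeline stage (untested against this exact setup):

```yaml
# In loki-config.yaml:
limits_config:
  allow_structured_metadata: true
```

```yaml
# In promtail-config.yml, drop full_json from the labels stage and add:
- structured_metadata:
    full_json:
```

Grafana's Explore view should then show `full_json` alongside each line without it counting toward label limits or index cardinality.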