amazon-web-services amazon-s3 aws-glue logstash-grok

Glue Classifier could not classify columns using Grok pattern


I have an S3 bucket structured as s3://<bucket-name>/year=<yearno>/month=<monthno>/day=<dayno>/<filename>.log. The lines in the .log files are structured like this:

2020-01-06 09:05:14,450 INFO [Asterisk-Java DaemonPool-1-thread-3] handler.CallHandler (CallHandler.java:849) - Original name : harris changed to : haris . Exist? true

The Grok pattern that I'm using for the classifier is:

[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]{12} INFO \[Asterisk-Java DaemonPool-1-thread-[0-9]{1,3}] handler.CallHandler \(CallHandler.java:849\) - Original name : %{WORD:original_name} changed to : %{WORD:transformed_name} . Exist\? %{WORD:exist_prior}

I checked my Grok pattern using this debugger web app, and it confirmed that the pattern matches the line. This is the table I expected:

+------+-------+-----+---------------+------------------+-------------+
| year | month | day | original_name | transformed_name | exist_prior |
+------+-------+-----+---------------+------------------+-------------+
|    - |     - |   - |             - |                - |           - |
+------+-------+-----+---------------+------------------+-------------+

However, this is the table I actually got:

+------+-------+-----+------+------+------+------+
| year | month | day | col0 | col1 | col2 | col3 |
+------+-------+-----+------+------+------+------+
|    - |     - |   - |    - |    - |    - |    - |
+------+-------+-----+------+------+------+------+

Where did I go wrong?
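
For reference, the debugger check described above can be approximated offline. The following is only a minimal sketch: it assumes the stock Logstash definitions of WORD (\b\w+\b) and DATA (.*?), and the helper name grok_to_regex is hypothetical. It tests the pattern as a plain regular expression, so it says nothing about how the Glue classifier itself evaluates the pattern.

```python
import re

# Assumed regex definitions of the two stock grok patterns involved
# (taken from the standard Logstash pattern set).
GROK_DEFS = {"WORD": r"\b\w+\b", "DATA": r".*?"}

LINE = ("2020-01-06 09:05:14,450 INFO [Asterisk-Java DaemonPool-1-thread-3] "
        "handler.CallHandler (CallHandler.java:849) - Original name : harris "
        "changed to : haris . Exist? true")

# The pattern from the question, unchanged.
PATTERN = (r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]{12} INFO "
           r"\[Asterisk-Java DaemonPool-1-thread-[0-9]{1,3}] handler.CallHandler "
           r"\(CallHandler.java:849\) - Original name : %{WORD:original_name} "
           r"changed to : %{WORD:transformed_name} . Exist\? %{WORD:exist_prior}")


def grok_to_regex(pattern):
    """Hypothetical helper: expand %{NAME:field} into named capture groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, GROK_DEFS[name])
    return re.sub(r"%\{(\w+):(\w+)\}", expand, pattern)


def extract(pattern, line):
    """Return the captured fields, or None if the whole line does not match."""
    m = re.fullmatch(grok_to_regex(pattern), line)
    return m.groupdict() if m else None


word_fields = extract(PATTERN, LINE)
data_fields = extract(PATTERN.replace("%{WORD:", "%{DATA:"), LINE)
print(word_fields)
print(data_fields)
```

Under these assumptions both variants match the sample line, which is consistent with the debugger accepting the WORD version; the difference reported in the solution shows up only inside Glue.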


Solution

  • I changed the captures in my Grok pattern from %{WORD:variable_name} to %{DATA:variable_name}. The classifier then worked as expected.
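
As an illustration of the change above, here is a rough sketch of registering the DATA-based pattern as a Glue Grok classifier with boto3. The classifier name, classification string, and region are hypothetical placeholders, and the original post does not say how the classifier was created, so treat this as one possible setup rather than the author's.

```python
# The pattern from the question with every WORD capture switched to DATA.
GROK_PATTERN = (
    r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]{12} INFO "
    r"\[Asterisk-Java DaemonPool-1-thread-[0-9]{1,3}] handler.CallHandler "
    r"\(CallHandler.java:849\) - Original name : %{DATA:original_name} "
    r"changed to : %{DATA:transformed_name} . Exist\? %{DATA:exist_prior}"
)


def build_grok_classifier(name="call-handler-logs", classification="callhandler"):
    """Build the request for glue.create_classifier (name/classification are placeholders)."""
    return {
        "GrokClassifier": {
            "Name": name,
            "Classification": classification,
            "GrokPattern": GROK_PATTERN,
        }
    }


def create_classifier(region_name="us-east-1"):
    """Send the request to AWS Glue; needs boto3 and valid credentials."""
    import boto3  # imported here so the request can be built without boto3 installed

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_classifier(**build_grok_classifier())


request = build_grok_classifier()
print(request["GrokClassifier"]["GrokPattern"])
```

Calling create_classifier() would submit the request. The classifier still has to be attached to the crawler that scans the bucket before it takes effect.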