python regex logging logparser

How to make a group NULL in regex


I want to write a regex in which one group comes back NULL for one of my log lines.

Regex

(\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d.\d\d\dZ)\s+(INFO|WARN|DEBUG|ERROR|FATAL|TRACE)\s+(.*?.*?\-\s+.*?)\s(\[?.*?\]?)\s+(.*)

Logs

2019-11-14T04:25:00.123Z  WARN http-nio-127.0.0.1-7440-exec-127 CorfuCompileProxy - accessInner: Encountered a trim exception while accessing version 120383907 on attempt 0
2019-11-14T04:23:08.700Z  INFO RpcManagerRequestCleanupTimer RpcManager - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Rpc response not received for application FabricStats request com.vmware.nsx.management.agg.messaging.AggService$ClientDataRequestMsg from client 8ac94189-d611-4eb3-9b93-c3c3a8e3d36a with correlation id 287e690e-0a47-4459-a0bb-be36fe439068 in 432000000 msec.
2019-11-14T04:24:04.072Z  INFO MessagingObjectFactoryImpl-4-2 ExporterLastAckServiceImpl - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Found exporter with elaId = Node#a3844284-e626-11e9-a87b-005056bcc0c6#AggSvc-L2-Bridging, returning lastAck = 16507 
2019-11-14T04:23:08.362Z  INFO ActivityEventRecovery-1 ActivityCacheManager - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Handling activity 92d6a146-fa12-4889-a0ff-441087e047d0 completion event for 1 
2019-11-14T04:23:08.362Z  DEBUG ActivityEventRecovery-1 ActivityCacheManager - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Handling activity 92d6a146-fa12-4889-a0ff-441087e047d0 completion event for 1

In the first log line above, I want group 4 to be NULL, since that line doesn't contain any square brackets.

I have been trying for some time. Here is my attempt: https://regex101.com/r/LJnVrS/93

Please Help!


Solution

  • If group 4 has to be present and its content has to start or end with a square bracket, you could keep the group and make its contents optional.

    To match either starting with a square bracket on the left or ending with one at the right, you could use an alternation:

    ((?:\S+\]|\[\S+)?)
    
    • ( Capture group 4
      • (?: Non capture group
        • \S+\] Match 1+ non whitespace chars and the ending ]
        • | Or
        • \[\S+ Match the starting [ and 1+ non whitespace chars
      • )? Close group and make optional
    • )
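
To see what this group does on its own, here is a minimal Python sketch that applies only the group 4 fragment to a few tokens taken from the logs in the question. The helper name bracket_token is made up for illustration and is not part of the original answer.

```python
import re

# Group 4 from the answer, taken in isolation (a sketch, not the full pattern).
GROUP4 = re.compile(r"((?:\S+\]|\[\S+)?)")


def bracket_token(text):
    """Return what the isolated group captures at the start of text."""
    # The contents are optional, so match() always succeeds here.
    return GROUP4.match(text).group(1)


# Token that starts with "[": taken by the second alternative.
print(repr(bracket_token('[nsx@6876 comp="nsx-manager"')))
# Token that ends with "]": taken by the first alternative.
print(repr(bracket_token('subcomp="manager"] Found exporter')))
# No bracket at all: the group still exists but captures an empty string.
print(repr(bracket_token("Encountered a trim exception")))
```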

    The pattern could look like

    (\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}Z)\s+(INFO|WARN|DEBUG|ERROR|FATAL|TRACE)\s+((?:\S+\s+){4})((?:\S+\]|\[\S+)?)(.*)
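
Since the question is tagged python, here is a rough sketch of how the pattern behaves with the re module on two of the log lines from the question. One thing to be aware of: with the contents made optional, group 4 comes back as an empty string, not None. If you really need None, one possible variant (my assumption, not part of the pattern above) is to make the group itself optional, because re reports a group that did not take part in the match as None. The names HEAD, TAIL, PATTERN and PATTERN_NONE are only for illustration.

```python
import re

# Shared prefix: timestamp, level, then four whitespace-separated tokens.
HEAD = (
    r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}Z)\s+"
    r"(INFO|WARN|DEBUG|ERROR|FATAL|TRACE)\s+"
    r"((?:\S+\s+){4})"
)
TAIL = r"(.*)"

# The pattern from the answer: the *contents* of group 4 are optional.
PATTERN = re.compile(HEAD + r"((?:\S+\]|\[\S+)?)" + TAIL)
# Assumed variant: group 4 *itself* is optional, so it can be None.
PATTERN_NONE = re.compile(HEAD + r"(\S+\]|\[\S+)?" + TAIL)

NO_BRACKETS = (
    "2019-11-14T04:25:00.123Z  WARN http-nio-127.0.0.1-7440-exec-127 "
    "CorfuCompileProxy - accessInner: Encountered a trim exception while "
    "accessing version 120383907 on attempt 0"
)
WITH_BRACKETS = (
    "2019-11-14T04:23:08.362Z  DEBUG ActivityEventRecovery-1 "
    'ActivityCacheManager - - [nsx@6876 comp="nsx-manager" level="INFO" '
    'subcomp="manager"] Handling activity '
    "92d6a146-fa12-4889-a0ff-441087e047d0 completion event for 1"
)

for line in (NO_BRACKETS, WITH_BRACKETS):
    # Print group 4 as seen by the answer's pattern and by the variant.
    print(repr(PATTERN.match(line).group(4)),
          repr(PATTERN_NONE.match(line).group(4)))
```

Which of the two you want depends on whether the code consuming the groups treats an empty string and None differently.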
    


    If every log entry starts at the beginning of a line, you could prepend ^ to the pattern and use the multiline flag.
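
As a sketch of the anchored version in Python, assuming the whole log is held in one string: re.MULTILINE makes ^ match at the start of every line, and finditer then yields one match per log entry. The names LINE_PATTERN, LOG_TEXT and parse_log are hypothetical.

```python
import re

# The answer's pattern, anchored with ^; re.MULTILINE lets ^ match after
# every newline instead of only at the start of the string.
LINE_PATTERN = re.compile(
    r"^(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}Z)\s+"
    r"(INFO|WARN|DEBUG|ERROR|FATAL|TRACE)\s+"
    r"((?:\S+\s+){4})((?:\S+\]|\[\S+)?)(.*)",
    re.MULTILINE,
)

# Two log lines from the question, joined into one multi-line string.
LOG_TEXT = "\n".join([
    "2019-11-14T04:25:00.123Z  WARN http-nio-127.0.0.1-7440-exec-127 "
    "CorfuCompileProxy - accessInner: Encountered a trim exception while "
    "accessing version 120383907 on attempt 0",
    "2019-11-14T04:23:08.362Z  DEBUG ActivityEventRecovery-1 "
    'ActivityCacheManager - - [nsx@6876 comp="nsx-manager" level="INFO" '
    'subcomp="manager"] Handling activity '
    "92d6a146-fa12-4889-a0ff-441087e047d0 completion event for 1",
])


def parse_log(text):
    """Return one tuple of the five groups per matching line."""
    return [m.groups() for m in LINE_PATTERN.finditer(text)]


for timestamp, level, source, bracket, message in parse_log(LOG_TEXT):
    print(timestamp, level, repr(bracket))
```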

    Note that you have to escape the dot (\.) to match it literally. The unescaped . before the milliseconds in your pattern matches any character.
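
A small illustration of the difference, using only the time part of the timestamp. The malformed value is made up for the example.

```python
import re

UNESCAPED = re.compile(r"\d\d:\d\d:\d\d.\d{3}Z")  # "." matches any character
ESCAPED = re.compile(r"\d\d:\d\d:\d\d\.\d{3}Z")   # "\." matches only a dot

good = "04:25:00.123Z"
bad = "04:25:00x123Z"  # made-up malformed value, for illustration only

# Both patterns accept the well-formed value.
print(bool(UNESCAPED.fullmatch(good)), bool(ESCAPED.fullmatch(good)))
# Only the unescaped pattern accepts the malformed one.
print(bool(UNESCAPED.fullmatch(bad)), bool(ESCAPED.fullmatch(bad)))
```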