regex, regex-lookarounds

Keep/remove log line entries containing a word that continues over multiple lines, until the next timestamp instance?


To remove single lines from a log with a regular expression, I use the following:

  1. To remove lines that do not contain the word TMTimeZoneManager: ^(?!.*TMTimeZoneManager.*).+$

  2. To remove lines that contain the word TMTimeZoneManager: ^.*(TMTimeZoneManager).*$

Log example:

2023-10-05 07:47:38.609480+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:data] TZ,init,rules,2
2023-10-05 07:47:38.609589+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:text] <TMTimeZoneManager: 0x155f0fa80 {
          Location 0 —,
    MobileLockdown 0 —,
} = (null)>
2023-10-05 07:47:38.609594+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:data] TZ,reset,reason,init
2023-10-05 07:47:38.610578+1000 0x789      Info        0x0                  121    0    timed: [com.apple.timed:text] Loading time source: ServerState.bundle
2023-10-05 07:47:38.611413+1000 0x8b5      Default     0x284                126    0    locationd: (Network) [com.apple.network:path] nw_path_evaluator_start [XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX <NULL> generic, attribution: developer]
    path: unsatisfied (No network route)
2023-10-05 07:47:38.611680+1000 0x8b5      Activity    0x286                126    0    locationd: (LocationSupport) CL: #Manufacturing service

The expected results would be:

Removing all lines that do not contain TMTimeZoneManager would return:

2023-10-05 07:47:38.609589+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:text] <TMTimeZoneManager: 0x155f0fa80 {
          Location 0 —,
    MobileLockdown 0 —,
} = (null)>

Removing only lines containing the word TMTimeZoneManager would return:

2023-10-05 07:47:38.609480+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:data] TZ,init,rules,2
2023-10-05 07:47:38.609594+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:data] TZ,reset,reason,init
2023-10-05 07:47:38.610578+1000 0x789      Info        0x0                  121    0    timed: [com.apple.timed:text] Loading time source: ServerState.bundle
2023-10-05 07:47:38.611413+1000 0x8b5      Default     0x284                126    0    locationd: (Network) [com.apple.network:path] nw_path_evaluator_start [XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX <NULL> generic, attribution: developer]
    path: unsatisfied (No network route)
2023-10-05 07:47:38.611680+1000 0x8b5      Activity    0x286                126    0    locationd: (LocationSupport) CL: #Manufacturing service

This works when every log entry is a single line. When an entry spans multiple lines, each line is still tested on its own: regex 1 removes every line except line 2, while regex 2 removes only line 2 and keeps everything else, including the continuation lines that belong to it.
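To make the problem concrete, here is a minimal sketch that runs regex 1 through Python's re module. Python is only a stand-in for the app's regex engine, and the shortened log lines are an assumption with the same shape as the example above.

```python
import re

# Shortened stand-in for the log above (assumption: same shape, fewer columns).
lines = [
    "2023-10-05 07:47:38.609480+1000 timed: TZ,init,rules,2",
    "2023-10-05 07:47:38.609589+1000 timed: <TMTimeZoneManager: 0x155f0fa80 {",
    "          Location 0,",
    "} = (null)>",
    "2023-10-05 07:47:38.609594+1000 timed: TZ,reset,reason,init",
]
LOG = "\n".join(lines)

# Regex 1 from the question: every line that does not contain the keyword.
no_keyword = re.compile(r"^(?!.*TMTimeZoneManager.*).+$", re.MULTILINE)

# Each line is tested on its own, so the two continuation lines of the
# TMTimeZoneManager entry are matched (and would be removed) as well.
matched = no_keyword.findall(LOG)
for m in matched:
    print(repr(m))
```

Only the first line of the TMTimeZoneManager entry escapes the match, which is the behaviour described above.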

I'd like to be able to either keep or remove the whole log entry even if it goes over multiple lines using the timestamp as the endpoint, as the date 2023-10-05 will always be an indicator of the start of the next log entry.

It's for use in the apps Drafts and Runestone, so I'm restricted to regular expressions.

Scouring other posts, I have figured out how I can remove lines containing TMTimeZoneManager, even if they go over multiple lines with: ^.*TMTimeZoneManager.*(?:\n+(?!\d{4}-\d{2}-\d{2}\s\d{2}\.*).+)*
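A small sketch of that removal pattern in action, again using Python's re module as a stand-in for the app's engine (an assumption) and a shortened sample log. The second pattern, with an optional trailing newline, is an addition for this sketch and not part of the regex above.

```python
import re

# Shortened stand-in for the log above (assumption: same shape, fewer columns).
lines = [
    "2023-10-05 07:47:38.609480+1000 timed: TZ,init,rules,2",
    "2023-10-05 07:47:38.609589+1000 timed: <TMTimeZoneManager: 0x155f0fa80 {",
    "          Location 0,",
    "} = (null)>",
    "2023-10-05 07:47:38.609594+1000 timed: TZ,reset,reason,init",
]
LOG = "\n".join(lines)

# Pattern from the question, unchanged: the keyword line plus every following
# line that does not start with a timestamp.
entry_with_keyword = re.compile(
    r"^.*TMTimeZoneManager.*(?:\n+(?!\d{4}-\d{2}-\d{2}\s\d{2}\.*).+)*",
    re.MULTILINE,
)
removed = entry_with_keyword.sub("", LOG)

# Same pattern plus an optional trailing newline (an addition for this
# sketch), so no empty line is left where the entry used to be.
entry_with_newline = re.compile(entry_with_keyword.pattern + r"\n?", re.MULTILINE)
removed_clean = entry_with_newline.sub("", LOG)
print(removed_clean)
```

Without the trailing newline the replacement leaves one blank line behind; with it, the two remaining entries sit directly next to each other.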

[My issue now] The closest I have gotten to keep only the lines containing the word TMTimeZoneManager is: ^(?!.*TMTimeZoneManager.*).+(?:\n+(?!\d{4}-\d{2}-\d{2}\s\d{2}\.*).+)*

It manages to highlight everything correctly that doesn't have TMTimeZoneManager in it, even over multiple lines.

Where it's failing is on the entries to keep: it only handles the first line of each. I have spent a lot of time looking over other posts and have tried many ways to rewrite it and to combine both patterns.

/^.*TMTimeZoneManager.*(?:\n+(?!\d{4}-\d{2}-\d{2}\s\d{2}\.*).+)*$|^(?!.*TMTimeZoneManager.*).+(?:\n+(?!\d{4}-\d{2}-\d{2}\s\d{2}\.*).+)*$

I am not understanding how I should get it to also do the same (the correct outcome for lines to remove) but for the lines to keep.

How could I modify or better write the regex so that lines 3, 4 and 5 are included? And if you could explain where I'm going wrong, I'd be grateful to learn.

Update: I kept trying more options and found one using \b that works, but if it’s a hack and there is a better way, please let me know. I also noticed I’d left seconds in.

The final working version to remove the whole log line entry not containing a particular word (including across multiple lines) is: ^\b(?!.*TMTimeZoneManager.*).+(?:\n+(?!\d{4}-\d{2}-\d{2}\.*|$).+)*\n
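As far as I can tell, the \b works because ^\b can only match where a line begins with a word character. Timestamp lines start with a digit, while the continuation lines in this log start with whitespace or }, so a match can no longer begin on a continuation line. A sketch comparing the pattern with and without \b, using Python's re module as a stand-in for the app's engine (an assumption) and a shortened sample log:

```python
import re

# Shortened stand-in for the log above (assumption: same shape, fewer columns).
lines = [
    "2023-10-05 07:47:38.609480+1000 timed: TZ,init,rules,2",
    "2023-10-05 07:47:38.609589+1000 timed: <TMTimeZoneManager: 0x155f0fa80 {",
    "          Location 0,",
    "} = (null)>",
    "2023-10-05 07:47:38.609594+1000 timed: TZ,reset,reason,init",
]
LOG = "\n".join(lines) + "\n"  # the pattern expects a newline after each entry

TAIL = r"(?!.*TMTimeZoneManager.*).+(?:\n+(?!\d{4}-\d{2}-\d{2}\.*|$).+)*\n"

with_b = re.compile(r"^\b" + TAIL, re.MULTILINE)   # version from the update
without_b = re.compile(r"^" + TAIL, re.MULTILINE)  # same pattern minus \b

# With \b, a match can only start on a line that begins with a word
# character, so the whole TMTimeZoneManager entry survives.
kept_with_b = with_b.sub("", LOG)

# Without \b, a match can also start on a continuation line, so the tail of
# the TMTimeZoneManager entry is removed.
kept_without_b = without_b.sub("", LOG)

print(kept_with_b)
print(kept_without_b)
```

This relies on continuation lines never starting with a letter, digit or underscore. Anchoring on the date itself (^\d{4}-\d{2}-\d{2}) does not depend on that.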

Update: The suggestions given work on most lines, but on some entries the regex doesn't select the part of the entry above the first appearance of the keyword. I looked over the links provided and tried variations with lookbehind and start/end anchors. The best I've come up with are the following, but they're not quite there.

(.*TMTimeZoneManager.*$(?s:(?!\d{4}-\d{2}-\d{2}).)+) .*TMTimeZoneManager.* (?s:(?!\d{4}-\d{2}-\d{2}).)+ (?!=\d{4}-\d{2}-\d{2}.*).*TMTimeZoneManager.*+(?:\n*+(?!\d{4}-\d{2}-\d{2}|$).)*

I'm limited to using a single-line regex, as using $1/$2 crashes. Ideally I'd have one regex to keep only entries containing any instance of the keyword and one to remove entries not containing the keyword, where the keyword may appear anywhere between two timestamps.

Trimmed example below where it is cutting off in one of the log events.

2023-10-05 07:47:38.609589+1000 0x789      Default     0x0                  121    0    timed: [com.apple.timed:text] <TMTimeZoneManager: 0x155f0fa80 {
          Location 0 —,
    MobileLockdown 0 —,
} = (null)>
2023-10-05 07:47:38.611413+1000 0x8b5      Default     0x284                126    0    locationd: (Network) [com.apple.network:path] nw_path_evaluator_start [XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX <NULL> generic, attribution: developer]
    path: unsatisfied (No network route)
2023-10-05 07:47:38.611680+1000 0x8b5      Activity    0x286                126    0    locationd: (LocationSupport) CL: #Manufacturing service
2023-10-05 07:47:39.816687+1000 0x128a     Info        0x0                  499    0    networkserviceproxy: (NetworkServiceProxy) [com.apple.networkserviceproxy:Large] Validated configuration <NSPPrivacyProxyConfiguration: 0x*********> {
    authInfo =     {
        accessTokenKnownOrigins =         (
            “challenges.url.com”,
            “hcaptcha.com”,
            “recaptcha.net”
        );
        accessTokenTypes =         (
        );
        accessTokenURL = “https://URL.com”;
        authType = “XXXX_XXXX”;
        authURL = “https://URL.com”;
    };
    bootstrapResolver =     {
        someURL = “https://URL.com”;
    };
    enabled = 1;
    fallbackPathWeights =     (
                {
            proxies =             (
                2,
                0
            );
            weight = 30;
        },
                {
            proxies =             (
                2,
                12
            );
            weight = 0;
        }
    );
    policyTierMap =     (
                {
            policy =             {
                conditions =                 (
                    “XXXX_XXXX”,
                    “XXXX_XXXX_XXXX”
                );
            };
            tier = SUBSCRIBER;
        }
    );
    preferredPathEnabledPercentage = 80;
    proxies =     (
                {
            preferredPathConfigUri = “https://URL.com:443”;
            TMTimeZoneManager = “XXXX_XXXX”;
            proxyKeyInfo =             (
                {length = 91, bytes = 0x******** XXXXXX },
                {length = 91, bytes = 0x******** XXXXXX }
            );
            proxyURL = “https://URL.com:443”;
            supportsFallback = 1;
            tokenKeyInfo = {length = 342, bytes = 0x******** XXXXXX };
            vendor = XXXX;
        },
                {
            TMTimeZoneManager = “XXXX_XXXX”;
            proxyKeyInfo =             (
                {length = 91, bytes = 0x******** XXXXXX }
            );
            proxyURL = “https://URL.com:443”;
            tokenKeyInfo = {length = 342, bytes = 0x******** XXXXXX };
            vendor = “XXXX”;
        }
    );
    version = 1;
}
2023-10-05 07:47:46.212587+1000 0x8b5      Default     0x284                126    0    locationd: (Network) [com.apple.network:path] nw_path_evaluator_start

Solution

  • To remove every whole log entry that does not contain a particular word (including entries that span multiple lines):

    ^\d{4}-\d{2}-\d{2} (?!.*\bTMTimeZoneManager\b).*(?:\n(?!\d{4}-\d{2}-\d{2} |.*\bTMTimeZoneManager\b).*)*
    

    The pattern matches:

    • ^ Start of a line (with multiline mode enabled; without it, ^ matches only at the start of the string)
    • \d{4}-\d{2}-\d{2} Match a date-like pattern followed by a space
    • (?!.*\bTMTimeZoneManager\b).* Assert that the line does not contain the word TMTimeZoneManager, and if so, match the whole line
    • (?: Non-capture group
      • \n Match a newline
      • (?!\d{4}-\d{2}-\d{2} |.*\bTMTimeZoneManager\b) Negative lookahead to assert that the line does not start with a date-like pattern and does not contain the word TMTimeZoneManager
      • .* Match the whole line
    • )* Close the non-capture group and repeat it zero or more times
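The breakdown above can be checked with a short sketch. It uses Python's re module as a stand-in for the app's regex engine (an assumption) and a shortened sample log with the same shape as the one in the question.

```python
import re

# Shortened stand-in for the log in the question (assumption: same shape).
lines = [
    "2023-10-05 07:47:38.609480+1000 timed: TZ,init,rules,2",
    "2023-10-05 07:47:38.609589+1000 timed: <TMTimeZoneManager: 0x155f0fa80 {",
    "          Location 0,",
    "} = (null)>",
    "2023-10-05 07:47:38.609594+1000 timed: TZ,reset,reason,init",
]
LOG = "\n".join(lines)

# Pattern from the answer. re.MULTILINE lets ^ match at every line start.
without_keyword = re.compile(
    r"^\d{4}-\d{2}-\d{2} (?!.*\bTMTimeZoneManager\b).*"
    r"(?:\n(?!\d{4}-\d{2}-\d{2} |.*\bTMTimeZoneManager\b).*)*",
    re.MULTILINE,
)

# Replacing every match with nothing keeps only the keyword entry. The
# newlines that separated the removed entries are left behind.
result = without_keyword.sub("", LOG)
print(result.strip("\n"))
```

Continuation lines are never matched on their own, because every match has to start with the date.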


    If you want to remove optional trailing newlines:

    ^\d{4}-\d{2}-\d{2} (?!.*\bTMTimeZoneManager\b).*(?:\n(?!\d{4}-\d{2}-\d{2} |.*\bTMTimeZoneManager\b).*)*\n?
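One caveat: the pattern above only checks the first line of an entry for the keyword and then stops at the first continuation line that contains it. For an entry like the long networkserviceproxy one in the question, where TMTimeZoneManager first appears many lines down, the lines before it would still be matched and removed. A possible variant, sketched below, tests the whole entry up to the next timestamp inside a lookahead. It is an untested-in-the-apps sketch, not a confirmed fix: Python's re module stands in for the app's engine, and the names DATE, BODY, KEYWORD, WITHOUT and WITH are made up for illustration.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2} "              # start-of-entry marker
BODY = r"(?:(?!\n" + DATE + r")[\s\S])"   # one char, unless it is the newline before the next entry
KEYWORD = r"\bTMTimeZoneManager\b"

# Entries that do NOT contain the keyword anywhere before the next timestamp.
WITHOUT = re.compile(
    r"^" + DATE + r"(?!" + BODY + r"*?" + KEYWORD + r")" + BODY + r"*\n?",
    re.MULTILINE,
)
# Entries that DO contain the keyword somewhere before the next timestamp.
WITH = re.compile(
    r"^" + DATE + r"(?=" + BODY + r"*?" + KEYWORD + r")" + BODY + r"*\n?",
    re.MULTILINE,
)

# Shortened stand-in for the trimmed example in the question (assumption).
lines = [
    "2023-10-05 07:47:38.611680+1000 locationd: CL: #Manufacturing service",
    "2023-10-05 07:47:39.816687+1000 networkserviceproxy: Validated configuration {",
    "    enabled = 1;",
    '            TMTimeZoneManager = "XXXX_XXXX";',
    "    version = 1;",
    "}",
    "2023-10-05 07:47:46.212587+1000 locationd: nw_path_evaluator_start",
]
LOG = "\n".join(lines) + "\n"

only_keyword_entries = WITHOUT.sub("", LOG)  # keep entries with the keyword
no_keyword_entries = WITH.sub("", LOG)       # drop entries with the keyword

print(WITHOUT.pattern)  # the single-line regex to paste into an editor
print(only_keyword_entries)
```

Neither pattern uses capture groups, so nothing like $1/$2 is needed in the replacement. Checking every character against the next timestamp is slower than the line-based pattern, so on very large logs it may be worth testing first.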
    
