Tags: regex, coldfusion, coldfusion-9

How to exclude "[" character in RegEx


I have a list of mail headers and their values. Unfortunately it arrives as a single string, and I want to exclude some patterns that are not really mail headers.

Below is what I have:

Return-Path: Received: from out.ipsmtp4nec.opaltelecom.net (out.ipsmtp4nec.opaltelecom.net [62.24.202.76]) by smartermail.divtech.co.za with SMTP; Mon, 6 Jul 2015 12:59:14 +0200 X-SMTPAUTH: [email protected] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DSrwBOXppVPOPoVl0aAUErgmdUYIMfp3gMBgGBA4IZK4VrAYJ3V4ckhW8EKYEFTQEBAQEBAQcBAQEBQAE/HwEBIAECAoNdAQIMGzMuCgYDAQIPHw4COwoCCAEGCQESCAmICAMWCZFaoGKWHYYdhS6CTR6FCi+BFAWFXAqOLQIBhGGFJ4FfkTmHHYFvAQEIAQEBAQEBgiI+MYJLAQEB X-IPAS-Result: A2DSrwBOXppVPOPoVl0aAUErgmdUYIMfp3gMBgGBA4IZK4VrAYJ3V4ckhW8EKYEFTQEBAQEBAQcBAQEBQAE/HwEBIAECAoNdAQIMGzMuCgYDAQIPHw4COwoCCAEGCQESCAmICAMWCZFaoGKWHYYdhS6CTR6FCi+BFAWFXAqOLQIBhGGFJ4FfkTmHHYFvAQEIAQEBAQEBgiI+MYJLAQEB X-Header: TalkTalk X-IronPort-AV: E=Sophos;i=""5.15,414,1432594800""; d=""scan'208,217"";a=""693647776"" Received: from 93-86-232-227.dynamic.isp.telekom.rs (HELO smtp.tiscali.co.uk) ([93.86.232.227]) by out.ipsmtp4nec.opaltelecom.net with ESMTP; 06 Jul 2015 11:59:04 +0100 Message-ID: From: "jonjon.bracq" To: "Webtickets" , "Webtickets Highlights" , "RYA" , "www jobonyachts com ADMIN" , "RYA InBrief" , "RYA InBrief" , "Webtickets Highlights" , "Webtickets Regional Highlights" , "RYA InBrief" Subject:
=?ISO-8859-1?Q?FW=3AFrom=3Ajonjon.bracq=40yahoo.com?= Date: Thu, 26 Jun 2015 11:59:43 +0000 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_00BE_8320AA74.4FC1860E" X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 16.4.3522.110 X-MIMEOLE: Produced By Microsoft MimeOLE V16.4.3522.110 X-SmarterMail-Spam: SPF_Pass, RHSBL, UCEProtect Level 1, Bayesian Filtering, ISpamAssassin 0 [raw: 0], DK_None, DKIM_None, Custom Rules [] X-SmarterMail-TotalSpamWeight: 12

I want to match all headers (words followed by ":") but exclude raw:, which appears inside the [] brackets. This is because raw: is part of the value of the X-SmarterMail-Spam header (towards the end of the list). I don't want to remove "raw:" manually, as other such values might appear in the future.

The expression /(\D[a-z\-]*)(\:)+/ig includes "raw:".

Note: I included \D so that I can also exclude times (11:59:43), but I can't find a way to exclude "raw:". Please help.


Solution

  • raw: is a syntactically valid header name, so you'll have to add context in order to single it out. As its occurrence seems to be a rare exception, I'd suggest not to cater for it in the match but to filter it in subsequent processing.

    However, if you want to handle it in the regexp, rule out the opening bracket as the preceding character and make sure that the complete header name is matched. Beware of starting the regexp with \D, as this is far too loose a condition (e.g. it would also match the opening bracket ...):

    /([^\[a-z_0-9\-]|^)([a-z_\-][a-z0-9_\-]*:)/ig
    

    Regex checked at Regex 101 against your sample input.
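
To make the behaviour of the pattern above easier to check, here is a minimal sketch in Python rather than ColdFusion (the translation to Python's `re` module is an assumption; the `/i` flag becomes `re.IGNORECASE` and `finditer` plays the role of `/g`). The names `HEADER_RE`, `LOOSE_RE` and `header_names` are hypothetical, and `sample` is a shortened excerpt of the input from the question. Group 1 is the character before the header name (or the start of the string), so only group 2 is kept.

```python
import re

# The answer's pattern: group 1 is the preceding character (anything except
# "[", a letter, a digit, "_" or "-") or the start of the string; group 2 is
# the header name including its trailing colon.
HEADER_RE = re.compile(r"([^\[a-z_0-9\-]|^)([a-z_\-][a-z0-9_\-]*:)", re.IGNORECASE)

# The question's original pattern, kept only for comparison.
LOOSE_RE = re.compile(r"(\D[a-z\-]*)(:)+", re.IGNORECASE)


def header_names(raw):
    """Return the header names (with trailing colon) found in raw."""
    return [m.group(2) for m in HEADER_RE.finditer(raw)]


# Shortened excerpt of the sample input from the question.
sample = (
    "Date: Thu, 26 Jun 2015 11:59:43 +0000 MIME-Version: 1.0 "
    "X-Priority: 3 X-SmarterMail-Spam: SPF_Pass, ISpamAssassin 0 [raw: 0], "
    "Custom Rules [] X-SmarterMail-TotalSpamWeight: 12"
)

names = header_names(sample)
loose = [m.group(0) for m in LOOSE_RE.finditer(sample)]

print(names)
# ['Date:', 'MIME-Version:', 'X-Priority:', 'X-SmarterMail-Spam:',
#  'X-SmarterMail-TotalSpamWeight:']
print(any("raw:" in s for s in loose))  # True: \D happily matches "["
```

On this excerpt the stricter pattern skips both the time (a name may not start with a digit) and `raw:` (it is preceded by `[`), while the original pattern still picks up `[raw:`. A header value that contains a bare `word:` after a space would still be matched, which is why filtering in a later step remains the more robust option.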