PCRE Regex: Is it possible to check within only the first X characters of a string for a match


PCRE Regex: Is it possible for Regex to check for a pattern match within only the first X characters of a string, ignoring other parts of the string beyond that point?

My Regex:

I have a Regex:

/\S+V\s*/

This checks the string for a run of non-whitespace characters ending in 'V', followed by optional whitespace or the end of the string.

This works. For example:

Example A:

 SEBSTI FMDE OPORV AWEN STEM students into STEM 

// Match found in 'OPORV' (correct)

Example B:

 ARKFE SSETE BLMI EDSF BRNT CARFR (name removed) Academy Networking Event 
      
//Match not found (correct).   

Re: the capitalised text: each letter, and each letter's position, has a meaning in the source data. The capitalised codes are followed by generic info for humans to read ("Academy Networking Event", etc.).

My Issue:

In theory, the human-readable part can contain names that involve roman numerals, such as:

Example C:

 ARKFE SSETE BLME CARFR Academy IV Networking Event 
      
//Match found (incorrect).  

I would like my Regex above to only check the first X characters of the string.

Can this be done in PCRE Regex itself? I can't find any reference to length counting in Regex and I suspect this can't easily be achieved. String lengths are completely arbitrary. (We have no control over the source data).

Intention:

/\S+V\s*/{check within first 25 characters only}
 ARKFE SSETE BLME CARFR Academy IV Networking Event 
                         ^
                         \-  Cut off point. Not found so far so stop. 

//Match not found (correct).  

Workaround:

The Regex is used from PHP, and my current solution is to cut the string in PHP so that only the first X characters (typically the first 20) are checked. I was curious whether there is a way of doing this within the Regex itself, without needing to manipulate the string directly in PHP.

$valueSubstring = substr($coreRow['value'],0,20); /* first 20 characters only */
$virtualCount = preg_match_all('/\S+V\s*/',$valueSubstring); 

Solution

  • You can first look for your pattern after the first X chars and, if it is found there, fail and skip the whole string; otherwise, match your pattern as usual. So, if X=25:

    ^.{25,}\S+V.*(*SKIP)(*F)|\S+V\s*
    

    Details:

    • ^.{25,}\S+V.*(*SKIP)(*F) - start of string, 25 or more chars other than line break chars, as many as possible, then one or more non-whitespaces and V, and then the rest of the string; (*SKIP)(*F) then fails the match and skips past everything consumed so far, so no further match is attempted inside that text
    • | - or
    • \S+V\s* - match one or more non-whitespaces, V and zero or more whitespace chars.
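
To see how the pattern behaves on the strings from the question, here is a minimal PHP sketch. Examples A, B and C are copied from the question; example D is a hypothetical input added here for illustration only, and the variable names are likewise assumptions rather than anything from the original code.

```php
<?php
// Pattern from the answer above, with X = 25.
$pattern = '/^.{25,}\S+V.*(*SKIP)(*F)|\S+V\s*/';

$examples = [
    'A' => ' SEBSTI FMDE OPORV AWEN STEM students into STEM ',
    'B' => ' ARKFE SSETE BLMI EDSF BRNT CARFR (name removed) Academy Networking Event ',
    'C' => ' ARKFE SSETE BLME CARFR Academy IV Networking Event ',
    // Hypothetical input (not from the question): a code ending in V inside
    // the first 25 characters AND a roman numeral after them.
    'D' => ' OPORV ARKFE SSETE BLME Academy IV Event',
];

$results = [];
foreach ($examples as $label => $subject) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$label] = preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
    var_dump($results[$label]);
}

// For comparison: the substr() workaround from the question, applied to D.
$substrHit = preg_match('/\S+V\s*/', substr($examples['D'], 0, 25), $m) === 1
    ? $m[0]
    : null;
var_dump($substrHit);
```

A matches 'OPORV ' (including the trailing space), while B and C produce no match, as intended. Example D shows one difference from the substr() approach that may matter: because the first alternative skips the whole string whenever a later match exists, the early 'OPORV' in D is not reported by the regex, whereas cutting the string first does find it.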