regex ruby regex-lookarounds regex-group regex-greedy

RegEx for matching 3 alphabets and 1-2 digits


I am trying to write a regular expression to find matches in a text of at least 100 characters. A match is any substring made of 3 letters followed by at least 1 and at most 2 digits.

Examples -

  1. abcjkhklfdpdn24hjkk - In this case I want to extract pdn24

  2. hjdksfkpdf1lkjk - In this case I want to extract pdf1

  3. hjgjdkspdg34kjfs dhj khk678jkfhlds1 - In this case I want both pdg34 and lds1

How do I write a regex for this? The number of leading letters in a match is always 3, and the number of digits is either 1 or 2 (no more, no less).

This works when there are 2 digits after the 3-letter string:

[A-Za-z]{3}[0-9]{2}

But the length of the digits can vary between 1 and 2. How do I include the varying length in the regex?


Solution

  • We can take your original expression, change the digit quantifier from {2} to {1,2}, and wrap it in a capturing group. Then we need a boundary on the right so that a third digit is rejected. For instance, we might use \D (any non-digit) there:

    ([A-Za-z]{3}[0-9]{1,2})\D
    


    A stricter expression is possible, but this one is enough for the sample strings in the question.
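
One caveat worth checking, shown here as a minimal sketch: \D has to consume a character, so a candidate sitting at the very end of the string is skipped. The sample string in the test has a trailing space, which is why lds1 is found there. The constant name TRAILING_NON_DIGIT is only illustrative.

```ruby
# Illustrative constant name; the pattern itself is the one shown above.
TRAILING_NON_DIGIT = /([A-Za-z]{3}[0-9]{1,2})\D/

# With a capturing group, scan returns one array of captures per match,
# so flatten gives a plain list of the captured strings.
with_trailing_char = "khk678jkfhlds1 ".scan(TRAILING_NON_DIGIT).flatten
at_end_of_string   = "khk678jkfhlds1".scan(TRAILING_NON_DIGIT).flatten

p with_trailing_char  # => ["lds1"]
p at_end_of_string    # => []
```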


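    Based on Cary Swoveland's advice, we can also use this expression, which is much better: \p{L} matches any Unicode letter, and the negative lookahead (?!\d) rejects a following digit without consuming a character: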

    \p{L}{3}\d{1,2}(?!\d)
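
A minimal Ruby sketch of this expression against the three examples from the question. Since the pattern has no capturing group, scan returns the matched strings directly, and the match at the very end of the third string is found without needing a trailing character. The name CODE_PATTERN is hypothetical.

```ruby
# Hypothetical constant name; the pattern is the lookahead version above.
CODE_PATTERN = /\p{L}{3}\d{1,2}(?!\d)/

samples = [
  "abcjkhklfdpdn24hjkk",
  "hjdksfkpdf1lkjk",
  "hjgjdkspdg34kjfs dhj khk678jkfhlds1"
]

# scan collects every non-overlapping match in each sample string.
results = samples.map { |s| s.scan(CODE_PATTERN) }

results.each { |matches| p matches }
# => ["pdn24"]
# => ["pdf1"]
# => ["pdg34", "lds1"]
```

Note that khk678 yields nothing: after khk67 the lookahead sees 8, and after khk6 it sees 7, so both attempts fail.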
    


    Test

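    re = /([A-Za-z]{3}[0-9]{1,2})\D/
    str = 'abcjkhklfdpdn24hjkk
    hjdksfkpdf1lkjk
    hjgjdkspdg34kjfs dhj khk678jkfhlds1 '
    
    # Print the match result.
    # With a capturing group, scan yields an array of captures per match,
    # so take the first element to print only the captured text.
    str.scan(re) do |captures|
        puts captures.first
    end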
    

    This JavaScript snippet shows how the capturing group works:

    const regex = /([A-Za-z]{3}[0-9]{1,2})\D/gm;
    const str = `abcjkhklfdpdn24hjkk
    hjdksfkpdf1lkjk
    hjgjdkspdg34kjfs dhj khk678jkfhlds1 `;
    let m;
    
    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        
        // The result can be accessed through the `m`-variable.
        m.forEach((match, groupIndex) => {
            console.log(`Found match, group ${groupIndex}: ${match}`);
        });
    }