
C++ regex capture is dropping last char in email validator


C++ Shell Online Execution Link: http://cpp.sh/5z2uq

I am writing a regex to validate an email ID which can have multiple dots and plus characters in its local name and can only have one dot in the domain name.

The problem I'm facing is with the capture groups. The domain name capture (group #2) works as expected, as seen in the output. The local name capture (group #1), however, is supposed to stop before the '+' sign (excluding the '+' itself), but it captures past it, and the captured local name is also missing its last character.

Please take a look at my C++ regex code:

#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string str;
    vector<string> emails = {
            "local@domain.com",
            "local.constant@domain.com",
            "local+addon@domain.com",
            "local.constant+addon@domain.com",
            "local@domain.c.o.m"
        };

    for(auto ele : emails)
    {
        str = ele;
        
        regex e("([\\w+\\.]+)\\+*[\\+\\w]+\\@([\\w]+\\.[\\w]+)$");
        smatch parts;
        bool match = regex_match(str,parts,e);
        
        if(match==true)
        {
            cout << "Local  : " << parts.str(1) << endl;
            cout << "Domain : " << parts.str(2) << endl;
            cout << "Valid Email ID: " << ele << endl << endl;
        }
        else
        {
            cout << "Invalid Email ID: " << ele << endl << endl;
        }
    }

    return 0;
}

Output:

Local : loca
Domain : domain.com
Valid Email ID: local@domain.com

Local : local.constan
Domain : domain.com
Valid Email ID: local.constant@domain.com

Local : local+addo
Domain : domain.com
Valid Email ID: local+addon@domain.com

Local : local.constant+addo
Domain : domain.com
Valid Email ID: local.constant+addon@domain.com

Invalid Email ID: local@domain.c.o.m

Notice how, in each "Local" line, my regex group capture drops the last character.

Questions:

  1. How do I make group #1 capture only up to the '+' sign?
  2. How do I make the group capture not drop the last character?

Solution

  • You can use this expression:

    "([\\w.]+)(?:\\+[\\w]+)*\\@([\\w]+\\.[\\w]+)$"
    

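    The first part ([\\w.]+) matches the Local part (i.e. any word character or dot). Unlike your original class [\\w+\\.], it excludes '+', so the capture stops before the first plus sign.
    The second part (?:\\+[\\w]+)* denotes a non-capturing group repeated 0 or more times (matching a plus sign followed by one or more word characters).
    The third part \\@ matches the @ character.
    The last part ([\\w]+\\.[\\w]+) matches the Domain part (i.e. two words separated by one dot), which you got right.

    The last character was dropped because [\\+\\w]+ in your original pattern requires at least one character before the @, so the greedy first group had to backtrack and give up its final character.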
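
As a minimal sketch of how the expression above might be used, the following wraps it in a small helper. The function name split_email and its out-parameters are hypothetical, not part of the original code. The pattern is written as a raw string literal, so the backslashes are not doubled, and the trailing $ and the escape before @ are left out on the assumption that regex_match, which must match the whole string, makes them unnecessary.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): splits an address into
// the local name before any "+addon" suffix and the domain.
// Returns false if the address does not match the pattern.
bool split_email(const std::string& address, std::string& local, std::string& domain)
{
    // Group 1: local name (word characters and dots, no '+').
    // (?:\+\w+)*: zero or more "+addon" suffixes, matched but not captured.
    // Group 2: domain consisting of two words separated by one dot.
    static const std::regex pattern(R"(([\w.]+)(?:\+\w+)*@(\w+\.\w+))");

    std::smatch parts;
    if (!std::regex_match(address, parts, pattern)) {
        return false;
    }
    local = parts.str(1);
    domain = parts.str(2);
    return true;
}
```

Called with "local.constant+addon@domain.com", this should fill local with "local.constant" and domain with "domain.com", while "local@domain.c.o.m" should be rejected because the domain has more than one dot.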