regex regex-lookarounds regex-group vala regex-greedy

Regex for capturing vCard groups in Perl

I have been studying syntax and semantics at university this semester, and regular expressions often play a part in this. As an exercise, I have looked for different scenarios in which regex could be applied. vCards are one of these, but I have been unable to write a pattern that groups everything between BEGIN:VCARD and END:VCARD.

Please note that .vcf files are line-separated: each property sits on its own line.

My best pattern so far looks like this (though I've tried many variations):

BEGIN:VCARD\n([^(END:VCARD)\n]*END:VCARD

So the idea is: "starting at BEGIN:VCARD, read everything that is not END:VCARD and ends with a line break, until END:VCARD is encountered".

I'm using the Perl regex flavour, but working in the Vala programming language.

I realise the problem is my pattern, but after a lot of reading and trial and error, I'm still not sure why the tester shows it as not matching.
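Two things work against the pattern above, and a short sketch makes both visible. It uses Python's re module only because it is easy to run; for these constructs it behaves like the Perl flavour. First, as the pattern appears above, the group opened after \n is never closed, so the engine rejects it outright. Second, [^(END:VCARD)\n] is a character class: it excludes the individual characters (, E, N, D, :, V, C, A, R and ), not the word END:VCARD, so it stops at the first of those characters it meets. The last part shows one way to say "not the start of END:VCARD", a lookahead before each character (often called a tempered greedy token); that technique is a substitute introduced here, not something from the question, and the two-card sample string is made up.

```python
import re

# The pattern as it appears in the question; note the "(" that is never closed.
asked = r"BEGIN:VCARD\n([^(END:VCARD)\n]*END:VCARD"
try:
    re.compile(asked)
    compiles = True
except re.error as exc:
    compiles = False
    print("pattern does not compile:", exc)

# [...] matches ONE character from a set, so [^(END:VCARD)\n] means
# "any single character except ( E N D : V C A R ) or a newline".
# It cannot exclude the word END:VCARD as a whole.
char_class = re.compile(r"[^(END:VCARD)\n]*")
first = char_class.match("VERSION:3.0").group()       # '' because 'V' is excluded
second = char_class.match("TITLE:Imaginary").group()  # 'TITL', stops at the 'E'
print(repr(first), repr(second))

# "Anything that is not the start of END:VCARD" needs a lookahead before each
# character (a tempered greedy token).
tempered = re.compile(r"BEGIN:VCARD\n((?:(?!END:VCARD)[\s\S])*)END:VCARD")
sample = "BEGIN:VCARD\nFN:John Doe\nEND:VCARD\nBEGIN:VCARD\nFN:Jane Doe\nEND:VCARD"
bodies = tempered.findall(sample)
print(bodies)  # ['FN:John Doe\n', 'FN:Jane Doe\n']
```

Because the lookahead is checked before every character, the repetition can never run past an END:VCARD, so each card comes back on its own even though the quantifier is greedy.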

Test data:

BEGIN:VCARD
VERSION:3.0
N:Doe;John;;;
FN:John Doe
ORG:Example.com Inc.;
TITLE:Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1212
TEL;type=WORK:+1 (617) 555-1234
TEL;type=CELL:+1 781 555 1212
TEL;type=HOME:+1 202 555 1212
NOTE:John Doe has a long and varied history\, being documented on more police files that anyone else. Reports of his death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
N:Doe;Jane;;;
FN:Jane Doe
ORG:Example.com Inc.;
TITLE:Another Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1213
TEL;type=WORK:+1 (617) 555-1233
TEL;type=CELL:+1 781 555 1213
TEL;type=HOME:+1 202 555 1213
NOTE:Jane Doe has a long and varied history\, being documented on more police files that anyone else. Reports of her death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD

In my most successful attempt, the tester highlights everything from the first BEGIN:VCARD up to the line just before END:VCARD.

Solution

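This expression might help you do that. [\s\S] matches any character, including a newline, and the lazy quantifier *? stops at the first END:VCARD after each BEGIN:VCARD, so every card becomes a separate match: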

(BEGIN:VCARD([\s\S]*?)END:VCARD)
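The difference between the lazy *? in this expression and a greedy * is easy to check on a two-card input. The sketch below uses Python with a made-up, shortened sample; the capturing groups are left out so that re.findall returns whole matches, and it simply counts the matches each version produces.

```python
import re

# Two tiny cards on consecutive lines, standing in for the test data.
data = (
    "BEGIN:VCARD\nFN:John Doe\nEND:VCARD\n"
    "BEGIN:VCARD\nFN:Jane Doe\nEND:VCARD"
)

# Greedy: [\s\S]* runs to the end of the input and backtracks only as far as
# the LAST END:VCARD, so both cards come back as one match.
greedy = re.findall(r"BEGIN:VCARD[\s\S]*END:VCARD", data)

# Lazy: [\s\S]*? grows one character at a time and stops at the FIRST
# END:VCARD, so each card is a separate match.
lazy = re.findall(r"BEGIN:VCARD[\s\S]*?END:VCARD", data)

print(len(greedy), len(lazy))  # 1 2
```

With the greedy version the single match spans from the first BEGIN:VCARD to the last END:VCARD, which is the usual symptom of a missing ?.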

Perl Test:

use strict;
use warnings;

my $str = 'BEGIN:VCARD
VERSION:3.0
N:Doe;John;;;
FN:John Doe
ORG:Example.com Inc.;
TITLE:Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1212
TEL;type=WORK:+1 (617) 555-1234
TEL;type=CELL:+1 781 555 1212
TEL;type=HOME:+1 202 555 1212
NOTE:John Doe has a long and varied history\\, being documented on more police files that anyone else. Reports of his death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\\:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
N:Doe;Jane;;;
FN:Jane Doe
ORG:Example.com Inc.;
TITLE:Another Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:johnDoe@example.org
TEL;type=WORK;type=pref:+1 617 555 1213
TEL;type=WORK:+1 (617) 555-1233
TEL;type=CELL:+1 781 555 1213
TEL;type=HOME:+1 202 555 1213
NOTE:Jane Doe has a long and varied history\\, being documented on more police files that anyone else. Reports of her death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\\:ABPerson
END:VCARD';
my $regex = qr/(BEGIN:VCARD([\s\S]*?)END:VCARD)/mp;

# "while" instead of "if": each /g match resumes after the previous one,
# so every card in $str is printed, not just the first.
while ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
  # print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
  # print "Capture Group 2 (the card body) is $2\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
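To visit every card rather than only the first, the match has to be repeated: in Perl that is a while loop around $str =~ /$regex/g, and in Python re.finditer does the same job. The sketch below applies the same expression, takes capture group 2 (the text between the markers) and splits it into property lines. split_vcards is a hypothetical helper name, and the sample uses CRLF line endings because that is what the vCard RFCs specify. It does not unfold long lines, which the format allows to be folded onto continuation lines starting with a space.

```python
import re

# The answer's expression; group 2 is the body between BEGIN:VCARD and END:VCARD.
VCARD_RE = re.compile(r"(BEGIN:VCARD([\s\S]*?)END:VCARD)")


def split_vcards(text):
    """Hypothetical helper: return one list of property lines per card."""
    cards = []
    # finditer resumes after each match, like Perl's while (... =~ /.../g).
    for m in VCARD_RE.finditer(text):
        body = m.group(2)
        # splitlines() handles both "\n" and "\r\n"; the filter drops the
        # empty piece left by the line break right after BEGIN:VCARD.
        cards.append([line for line in body.splitlines() if line])
    return cards


sample = (
    "BEGIN:VCARD\r\nVERSION:3.0\r\nFN:John Doe\r\nEND:VCARD\r\n"
    "BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"
)
cards = split_vcards(sample)
print(cards)  # [['VERSION:3.0', 'FN:John Doe'], ['VERSION:3.0', 'FN:Jane Doe']]
```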

RegEx

If this isn't the expression you wanted, you can modify it on regex101.com.
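One modification worth considering, since .vcf files put each property on its own line, is to anchor both markers to the start of a line. Without that, a property value that merely contains the text END:VCARD ends the match early. The sketch is in Python, where re.MULTILINE makes ^ and $ match at line boundaries (Perl's /m flag does the same); the NOTE text is invented to trigger the problem, and \r? allows for CRLF line endings.

```python
import re

# With re.MULTILINE, ^ and $ match at every line break, so the markers only
# count when they stand at the start of their own line.
ANCHORED = re.compile(r"^BEGIN:VCARD\r?\n([\s\S]*?)^END:VCARD\r?$", re.MULTILINE)
UNANCHORED = re.compile(r"BEGIN:VCARD([\s\S]*?)END:VCARD")

# A NOTE value that happens to mention the end marker.
card = "BEGIN:VCARD\nNOTE:ends with END:VCARD text\nFN:John Doe\nEND:VCARD\n"

print(repr(UNANCHORED.search(card).group(1)))  # stops inside the NOTE line
print(repr(ANCHORED.search(card).group(1)))    # whole body, up to the real END line
```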

RegEx Circuit

You can also visualize your expressions on jex.im.