Tags: regex, regex-lookarounds, regex-group, vala, regex-greedy

RegEx for capturing vcard groups in Perl


I have been studying syntax and semantics at university this semester, and regex often plays a part in this. As an exercise, I have looked for different scenarios in which regex could be applied. Considering vCards to be one of these, I have been unable to write a pattern that groups everything between BEGIN:VCARD and END:VCARD.

Please note that .vcf files are line-separated.

My best pattern so far looks like this (though I've tried many variations):

BEGIN:VCARD\n([^(END:VCARD)\n]*END:VCARD

So the idea is: "From BEGIN:VCARD, read everything that is not END:VCARD and that ends with a line break, until END:VCARD is encountered."

I'm using the Perl regex flavor, but working in the Vala programming language.

I realise the problem is my pattern, but after a long time of reading, and trial and error, I'm still not quite certain why the tester shows it as not working.

Test data:

BEGIN:VCARD
VERSION:3.0
N:Doe;John;;;
FN:John Doe
ORG:Example.com Inc.;
TITLE:Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:[email protected]
TEL;type=WORK;type=pref:+1 617 555 1212
TEL;type=WORK:+1 (617) 555-1234
TEL;type=CELL:+1 781 555 1212
TEL;type=HOME:+1 202 555 1212
NOTE:John Doe has a long and varied history\, being documented on more police files that anyone else. Reports of his death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
N:Doe;Jane;;;
FN:Jane Doe
ORG:Example.com Inc.;
TITLE:Another Imaginary test person
EMAIL;type=INTERNET;type=WORK;type=pref:[email protected]
TEL;type=WORK;type=pref:+1 617 555 1213
TEL;type=WORK:+1 (617) 555-1233
TEL;type=CELL:+1 781 555 1213
TEL;type=HOME:+1 202 555 1213
NOTE:Jane Doe has a long and varied history\, being documented on more police files that anyone else. Reports of her death are alas numerous.
CATEGORIES:Work,Test group
X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\:ABPerson
END:VCARD

In my most successful test, it marks everything from the first BEGIN:VCARD to the line just before END:VCARD.


Solution

  • This expression might help you to do that:

    (BEGIN:VCARD([\s\S]*?)END:VCARD)
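The pattern in the question fails because `[^(END:VCARD)\n]` is a bracket expression: it matches one character that is not any of `(`, `E`, `N`, `D`, `:`, `V`, `C`, `A`, `R`, `)` or a newline, rather than "anything that is not the word END:VCARD". It therefore stops at the `V` of `VERSION` on the very first line. The following is a minimal sketch of that behaviour and of the difference between greedy and lazy `[\s\S]*`; the two short cards and the variable names are invented for illustration only.

```perl
use strict;
use warnings;

# Two minimal cards, invented here purely for illustration.
my $data = "BEGIN:VCARD\nFN:John Doe\nEND:VCARD\nBEGIN:VCARD\nFN:Jane Doe\nEND:VCARD\n";

# A bracket expression is a set of single characters, not a word:
# [^(END:VCARD)\n] rejects any one of ( E N D : V C A R ) and newline.
my $class = qr/[^(END:VCARD)\n]/;
my $v_ok  = ( 'V' =~ $class ) ? 1 : 0;    # 'V' is in the excluded set
my $f_ok  = ( 'F' =~ $class ) ? 1 : 0;    # 'F' is not in the excluded set

# Greedy [\s\S]* runs on to the last END:VCARD in the input,
# lazy [\s\S]*? stops at the first one it can reach.
my ($greedy) = $data =~ /(BEGIN:VCARD[\s\S]*END:VCARD)/;
my ($lazy)   = $data =~ /(BEGIN:VCARD[\s\S]*?END:VCARD)/;

printf "V allowed: %d, F allowed: %d\n", $v_ok, $f_ok;
printf "greedy length: %d, lazy length: %d\n", length($greedy), length($lazy);
```

With the lazy quantifier each match covers exactly one card, which is why the expression above uses `*?`.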
    

    Perl Test:

    use strict;
    use warnings;
    
    my $str = 'BEGIN:VCARD
    VERSION:3.0
    N:Doe;John;;;
    FN:John Doe
    ORG:Example.com Inc.;
    TITLE:Imaginary test person
    EMAIL;type=INTERNET;type=WORK;type=pref:[email protected]
    TEL;type=WORK;type=pref:+1 617 555 1212
    TEL;type=WORK:+1 (617) 555-1234
    TEL;type=CELL:+1 781 555 1212
    TEL;type=HOME:+1 202 555 1212
    NOTE:John Doe has a long and varied history\\, being documented on more police files that anyone else. Reports of his death are alas numerous.
    CATEGORIES:Work,Test group
    X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\\:ABPerson
    END:VCARD
    BEGIN:VCARD
    VERSION:3.0
    N:Doe;Jane;;;
    FN:Jane Doe
    ORG:Example.com Inc.;
    TITLE:Another Imaginary test person
    EMAIL;type=INTERNET;type=WORK;type=pref:[email protected]
    TEL;type=WORK;type=pref:+1 617 555 1213
    TEL;type=WORK:+1 (617) 555-1233
    TEL;type=CELL:+1 781 555 1213
    TEL;type=HOME:+1 202 555 1213
    NOTE:Jane Doe has a long and varied history\\, being documented on more police files that anyone else. Reports of her death are alas numerous.
    CATEGORIES:Work,Test group
    X-ABUID:5AD380FD-B2DE-4261-BA99-DE1D1DB52FBE\\:ABPerson
    END:VCARD';
    my $regex = qr/(BEGIN:VCARD([\s\S]*?)END:VCARD)/mp;
    
    # /g in a while loop visits every vCard; a plain "if" would stop after the first
    while ( $str =~ /$regex/g ) {
      print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
      # print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
      # print "Capture Group 2 is $2 ... and so on\n";
    }
    
    # ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
    # Named capture groups can be called via $+{name}
    

    RegEx

    If this wasn't your desired expression, you can modify or change it on regex101.com.

    RegEx Circuit

    You can also visualize your expressions in jex.im.