Extract email from string using Template Toolkit


I'm guessing this is relatively simple, but I can't find the answer.

From a string such as '"John Doe" <[email protected]>', how can I extract the email portion using Template Toolkit?

An example string to parse is this:

$VAR1 = { 
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => '[email protected] on behalf of Caroline <[email protected]>',
    'bytes' => 13620,
    'pmail' => '[email protected]',
    'sender' => '[email protected]',
    'subject' => 'Some Email Subject'
};

My code, based on @dave-cross's help below, where $VAR1 is the output of dumper.dump(item.from):

[% text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?+<(.*?)>')) -%]
<td>[% matches.1 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

However, it still doesn't match against $VAR1.


Solution

  • This does what you want, but it's pretty fragile and this really isn't the kind of thing that you should be doing in TT code. You should either get the data parsed outside of the template and passed into variables, or you should pass in a parsing subroutine that can be called from inside the template.

    But, having given you the caveats, if you still insist this is what you want to do, then this is how you might do it:

    In test.tt:

    [% text = '"John Doe" <[email protected]>';
       matches = text.match('"(.*?)"\s+<(.*?)>');
       IF matches -%]
    Name: [% matches.0 %]
    Email: [% matches.1 %]
    [% ELSE -%]
    No match found
    [% END -%]
    

    Then, testing using tpage:

    $ tpage test.tt
    Name: John Doe
    Email: [email protected]
    

    But I cannot emphasise enough that you should not be doing it like this.
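    For comparison, here is a minimal sketch of the two alternatives recommended at the top of this answer: parse the address in Perl before the template runs, or pass a parsing subroutine into the template. The helper name extract_email, the item.email key, and the address john.doe@example.com are all hypothetical placeholders, and the regex is just one plausible choice, not a full address parser.

```perl
use strict;
use warnings;
use Template;

# Hypothetical helper: return the text inside the last <...> pair,
# or the original string unchanged when there are no angle brackets.
sub extract_email {
    my ($from) = @_;
    return $from =~ /<([^<>]+)>\s*$/ ? $1 : $from;
}

# Placeholder data standing in for the real item.
my $item = { from => '"John Doe" <john.doe@example.com>' };

# Alternative 1: parse outside the template and pass the result in.
$item->{email} = extract_email($item->{from});

# The template only displays values; it does no regex work itself.
my $template = join "\n",
    '<td>[% item.email %]</td>',
    '<td>[% extract_email(item.from) %]</td>',   # Alternative 2: call the sub
    '';

my $tt     = Template->new;
my $output = '';
$tt->process(
    \$template,
    { item => $item, extract_email => \&extract_email },
    \$output,
) or die $tt->error, "\n";

print $output;
```

    Both table cells should end up containing the bare address. Either way, the regex lives in Perl, where it can be unit tested.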

    Update: I've used this test template to investigate your further problem.

    [% item = { from => '"John Doe" <[email protected]>' };
       text = item.from -%]
    [% IF (matches = text.match('(.*?)(\s)?+<(.*?)>')) -%]
    <td>[% matches.1 %]</td>
    [% ELSE -%]
    <td>[% text %]</td>
    [% END %]
    

    And running it, I get this:

    $ tpage test2.tt
    <td> </td>
    

    That's what I'd expect to see for a match. You're printing matches.1. That's the second item from the matches array. And the second match group is (\s). So I'm getting the space between the name and the opening angle bracket.

    You probably don't want that whitespace match in your matches array, so I'd remove the parentheses around it, to make the regex (.*?)\s*<(.*?)> (note that \s* is a simpler way to say "zero or more whitespace characters").

    You can now use matches.0 to get the name and matches.1 to get the email address.
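    To see how the capture-group numbering shifts, the two patterns can be compared in plain Perl. This is only a sketch: it assumes that TT's match vmethod hands back capture groups the same way a list-context Perl match does (which is consistent with the dumper output shown later in this answer), and john.doe@example.com is a placeholder address.

```perl
use strict;
use warnings;

my $from = '"John Doe" <john.doe@example.com>';   # placeholder address

# Original pattern: three capture groups; the middle one is the whitespace.
my @old = $from =~ /(.*?)(\s)?+<(.*?)>/;

# Simplified pattern: two capture groups, name and address.
my @new = $from =~ /(.*?)\s*<(.*?)>/;

printf "old: %d groups, index 1 = '%s'\n", scalar @old, $old[1];
printf "new: %d groups, index 0 = '%s', index 1 = '%s'\n",
    scalar @new, @new[0, 1];
```

    Note that with this input the first group still includes the double quotes around the name, because the simplified pattern does not match them explicitly.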

    Oh, and there's no need to copy item.from into text. You can call the match vmethod on any scalar variable, so it's probably simpler to just use:

    [% matches = item.from.match(...) -%]
    

    Did I mention that this is all a really terrible idea? :-)

    Update2:

    This is all going to be far easier if you give me complete, runnable code examples in the same way that I am doing for you. Any time I have to edit something in order to get an example running, we run the risk that I'm guessing incorrectly how your code works.

    But, bearing that in mind, here's my latest test template:

    [% item = {
        'date' => '2021-03-25',
        'time' => '03:58:18',
        'href' => 'https://example.com',
        'from' => '[email protected] on behalf of Caroline <[email protected]>',
        'bytes' => 13620,
        'pmail' => '[email protected]',
        'sender' => '[email protected]',
        'subject' => 'Some Email Subject'
    };
       text = item.from -%]
    [% IF (matches = text.match('(.*?)(\s)?<(.*?)>')) -%]
    <td>[% matches.2 %]</td>
    [% ELSE -%]
    <td>[% text %]</td>
    [% END %]
    

    I've changed the definition of item to have your full example. I've left the regex as it was before my suggestions. And (because I haven't changed the regex) I've changed the output to print matches.2 instead of matches.1.

    And here's what happens:

    $ tpage test3.tt
    <td>[email protected]</td>
    

    So it works.

    If yours doesn't work, then you need to identify the differences between my (working) code and your (non-working) code. I'm happy to help you identify those differences, but you have to give me your non-working example in order for me to do that.

    Update3:

    Again I've tried to incorporate the changes that you're talking about. But again, I've had to guess at stuff because you're not sharing complete runnable examples. And again, my code works as expected.

    [% USE dumper -%]
    [% item = {
        'date' => '2021-03-25',
        'time' => '03:58:18',
        'href' => 'https://example.com',
        'from' => '[email protected] on behalf of Caroline <[email protected]>',
        'bytes' => 13620,
        'pmail' => '[email protected]',
        'sender' => '[email protected]',
        'subject' => 'Some Email Subject'
    };
     -%]
    [% matches = item.from.match('(.*?)(\s)?<(.*?)>') -%]
    [% dumper.dump(matches) %]
    

    And testing it:

    $ tpage test4.tt
    $VAR1 = [
              '[email protected] on behalf of Caroline',
              ' ',
              '[email protected]'
            ];
    

    So that works. If you want any more help, then send a complete runnable example. If you don't do that, I won't be able to help you any more.