
How to exclude characters from a RegEx pattern with category property codes?


There are a number of category property codes (see the section "Unicode character properties") that can be used in a Perl-compatible regular expression (PCRE).

I defined a regex pattern (a named subpattern) that should match letters (\p{L}), numbers (\p{N}), the space separator (\p{Zs}), and also punctuation (\p{P}).

(?<sport>[\p{L}\p{N}\p{Zs}\p{P}]*)

Since I'm using it for URL routing, slashes should be excluded from the match. How can I do that?


EDIT:

Additional information about the context: the pattern is used for a route definition in a Zend Framework 2 module.

/Catalog/config/module.config.php

<?php
return array(
    ...
    'router' => array(
        'routes' => array(
            ...
            'sport' => array(
                'type'  => 'MyNamespace\Mvc\Router\Http\UnicodeRegex',
                'options' => array(
                    'regex' => '/catalog/(?<city>[\p{L}\p{Zs}]*)/(?<sport>[\p{L}\p{N}\p{Zs}\p{P}]*)',
                    'defaults' => array(
                        'controller' => 'Catalog\Controller\Catalog',
                        'action'     => 'list-courses',
                    ),
                    'spec'  => '/catalog/%city%/%sport%',
                ),
                'may_terminate' => true,
                'child_routes' => array(
                    'courses' => array(
                        'type'  => 'segment',
                        'options' => array(
                            'route' => '[/page/:page]',
                            'defaults' => array(
                                'controller' => 'Catalog\Controller\Catalog',
                                'action'     => 'list-courses',
                            ),
                        ),
                        'may_terminate' => true,
                    ),
                )
            ),
        ),
    ),
    ...
);

Solution

  • You can use a negative look-ahead to exclude specific characters from your character class. For your example:

    (?<sport>(?:(?!/)[\p{L}\p{N}\p{Zs}\p{P}])*)
    

    Basically, the negative look-ahead (?!/) first checks that the next character is not /, and only then is that character tested against the character class [\p{L}\p{N}\p{Zs}\p{P}]. This matters because / is itself a punctuation character, so \p{P} alone would match it.

    PCRE doesn't have a set intersection or set difference feature for character classes, so this is the workaround.
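
To try the pattern outside the router, here is a minimal sketch using plain preg_match. The function name matchSport, the # delimiter, the ^ and $ anchors and the u modifier are assumptions made for this illustration; they are not taken from the route definition above, and the custom UnicodeRegex route type may wrap the pattern differently.

```php
<?php
// Hypothetical helper for illustration only; not part of Zend Framework 2.
function matchSport(string $path): ?string
{
    // '#' is used as the delimiter so that '/' needs no escaping.
    // The 'u' modifier switches PCRE to UTF-8 mode, so \p{..} properties
    // are applied to whole code points instead of single bytes.
    $pattern = '#^/catalog/(?<city>[\p{L}\p{Zs}]*)'
             . '/(?<sport>(?:(?!/)[\p{L}\p{N}\p{Zs}\p{P}])*)$#u';

    // Return the captured "sport" segment, or null if the path does not match.
    return preg_match($pattern, $path, $m) === 1 ? $m['sport'] : null;
}

var_dump(matchSport('/catalog/Berlin/Beach-Volleyball (2 vs. 2)'));
var_dump(matchSport('/catalog/Berlin/foo/bar'));
```

The first call should return the whole last segment, because hyphens, parentheses and dots are all covered by \p{P}. The second should return null, because the look-ahead stops the sport group at the extra slash and the $ anchor then fails.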