There are a number of Unicode category property codes (see the section "Unicode character properties") that can be used in a Perl-compatible Regular Expression (PCRE).
I defined a regex pattern (a named subpattern) that should match letters (\p{L}), numbers (\p{N}), the space separator (\p{Zs}), and also punctuation (\p{P}):
(?<sport>[\p{L}\p{N}\p{Zs}\p{P}]*)
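Since I'm using it for URL routing, slashes should be excluded. However, / (U+002F SOLIDUS) belongs to the "Other Punctuation" category (Po), so \p{P} matches it as well. How can I exclude it?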
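To see the problem concretely, here is a minimal PHP sketch. It assumes the pattern is compiled with preg_match() using ~ as the delimiter and the u modifier (so that \p{...} is evaluated on UTF-8 characters); the sample string is made up.

```php
<?php
// The pattern from the question, wrapped in "~" delimiters and compiled with
// the "u" modifier (delimiter and modifier are assumptions for this sketch).
$pattern = '~^(?<sport>[\p{L}\p{N}\p{Zs}\p{P}]*)~u';

// "/" is U+002F SOLIDUS, general category Po, so \p{P} matches it.
var_dump(preg_match('~^\p{P}$~u', '/')); // int(1)

// Consequently the named group swallows everything, slashes included.
preg_match($pattern, 'Beach Volleyball/page/2', $m);
echo $m['sport'], PHP_EOL; // Beach Volleyball/page/2
```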
EDIT:
Additional information about the context: the pattern is used in a route definition in a Zend Framework 2 module.
/Catalog/config/module.config.php
<?php
return array(
    ...
    'router' => array(
        'routes' => array(
            ...
            'sport' => array(
                'type' => 'MyNamespace\Mvc\Router\Http\UnicodeRegex',
                'options' => array(
                    'regex' => '/catalog/(?<city>[\p{L}\p{Zs}]*)/(?<sport>[\p{L}\p{N}\p{Zs}\p{P}]*)',
                    'defaults' => array(
                        'controller' => 'Catalog\Controller\Catalog',
                        'action' => 'list-courses',
                    ),
                    'spec' => '/catalog/%city%/%sport%',
                ),
                'may_terminate' => true,
                'child_routes' => array(
                    'courses' => array(
                        'type' => 'segment',
                        'options' => array(
                            'route' => '[/page/:page]',
                            'defaults' => array(
                                'controller' => 'Catalog\Controller\Catalog',
                                'action' => 'list-courses',
                            ),
                        ),
                        'may_terminate' => true,
                    ),
                )
            ),
        ),
    ),
    ...
);
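The effect on routing can be sketched as follows. This is only an approximation: I don't know how MyNamespace\Mvc\Router\Http\UnicodeRegex is implemented, so the hypothetical function matchSportRoute() below simply assumes it anchors the regex at the start of the path (not at the end, so child routes can take over the rest) and adds the u modifier. The sample path is made up.

```php
<?php
// Hypothetical stand-in for how the custom UnicodeRegex route applies its
// regex: anchored at the start of the path only, compiled with /u.
function matchSportRoute(string $regex, string $path): ?array
{
    if (preg_match('~^' . $regex . '~u', $path, $m) !== 1) {
        return null;
    }

    return [
        'city'      => $m['city'],
        'sport'     => $m['sport'],
        // Whatever the route did not consume is left for the child routes.
        'remainder' => substr($path, strlen($m[0])),
    ];
}

// The regex exactly as it appears in module.config.php.
$regex = '/catalog/(?<city>[\p{L}\p{Zs}]*)/(?<sport>[\p{L}\p{N}\p{Zs}\p{P}]*)';

print_r(matchSportRoute($regex, '/catalog/München/Beach Volleyball/page/2'));
```

Under that assumption, the sport group consumes "Beach Volleyball/page/2" and leaves an empty remainder, so the [/page/:page] child route never gets a chance to match.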
You can use a negative lookahead to exclude specific characters from your character class. For your example:
(?<sport>(?:(?!/)[\p{L}\p{N}\p{Zs}\p{P}])*)
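Basically, the negative lookahead (?!/) checks that the next character is not / before the character class [\p{L}\p{N}\p{Zs}\p{P}] checks whether that character belongs to the set. Because the lookahead sits inside the repeated group (?:...)*, this check is made for every character, so the match stops at the first slash.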
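Plugging the lookahead version into the route regex from the question gives the following sketch. It again assumes a start-anchored match compiled with the u modifier, which is an assumption about the custom route class, not its actual code; the sample path is made up.

```php
<?php
// Route regex from the config, with the sport subpattern replaced by the
// lookahead version. Anchored at the start only and compiled with /u.
$route = '~^/catalog/(?<city>[\p{L}\p{Zs}]*)/(?<sport>(?:(?!/)[\p{L}\p{N}\p{Zs}\p{P}])*)~u';
$path  = '/catalog/München/Beach Volleyball/page/2';

if (preg_match($route, $path, $m) === 1) {
    // Part of the path left over for child routes such as [/page/:page].
    $rest = substr($path, strlen($m[0]));
    echo $m['city'], PHP_EOL;  // München
    echo $m['sport'], PHP_EOL; // Beach Volleyball
    echo $rest, PHP_EOL;       // /page/2
}
```

The sport group now stops at the first slash, leaving /page/2 for the child route.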
PCRE doesn't have a set intersection or set difference feature for character classes, so this is the workaround.
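For comparison, a different workaround (not the one proposed above) emulates "punctuation minus slash" with a negated class: [^\P{P}/] matches any character that is neither non-punctuation (\P{P}) nor /, i.e. punctuation other than /. The sketch below checks on a few made-up samples that it accepts the same strings as the lookahead version; the ^...$ anchors are there only to make the comparison cover the whole string.

```php
<?php
// Lookahead version from the answer, anchored to the whole string.
$lookahead = '~^(?:(?!/)[\p{L}\p{N}\p{Zs}\p{P}])*$~u';

// Alternative: letters, numbers and space separators, or any punctuation
// character except "/" (a negated class containing \P{P} and "/").
$negated = '~^(?:[\p{L}\p{N}\p{Zs}]|[^\P{P}/])*$~u';

$samples = ['Beach Volleyball', "Rock'n'Roll-Tanz", 'Fußball 2', 'page/2', '/'];

foreach ($samples as $s) {
    // Both patterns should return the same result (1 = match, 0 = no match).
    printf("%-18s lookahead=%d negated=%d\n", $s, preg_match($lookahead, $s), preg_match($negated, $s));
}
```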