php regex kohana

Regular expression for a URL route parameter


I'm not great with regular expressions, which is why I need your help. Consider this example from http://kohanaframework.org/3.3/guide/kohana/routing#examples:

Route::set('search', ':<query>', array('query' => '.*'))
  ->defaults(array(
    'controller' => 'Search',
    'action' => 'index',
  ));

This regular expression (.*) accepts every parameter I need:
"cat1/cat2/cat3"
but also:
"cat1/cat 2/ cat3",
"cat1/cat 2/ /// a |<>"?\':*"
How can I modify this expression to disallow:
1. any kind of whitespace ( "\s" )
2. more than one consecutive slash ( 'cat1/cat2' but not 'cat1/////cat2')
3. any of these symbols: [ "|", "<", ">" , "\"", "?", "\", "'", ":", "*" ]

Thanks to everyone who tries to help me.

define('CATEGORIES_RGXP', '(?:[^|<>\\?"\':*\s]+\/?)+');
Route::set('debug_route', '(<categories>/)<file>.<ext>',array(
    'categories'    => CATEGORIES_RGXP,
))
    ->defaults(array(
        'controller'        =>  'index',
        'action'            =>  'file',
    ));

Dump in the controller when I request "/cat1/cat2/////cat3/file.php", using var_dump($this->request->param());

array(3) {
  ["categories"]=>
  string(14) "cat1/cat2/cat3"
  ["file"]=>
  string(4) "file"
  ["ext"]=>
  string(3) "php"
}

So it still allows a run of several slashes to pass.


Solution

  • The . matches any character (except a newline), which explains the observed behaviour.

    Instead, use a negated character class, i.e. [^X], which means "match any character except X".

    Given your requirements, you should then use:

    ^((?:[^|<>\\\/?"':*\s]+\/?)+)$
    


      NODE                     EXPLANATION
    --------------------------------------------------------------------------------
      ^                        the beginning of the string
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        (?:                      group, but do not capture (1 or more
                                 times (matching the most amount
                                 possible)):
    --------------------------------------------------------------------------------
          [^|<>\\\/?"':*\s         any character except: '|', '<', '>',
          ]+                       '\\', '\/', '?', '"', ''', ':', '*',
                                   whitespace (\n, \r, \t, \f, and " ")
                                   (1 or more times (matching the most
                                   amount possible))
    --------------------------------------------------------------------------------
          \/?                      '/' (optional (matching the most
                                   amount possible))
    --------------------------------------------------------------------------------
        )+                       end of grouping
    --------------------------------------------------------------------------------
      )                        end of \1
    --------------------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
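
The pattern explained above can be checked outside the framework with a short PHP script. This is only a sketch under a few assumptions: `is_valid_path()` is a hypothetical helper (not part of Kohana), `~` is used as the delimiter so that `/` needs no escaping, and the `D` modifier is added so that `$` cannot match before a trailing newline. The first pattern is the attempt from the question, anchored for comparison; the second is the pattern from this answer.

```php
<?php
// Attempt from the question: "/" is missing from the negated class,
// so the class itself can consume a run of slashes.
$asker = '~^(?:[^|<>\\?"\':*\s]+/?)+$~D';

// Pattern from the answer: "/" and "\" are added to the negated class.
// In a single-quoted PHP string, '\\\\' yields the two characters \\,
// which the regex engine reads as one literal backslash.
$fixed = '~^((?:[^|<>\\\\/?"\':*\s]+/?)+)$~D';

// Hypothetical helper: true only when the whole path matches the pattern.
function is_valid_path(string $pattern, string $path): bool
{
    return preg_match($pattern, $path) === 1;
}

// Sample paths taken from the question.
$samples = array(
    'cat1/cat2/cat3',
    'cat1/cat 2/ cat3',
    'cat1/////cat2',
    'cat1/cat2/ /// a |<>"?\\\':*',
);

foreach ($samples as $path) {
    printf(
        "%-30s asker=%s fixed=%s\n",
        $path,
        var_export(is_valid_path($asker, $path), true),
        var_export(is_valid_path($fixed, $path), true)
    );
}
```

With these inputs, the question's pattern accepts 'cat1/////cat2' while the answer's pattern rejects it, and both accept 'cat1/cat2/cat3'. Note that the answer's pattern also accepts a single trailing slash ('cat1/cat2/'), because the final \/? is optional. When the expression is used as a Kohana route parameter, the ^ and $ anchors and the delimiters would presumably be dropped, since the framework compiles the route into its own regular expression; that part is an assumption and is not exercised by this script.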