Tags: php, regex, zend-framework, zend-validate, non-ascii-characters

Zend_Validate_Regex not happy with accented characters


I'm using Zend Framework and I need to validate a text field that accepts not only digits and plain letters but also accented characters such as 'ã', 'ç' and so on.

I was confident that a simple regex validation would do the job:

    public function SetTitle($title)
    {
        $validator = new Zend_Validate_Regex('/^[0-9a-zA-ZÀ-ú]+[0-9A-Za-zÀ-ú\'\-\.:,; ]{1,50}$/');

        if ($validator->isValid($title)) {
            if ($this->title != $title) {
                $this->title = $title;
            }
        } else {
            throw new MyApp_Projects_ProjectException("This ($title) is not a valid title.");
        }
    } // SetTitle

And it does work when I test it with something like this:

    public function testIfCanAttributeTitleToProject()
    {
        $someTitle = "some title with ç, á and ã";
        $this->project->SetTitle($someTitle);
        // PHPUnit convention: expected value first, actual value second
        $this->assertEquals($someTitle, $this->project->getTitle());
    }
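Why does [À-ú] accept these titles at all? Zend_Validate_Regex hands the pattern to PHP's preg_match(), and without the /u modifier PCRE compares bytes, not characters. In a UTF-8 source file À is the two bytes C3 80 and ú is C3 BA, so [À-ú] collapses to the byte range \x80-\xC3, which happens to contain both bytes of every character from U+0080 to U+00FF. A minimal sketch comparing byte mode with /u, assuming the file is saved as UTF-8 (byteMode and utf8Mode are hypothetical helper names):

```php
<?php
// Compare the [À-ú] class without and with the /u modifier.
// Assumes this file is saved as UTF-8, like the question's source files.

function byteMode(string $s): bool
{
    // No /u: the class is read byte by byte and collapses to [\x80-\xC3].
    return preg_match('/^[À-ú]+$/', $s) === 1;
}

function utf8Mode(string $s): bool
{
    // With /u: the class is the code point range U+00C0..U+00FA.
    return preg_match('/^[À-ú]+$/u', $s) === 1;
}

foreach (['ã', 'ç', 'ü', '×', '½'] as $char) {
    printf(
        "%s  bytes=%d  no-u=%d  u=%d\n",
        $char,
        strlen($char),
        byteMode($char),
        utf8Mode($char)
    );
}
```

In byte mode all five characters pass. With /u the class means U+00C0 to U+00FA, so ü (U+00FC) and ½ (U+00BD) are rejected, while × (U+00D7), which is not a letter, is still accepted.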

But when I add the same kind of validator to a form element, like this:

    $title = new Zend_Form_Element_Text('title');
    $title->setLabel('Nome:')
        ->setOptions(array('size' => '50'))
        ->setRequired(true)
        ->addValidator('Regex', false, array(
            'pattern' => "/^[0-9a-zA-ZÀ-ú]+[0-9A-Za-zÀ-ú\'\-\.,: ]{1,50}$/"
            ))
        ->addFilter('HtmlEntities')
        ->addFilter('StringTrim');
    // attach elements to form
    $this->addElement($title);

an error is raised when I run this test:

    public function testUserCanUseAccentedCharacters()
    {
        $form = new MyApp_Form_ProjectCreate();
        $formData = array(
            'title' => 'we scream to weird chars like ã é or ç',
            'submit' => true
        );
        $form->process($formData);
    }

where the form's process() method looks like this:

    public function process($data)
    {
        if ($this->isValid($data) !== true) {
            throw new MyApp_Form_ProjectCreateException('Invalid data!');
        } else {
            $db = Zend_Registry::get('db');
            $projectMapper = new MyApp_Projects_ProjectMapper($db);
            $project = new MyApp_Projects_Project();
            $project->SetTitle($this->title->GetValue());
            $projectMapper->insert($project);
        }
    }

I have already checked and retested the regular expression in other contexts and it seems fine, but for some reason, even though Zend_Validate_Regex works with this expression on its own, the same validator attached to a form element rejects anything in the À-ú range.
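One thing worth ruling out: in Zend Framework 1, Zend_Form_Element::isValid() validates the element's filtered value (the same value getValue() returns), so the HtmlEntities filter attached to this element runs before the Regex validator. Zend_Filter_HtmlEntities is built on PHP's htmlentities(). The sketch below reproduces that order with plain PHP, assuming UTF-8 input and encoding; validatesRaw, validatesAfterFilter and FORM_PATTERN are hypothetical names.

```php
<?php
// Reproduce "filter first, then validate" with plain PHP functions.
// FORM_PATTERN is copied from the form element in the question.
const FORM_PATTERN = "/^[0-9a-zA-ZÀ-ú]+[0-9A-Za-zÀ-ú\'\-\.,: ]{1,50}$/";

function validatesRaw(string $title): bool
{
    return preg_match(FORM_PATTERN, $title) === 1;
}

function validatesAfterFilter(string $title): bool
{
    // Assumption: the filter behaves like htmlentities() with UTF-8 input.
    $filtered = htmlentities($title, ENT_QUOTES, 'UTF-8');
    // '&' and ';' from the entities are not in the character class.
    return preg_match(FORM_PATTERN, $filtered) === 1;
}

$title = 'we scream to weird chars like ã é or ç';
echo htmlentities($title, ENT_QUOTES, 'UTF-8'), "\n";
// we scream to weird chars like &atilde; &eacute; or &ccedil;
var_dump(validatesRaw($title));         // bool(true)
var_dump(validatesAfterFilter($title)); // bool(false)
```

With the filter applied first, the same title that passes on its own is rejected, which matches the symptom described above.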

I'm surely (still) missing something basic here, or banging my head against a wall when there is a better way around it.

Can someone help me, please?

TIA, again... :)


Solution

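Your pattern, written as a single-quoted PHP string,

    '/^[0-9a-zA-ZÀ-ú]+[0-9A-Za-zÀ-ú\'\-\. ]{1,50}$/'

embeds an escaped single quote (\'). Will it work for you written as a double-quoted string instead?
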
    "/^[0-9a-zA-ZÀ-ú]+[0-9A-Za-zÀ-ú\'\-\. ]{1,50}$/"
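Whether the quoting style matters can be checked directly. In a single-quoted PHP string \' becomes a bare ', while a double-quoted string keeps the backslash, and PCRE then reads \' as an escaped, literal '. A minimal check with a shortened class ($single and $double are illustrative names):

```php
<?php
// The same character class written as a single- and a double-quoted PHP string.
$single = '/^[a-z\'\- ]+$/'; // PHP turns \' into '  -> /^[a-z'\- ]+$/
$double = "/^[a-z\'\- ]+$/"; // PHP keeps \' as is   -> /^[a-z\'\- ]+$/

var_dump($single === $double);          // bool(false): the strings differ
var_dump(preg_match($single, "don't")); // int(1)
var_dump(preg_match($double, "don't")); // int(1): PCRE reads \' as a literal '
```

In this sketch both patterns accept the apostrophe, so the quoting style by itself should not change which titles match.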
    

Update

Three more things to try. I don't know the details of Zend's implementation of regular expressions, so I don't know whether the first two will work.

The Unicode letter property \p{L}, together with the /u modifier so that the subject is read as UTF-8:

    "/^[0-9\p{L}]+[0-9\p{L}\'\-\. ]{1,50}$/u"
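A minimal sketch of a letter-property check in plain PHP, using the short property name \p{L} (any letter, any script) and the /u modifier. The function name isValidTitle is hypothetical, and the punctuation set is borrowed from the question's form element.

```php
<?php
// Letters from any script via \p{L}; /u makes PCRE work on UTF-8 code points.
function isValidTitle(string $title): bool
{
    $pattern = '/^[0-9\p{L}]+[0-9\p{L}\'\-.,: ]{1,50}$/u';
    // With /u, preg_match() returns false (not 0) if $title is not valid UTF-8.
    return preg_match($pattern, $title) === 1;
}

var_dump(isValidTitle('some title with ç, á and ã')); // bool(true)
var_dump(isValidTitle('Müller'));                     // bool(true): ü is a letter
var_dump(isValidTitle('2 × 3'));                      // bool(false): × is a math symbol
var_dump(isValidTitle("caf\xC3"));                    // bool(false): truncated UTF-8
```

Unlike a code point range such as À-ú, \p{L} also covers letters like ü and rejects symbols like ×.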
    

The POSIX character class [:alpha:], also with /u:

    "/^[0-9[:alpha:]]+[0-9[:alpha:]\'\-\. ]{1,50}$/u"
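A sketch of the POSIX variant. It rests on an assumption worth verifying on your PHP build: that the /u modifier also switches on PCRE's UCP option, under which [:alpha:] matches any Unicode letter rather than ASCII letters only. The function name isAlphaTitle is hypothetical.

```php
<?php
// [:alpha:] used inside a bracket expression, together with /u.
// Assumption: PHP's /u also enables PCRE's UCP option, so [:alpha:]
// behaves like \p{L} instead of matching ASCII letters only.
function isAlphaTitle(string $title): bool
{
    $pattern = '/^[0-9[:alpha:]]+[0-9[:alpha:]\'\-.,: ]{1,50}$/u';
    return preg_match($pattern, $title) === 1;
}

var_dump(isAlphaTitle('ação e reação'));
var_dump(isAlphaTitle('2 × 3'));
```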
    

Brute-force enumeration of the letters you care about:

    "/^[0-9a-zA-ZÀÁÂ ...et cetera... øùú]+[0-9A-Za-zÀÁÂ ...et cetera... øùú\'\-\. ]{1,50}$/"
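If the enumeration route is taken, building the class from a string keeps the list in one place. A minimal sketch: EXTRA_LETTERS is a hypothetical, deliberately incomplete list to be extended as needed, and buildTitlePattern is a hypothetical helper. preg_quote() escapes any regex metacharacters added to the list, and /u keeps each multibyte letter a single class member.

```php
<?php
// Brute-force variant: list the accepted non-ASCII letters explicitly.
// EXTRA_LETTERS is an illustrative (hypothetical) list, not a complete one.
const EXTRA_LETTERS = 'ÀÁÂÃÇÉÊÍÓÔÕÚàáâãçéêíóôõú';

function buildTitlePattern(string $letters): string
{
    // preg_quote() escapes regex metacharacters (e.g. '-' or ']') in the list.
    $letters = preg_quote($letters, '/');
    $first = '[0-9A-Za-z' . $letters . ']';
    $rest = '[0-9A-Za-z' . $letters . '\'\-.,: ]';
    // /u keeps each multibyte letter a single member of the class.
    return '/^' . $first . '+' . $rest . '{1,50}$/u';
}

$pattern = buildTitlePattern(EXTRA_LETTERS);
var_dump(preg_match($pattern, 'Ação e reação')); // int(1)
var_dump(preg_match($pattern, 'Müller'));        // int(0): ü is not in the list
```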