Hi, I am feeding content to Zend_Search_Lucene. It finds words up to a special character, but nothing after that character is searchable.
For example:
very well to the other job boards � one of the main things that has impressed is the variety of the applications, especially with regards to the background of the candidates" manoj � Head
If I search for 'boards' I get a match, but if I search for 'one' or any other word after the unreadable characters, nothing is found.
How can I remove these characters and get plain text? They appear when I convert .docx/PDF files to text.
Alternatively, how can I feed only plain text to Zend_Search_Lucene?
Please help.
You can use the following preg_replace() call to remove all non-ASCII (so-called special) characters from your string:
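// $str holds the text extracted from the .docx/PDF file
$replaced = preg_replace('/[^\x00-\x7F]+/', '', $str);
// produces this converted text (the spaces that surrounded each removed
// character are kept, hence the double spaces):
// very well to the other job boards  one of the main things that has impressed
// is the variety of the applications, especially with regards to the background of the
// candidates" manoj  Head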
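A slightly more defensive variant, offered as a minimal sketch assuming PHP 7+ and that the extracted text is already in a string. The function name `toPlainAscii()` is hypothetical (it is not part of Zend_Search_Lucene). It replaces each run of non-ASCII bytes with a space instead of deleting it, so two words separated only by a special character are not glued into one token, and then collapses the leftover whitespace before the text is handed to the indexer.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): turns text extracted
// from .docx/PDF files into plain ASCII before it is indexed.
function toPlainAscii(string $str): string
{
    // Replace each run of non-ASCII bytes with a single space, so words that
    // were separated only by a special character do not get joined together.
    $ascii = preg_replace('/[^\x00-\x7F]+/', ' ', $str);

    // Collapse repeated whitespace (including the spaces that already
    // surrounded the removed characters) and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $ascii));
}

// "\xEF\xBF\xBD" is the UTF-8 encoding of the replacement character shown as "?"
$sample = "job boards \xEF\xBF\xBD one of the main things, manoj \xEF\xBF\xBD Head";
echo toPlainAscii($sample), PHP_EOL;
// job boards one of the main things, manoj Head
```

Note that stripping everything outside ASCII also removes legitimate accented letters (for example "é"), so this only suits content that is meant to be plain English text.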