I'm building a site that uses tweets from Twitter's public timeline:
http://twitter.com/statuses/public_timeline.xml
I don't want tweets in Chinese, Russian, etc. I want everything except tweets written in non-Latin scripts.
Here is an example of what I don't want: スポーツブランドPR、マーケティング。2児の母。好きなもの:ユニコーン、着物、駅伝。
I've tried mb_detect_encoding with UTF-8, but that isn't working.
The encoding is the same for every post; the English ones are in UTF-8 too ;)
There are two options. Either find a way to have the Twitter API return English-only posts,
or use a regex in a loop to filter out the posts that contain non-Roman/Latin characters:
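preg_match('/[^\x00-\xFF]/u', $post);
It returns 1 when $post contains at least one character above U+00FF, so those are the posts to skip.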
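To show the loop part, here is a rough sketch of how the check could be applied to a list of posts. The function name is_latin_only and the sample $posts array are made up for illustration, and I'm assuming "Latin" means nothing above U+00FF (Latin-1), so tweets containing curly quotes or emoji would be dropped as well; widen the range if that is too strict.

```php
<?php
// Hypothetical helper: true when $text contains no character above U+00FF.
function is_latin_only(string $text): bool
{
    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (for example invalid UTF-8), so anything other than 0 is rejected.
    return preg_match('/[^\x00-\xFF]/u', $text) === 0;
}

// Sample data standing in for the text of each tweet (assumed, not real API output).
$posts = [
    'Just setting up my site',
    'スポーツブランドPR、マーケティング。',
    'Привет, мир',
    'Café au lait',
];

// Keep only the posts that pass the check, reindexed from 0.
$kept = array_values(array_filter($posts, 'is_latin_only'));

print_r($kept);
```

Accented Western European letters such as é fall inside the range, so the last post is kept along with the first one.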
Hope this helps,
Niko