I'm trying to strip URLs, mentions, and hashtags from a tweet so that only the actual text remains. So instead of:
Hello this is a test @someone #tag1 #tag2 http://bit.ly/123
it'd be just:
Hello this is a test
I believe I'd have to use some sort of regular expression, but I'm terrible at them. Could someone point me in the right direction?
Thanks in advance.
Here's how to do it with three regular expressions (you could probably merge all three into one, but let's not go there!):
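// (?<!\S) means "not preceded by a non-space character", i.e. start of string or after whitespace.
// \b would not work here: there is no word boundary between a space and @ or #.
$str = preg_replace('/(?<!\S)@\S+/', '', $str); // remove @someone
$str = preg_replace('/(?<!\S)#\S+/', '', $str); // remove hashtags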
// taken from http://daringfireball.net/2010/07/improved_regex_for_matching_urls
$urlRegex = '~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~';
$str = preg_replace($urlRegex, '', $str); // remove urls
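Removing the matches leaves stray spaces behind, so the result still needs trimming to get exactly "Hello this is a test". Here is a minimal sketch that wraps the steps in a function and cleans up the whitespace. It is not a single merged regex, just one preg_replace call given an array of patterns. The function name stripTweetEntities is made up for illustration, and the URL pattern is a deliberately simplified stand-in that only catches links starting with http:// or https://; swap in the longer pattern above if you need broader URL matching.

```php
<?php
// Hypothetical helper: the name and the simplified URL pattern are assumptions.
function stripTweetEntities(string $tweet): string
{
    $patterns = [
        '~\bhttps?://\S+~i', // URLs with an explicit http/https scheme only
        '/(?<!\S)@\S+/',     // @mentions at start of string or after whitespace
        '/(?<!\S)#\S+/',     // #hashtags at start of string or after whitespace
    ];

    // preg_replace applies each pattern in turn when given an array.
    $text = preg_replace($patterns, '', $tweet);

    // Collapse the whitespace runs left by the removals, then trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo stripTweetEntities('Hello this is a test @someone #tag1 #tag2 http://bit.ly/123'), PHP_EOL;
// prints: Hello this is a test
```

Because the mention and hashtag patterns require whitespace (or the start of the string) in front of the @ or #, something like foo@example.com is left alone.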