php html url-parsing sanitization

PHP - remove http/www from message (except for the host domain) to disable clickable links


I have a simple message board (let's say mywebsite.com) that allows users to post messages. Currently the board makes all links clickable, i.e. when someone posts something that starts with:

http://, https://, www., http://www., https://www.

then the script automatically turns it into a link (i.e. wraps it in an A href tag).

THE PROBLEM - there is too much spam. So my idea is to automatically remove the above http(s)/www prefixes so that these don't become clickable links. HOWEVER, I want to allow posters to link to pages within my site, i.e. not to remove the http(s)/www prefix when the message contains links to mywebsite.com.

My idea was to create two arrays:

$removeParts = array('http://', 'https://', 'www.', 'http://www.', 'https://www.');

$keepParts = array('http://mywebsite.com', 'http://www.mywebsite.com', 'www.mywebsite.com', 'https://www.mywebsite.com', 'https://mywebsite.com');

but I don't know how to use them correctly (probably str_replace could work somehow).
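
One way the two arrays might be combined is sketched below. It swaps str_replace for strtr(), because strtr() with an array tries the longest matching key first and does not rescan text it has already replaced, so an own-site prefix wins over the shorter generic prefix. The names $pairs and $clean are illustrative only, and the sample message is the one from this question.

```php
<?php
// Prefixes to strip, and own-site prefixes to keep (both taken from the question).
$removeParts = array('http://', 'https://', 'www.', 'http://www.', 'https://www.');
$keepParts = array('http://mywebsite.com', 'http://www.mywebsite.com', 'www.mywebsite.com',
                   'https://www.mywebsite.com', 'https://mywebsite.com');

// Illustrative map: each removable prefix becomes '', each kept prefix maps to itself.
$pairs = array_fill_keys($removeParts, '') + array_combine($keepParts, $keepParts);

$message = 'Hello world, thanks to http://mywebsite.com/about I learned a lot. '
         . 'I found you on http://www.bing.com, https://google.com/search '
         . 'and on some www.spamwebsite.com/refid=spammer2.';

// At each position strtr() picks the longest key that matches, so
// "http://mywebsite.com" is preferred over the shorter "http://".
$clean = strtr($message, $pairs);

echo $clean, PHP_EOL;
```

This is only a prefix comparison: it is case-sensitive, it would also strip a "www." that appears in the middle of other text, and it would keep something like http://mywebsite.com.evil.example because that string starts with a kept prefix. Checking the host name itself is stricter.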

Below is an example of $message before and after processing:

$message BEFORE:

Hello world, thanks to http://mywebsite.com/about I learned a lot. I found you on http://www.bing.com, https://google.com/search and on some www.spamwebsite.com/refid=spammer2.

$message AFTER:

Hello world, thanks to http://mywebsite.com/about I learned a lot. I found you on bing.com, google.com/search and on some spamwebsite.com/refid=spammer2.


Please note that the user enters plain text into the post form, so the script only needs to work with this plain text (not with A href tags etc.).


Solution

  • For anyone looking for an answer - I posted a related (more specific) question which solved the problem: PHP - remove words (http|https|www|.com|.net) from string that do not start with specific words
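
The linked question is not reproduced here. As a rough sketch of the general idea, and not necessarily the approach accepted there, a regular expression with a callback can compare the host name against a whitelist and strip the prefix only for foreign hosts. The function name stripExternalLinkPrefixes and the $allowedHosts parameter are hypothetical names chosen for this example.

```php
<?php
// Hypothetical helper: removes "http://", "https://" and "www." from links
// whose host is not in $allowedHosts (hosts given in lowercase, without "www.").
function stripExternalLinkPrefixes(string $message, array $allowedHosts): string
{
    // Optional scheme, optional "www.", then the host name (captured in group 1).
    $pattern = '~\b(?:https?://(?:www\.)?|www\.)([a-z0-9.-]+)~i';

    $result = preg_replace_callback($pattern, function (array $m) use ($allowedHosts) {
        // Normalize the host for comparison only (case, trailing sentence dot).
        $host = strtolower(rtrim($m[1], '.'));
        if (in_array($host, $allowedHosts, true)) {
            return $m[0]; // own site: leave the link untouched
        }
        return $m[1];     // foreign site: keep the host, drop scheme and "www."
    }, $message);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $message;
}

$message = 'Hello world, thanks to http://mywebsite.com/about I learned a lot. '
         . 'I found you on http://www.bing.com, https://google.com/search '
         . 'and on some www.spamwebsite.com/refid=spammer2.';

echo stripExternalLinkPrefixes($message, array('mywebsite.com')), PHP_EOL;
```

Because the whole host is compared rather than a string prefix, a look-alike such as http://mywebsite.com.evil.example is treated as foreign and loses its prefix. The path after the host is left as typed in both cases.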