
Truncate a URL while ignoring DIV tags


We use the following code to display a value on a page of a WordPress site. Occasionally the output is too long to fit within the box we've set for it, so we'd like to truncate it.

    $markup = str_replace('%%', get_post_meta($post_id, '_sf_submission_field_'.get_the_ID(), true), htmlspecialchars_decode(get_post_meta(get_the_ID(), 'markup', true)));

    $text = preg_replace('#(script|about|applet|activex|chrome):#is', "\\1&#058;", $markup);
    $ret = ' ' . $text;
    $ret = preg_replace("#(^|[\n ])([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a href=\"\\2\" target=\"_blank\" rel=\"nofollow\">\\2</a>", $ret);
    $ret = preg_replace("#(^|[\n ])((www|ftp)\.[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\" rel=\"nofollow\">\\2</a>", $ret);
    $ret = preg_replace("#(^|[\n ])([a-z0-9&\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret);
    $ret = substr($ret, 1);

    echo $ret;

Simply calling substr($ret, 0, 30) would be great, but part of the input string consists of styling div tags and other text that must not be truncated. So my question is: how can I truncate just the part of the string that contains a URL, without truncating the href itself, since it still needs to be a clickable link?

Here is a sample input string: <i class="icon-twitter-squared"></i> http://www.stackoverflow.com/reallylongurl

I'd like only http://www.stackoverflow.com/reallylongurl to be truncated, for example to www.stackoverfl..., while the link itself still points to the original, untruncated URL.

Many thanks for your suggestions!


Solution

  • Update: To match only the URL that is not part of the href attribute, as you asked in the comment, you can use this regex:

    (?<!href=")https?://(.{9}).*?/\w+
    

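As a rough sketch of how the regex above could be applied in PHP: the answer does not show a replacement string, so the `'$1...'` replacement below is an assumption, and `truncate_link_text` and its `$keep` parameter are hypothetical names introduced only for illustration. The idea is to run it on the string after the links have been generated, so that the lookbehind skips the URL inside `href="..."` and only the visible link text is shortened.

```php
<?php
// Hypothetical helper (not from the original answer): shortens the visible
// text of already-linkified URLs while leaving every href untouched.
function truncate_link_text(string $html, int $keep = 9): string
{
    // Same shape as the regex above; only the quantifier is parameterised.
    // The negative lookbehind rejects the URL that directly follows href=".
    $pattern = '~(?<!href=")https?://(.{' . $keep . '}).*?/\w+~';

    // Assumption: keep the captured prefix and append "..." as the ellipsis.
    // Fall back to the input if preg_replace reports an error (null).
    return preg_replace($pattern, '$1...', $html) ?? $html;
}

// Sample from the question, as it looks after the link-generating step.
$linked = '<i class="icon-twitter-squared"></i> '
    . '<a href="http://www.stackoverflow.com/reallylongurl" target="_blank" rel="nofollow">'
    . 'http://www.stackoverflow.com/reallylongurl</a>';

echo truncate_link_text($linked), "\n";      // link text becomes "www.stack..."
echo truncate_link_text($linked, 15), "\n";  // link text becomes "www.stackoverfl..."
```

Two limitations follow from the pattern itself: it requires a `/` followed by word characters, so a bare URL such as http://example.com is left as it is, and the lazy `.*?` stops after the first path segment, so any deeper path would remain visible after the ellipsis.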