I am trying to take URLs that sit alone in their own HTML paragraph and extract them with PHP's preg_replace_callback. Right now, WordPress does this with:
preg_replace_callback( '|^\s*(https?://[^\s"]+)\s*$|im', 'callback_function', $string );
But that only matches a URL on its own line, with no HTML around it. What I need to do is match the URL in something like this:
<p>http://youtube.com/</p>
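A minimal sketch of the problem, using only the WordPress pattern quoted above: with the `m` flag, `^` and `$` anchor to line boundaries, so a bare URL line matches but the same URL wrapped in `<p>` tags does not. The sample strings are made up for illustration.

```php
<?php
// Pattern quoted from WordPress: ^ and $ anchor to line boundaries (m flag),
// so only a URL that is alone on its line can match.
$wpPattern = '|^\s*(https?://[^\s"]+)\s*$|im';

$bare    = "Some text\nhttp://youtube.com/\nMore text";
$wrapped = "Some text\n<p>http://youtube.com/</p>\nMore text";

var_dump(preg_match($wpPattern, $bare, $m)); // int(1)
echo $m[1], "\n";                            // http://youtube.com/

// The line starts with "<p>", not "http", so the anchored pattern fails.
var_dump(preg_match($wpPattern, $wrapped));  // int(0)
```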
I don't care about whitespace before or after the paragraph tags; all I want to do is extract that URL so I can replace it with more detailed information via preg_replace_callback.
Any help out there?
UPDATE: Okay, I have a post's text with a number of paragraphs like this:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis et nunc vel felis vulputate tincidunt. In dapibus tempus sollicitudin. Nullam quis ultricies tortor. Ut malesuada aliquet enim. Aliquam et lobortis urna. Sed commodo malesuada malesuada. Donec cursus nisi nec mauris venenatis pharetra. Curabitur ut leo purus.</p>
<p>http://youtube.com/</p>
<p>Etiam non odio tellus, vel imperdiet nunc. Praesent rutrum sagittis purus, quis pretium eros varius ut. http://google.com/ Ut id orci eu lacus aliquam luctus. Sed dolor quam, suscipit eu dapibus feugiat, lacinia vitae augue.</p>
From that text, all I want to extract is the http://youtube.com/ that sits in a paragraph on its own. There is also a Google.com link in another paragraph, but I don't want that one. All I want is the link (or links) standing alone in their own paragraph, so my callback would receive 'http://youtube.com/' as its argument.
You could try this: http://regex101.com/r/rN4vB3
/<p>\s*(https?:\/\/(?:(?!<\/?p>).)+)\s*<\/p>/
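The logic is that we look for a <p> tag whose content starts with http, and then take everything after it until we hit the </p>. The negative lookahead (?!<\/?p>) keeps the match from running on into another paragraph. The first capturing group holds the URL, which preg_replace_callback passes to your callback as $matches[1]. Note that this also matches a paragraph that merely starts with a URL and continues with other text.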
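Here is a minimal sketch of wiring the pattern above into preg_replace_callback on a shortened version of the sample post. The callback name `embed_url` and the `<div class="embed">` markup it returns are hypothetical placeholders for whatever "more detailed information" you want to generate.

```php
<?php
// The answer's pattern: a <p> whose content starts with http(s)://, captured
// up to the closing </p> without crossing into another paragraph.
$pattern = '/<p>\s*(https?:\/\/(?:(?!<\/?p>).)+)\s*<\/p>/';

// Hypothetical callback: $matches[0] is the whole <p>...</p>, $matches[1] the URL.
// trim() drops any whitespace the greedy group may have captured before </p>.
function embed_url(array $matches): string
{
    $url = trim($matches[1]);
    return '<div class="embed">' . htmlspecialchars($url) . '</div>';
}

$post = "<p>Lorem ipsum dolor sit amet.</p>\n"
      . "<p>http://youtube.com/</p>\n"
      . "<p>Etiam non odio tellus. http://google.com/ Ut id orci eu lacus.</p>";

// Only the standalone YouTube paragraph is replaced; the Google link is left alone.
echo preg_replace_callback($pattern, 'embed_url', $post), "\n";
```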
This might not be the optimal solution, but it should do what you asked.
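If you need the paragraph to contain the URL and nothing else, one possible tightening (an assumption on my part, not tested against real WordPress output) is to borrow WordPress's own `[^\s"]+` URL class and also stop at `<`, so the match fails whenever other text follows the URL. The sample paragraphs below are invented.

```php
<?php
// Stricter variant (assumption): the paragraph must hold one URL and nothing
// but optional whitespace around it. [^\s"<]+ follows WordPress's [^\s"]+
// class and also stops at '<', so the URL cannot swallow the closing </p>.
$strict = '#<p>\s*(https?://[^\s"<]+)\s*</p>#i';

$post = "<p>http://youtube.com/</p>\n"
      . "<p> http://vimeo.com/ </p>\n"
      . "<p>http://example.com/ plus some text</p>\n"
      . "<p>Etiam non odio tellus. http://google.com/ Ut id orci.</p>";

// Collect only the standalone URLs (youtube and vimeo here).
preg_match_all($strict, $post, $found);
print_r($found[1]);
```

The same `$strict` pattern can be passed to preg_replace_callback in place of the original one; the callback still receives the URL as `$matches[1]`.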