I need to extract Twitter IDs in a PHP script using regex. It works great as long as the URLs are wrapped in double quotes, but not single quotes:
<a href='http://www.twitter.com/singlequotes'>Twitter Single Quotes</a>
<a href="http://www.twitter.com/doublequotes">Twitter Double Quotes</a>
// regular expression
/<a [^>]*\bhref\s*=\s*"\K[^"]*twitter.com[^"]*/
I have tried using "|', ["'], and many other things, but none of them work. I would be very thankful for any help with this. Thanks!
This is about as fast as it gets. \K restarts the reported match right after the opening quote, so no capture group is needed.
href=['"]\K[^'"]+
Look for a single or double quote after href=, then match everything that isn't a single or double quote. That is as simple as it can be made.
P.S. If you are concerned about spaces around the =, then use:
href *= *['"]\K[^'"]+
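As a quick check of the space-tolerant pattern against the two anchors from the question, here is a minimal sketch. The variable names ($html, $matches, $urls) and the ~ delimiter are my own choices for illustration, not part of the question.

```php
<?php
// Sample markup taken from the question: one single-quoted, one double-quoted href.
$html = <<<'HTML'
<a href='http://www.twitter.com/singlequotes'>Twitter Single Quotes</a>
<a href="http://www.twitter.com/doublequotes">Twitter Double Quotes</a>
HTML;

// ~ is used as the delimiter so slashes in a pattern never need escaping.
// \K discards href=" (or href=') from the reported match.
preg_match_all('~href *= *[\'"]\K[^\'"]+~', $html, $matches);
$urls = $matches[0];

var_export($urls);
```

Both URLs should come back, regardless of which quote style wraps them. Note that the pattern does not check that the closing quote is the same kind as the opening one; it simply stops at the first quote of either type.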
PHP Implementation (PHP Demo):
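// Sample input mixing single- and double-quoted href attributes.
$in = '<a href=\'http://www.twitter.com/singlequotes\'>Twitter Single Quotes</a>
<a href="http://www.facebook.com/doublequotes">Twitter Double Quotes</a>
<a href=\'http://twitter.com/singlequotes\'>Twitter Single Quotes</a>
<a href="https://www.facebook.com/doublequotes">Twitter Double Quotes</a>';
$companies = ['twitter', 'facebook'];
// \K drops the href= prefix and opening quote; the alternation limits matches to the listed domains.
$pattern = '/href *= *[\'"]\Khttps?:\/\/(?:www\.)?(?:' . implode('|', $companies) . ')\.com[^\'"]+/';
// preg_match_all() returns the number of matches, so fall back to an empty array when there are none.
$out = preg_match_all($pattern, $in, $matches) ? $matches[0] : [];
var_export($out);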
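If the goal is only the Twitter ID rather than the whole URL, the same \K trick can be moved further right. This is a rough sketch under my own assumptions: the function name extractTwitterHandles is hypothetical, and it treats the first path segment after twitter.com/ as the ID, which may not hold for every link format.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first path
// segment of every twitter.com link found in $html.
function extractTwitterHandles(string $html): array
{
    // \K sits after "twitter.com/", so only the handle itself is reported.
    // The character class stops at a quote, slash, query string or fragment.
    $pattern = '~href *= *[\'"]https?://(?:www\.)?twitter\.com/\K[^\'"/?#]+~i';
    return preg_match_all($pattern, $html, $m) ? $m[0] : [];
}

$html = '<a href=\'http://www.twitter.com/singlequotes\'>Twitter Single Quotes</a>
<a href="http://www.twitter.com/doublequotes">Twitter Double Quotes</a>
<a href="https://www.facebook.com/someone">Not Twitter</a>';

var_export(extractTwitterHandles($html));
```

With the sample input this should yield singlequotes and doublequotes, and skip the facebook.com link.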