I am not sure what the correct syntax is for the url_regex option used in WWW::Mechanize.
I am collecting all the links from a web page that start with http:// and have the following format:
http://google.com
and not:
http://google.com/dir/
http://google.com/dir/dir2/
So, I use the following:
@links=$mech->find_all_links(url_regex=>qr/^http:\/\/.*?\//)
But this still captures the URLs with sub-paths in them.
I have tested my regex on regexpal.com and it works well there, so it seems that url_regex expects a different syntax.
Thanks.
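Your regex is not anchored at the end, so it matches any URL that has a slash somewhere after http://, including URLs with sub-paths. You should use: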
@links = $mech->find_all_links(url_regex => qr/^http:\/\/[^\/]*\/?$/);
which reads:
The string has to start ^
with http://
followed by any combination (possibly empty) of characters other than a slash [^\/]*
followed by an optional slash \/?
at the end $
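To check both patterns without fetching a page, here is a minimal sketch that runs them against a few sample strings. The sample URLs and the variable names ($original, $host_only, @urls) are only for illustration, and the anchored pattern is written with qr{...} delimiters so the slashes need no escaping; it should be equivalent to the one above.

```perl
use strict;
use warnings;

# Pattern from the question: not anchored at the end.
my $original  = qr/^http:\/\/.*?\//;

# Pattern from this answer, written with {} delimiters to avoid escaping slashes.
my $host_only = qr{^http://[^/]*/?$};

# Hypothetical sample URLs, for illustration only.
my @urls = (
    'http://google.com',
    'http://google.com/',
    'http://google.com/dir/',
    'http://google.com/dir/dir2/',
);

# Keep only the URLs that each pattern accepts.
my @loose   = grep { $_ =~ $original }  @urls;
my @matched = grep { $_ =~ $host_only } @urls;

print "original pattern matches:\n";
print "  $_\n" for @loose;
print "anchored pattern matches:\n";
print "  $_\n" for @matched;
```

With these samples, the original pattern accepts every URL that has a slash after the host (and rejects the bare http://google.com), while the anchored pattern accepts only http://google.com and http://google.com/.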