
Perl LWP: find links in a page that contain a specific word


I am really stuck. I am using LWP, and I want to push specific links from an HTML document into an array. But this:

while ($edocument =~ m/href\s*=\s*"([^"\s]+)"/gi) {
#dostuff
}

Will process every link. I only want the links that have the word 'test' in the URL.

I have tried all kinds of combinations (too many attempts to list), such as:

  while ($edocument =~ m/href\s*=\s*"([^"\s*test*]+)"/gi) {

I have been reading and reading and I really need a clue for this embarrassing situation.

Can someone help?

In addition, I only need ONE link containing the word test per $edocument, something like last in a loop, I guess.
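For the single-match requirement, one option is a match without /g inside an if instead of a while loop: the regex engine stops at the first link containing test, so no last is needed. A minimal sketch, assuming the page is already in $edocument (the sample HTML and example.com URLs below are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample page; in practice $edocument would hold the
# content fetched with LWP.
my $edocument = <<'HTML';
<a href="http://example.com/home">Home</a>
<a href="http://example.com/test/page1">One</a>
<a href="http://example.com/test/page2">Two</a>
HTML

# No /g and an if instead of while: the match stops at the first href
# whose URL contains "test" (case-insensitive because of /i).
my $first_test_link;
if ($edocument =~ m/href\s*=\s*"([^"\s]*test[^"\s]*)"/i) {
    $first_test_link = $1;
}

print defined $first_test_link ? "$first_test_link\n" : "no match\n";
```

Because [^"\s] cannot match a double quote, the capture never runs past the end of one href value into the next.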

I also tried variations of:

@links = $edocument =~ m/<a[^>]+href\s*=\s*["']?([^"'> ]+)/ig;

Then I ran @links through a sub that removes duplicates. But I still need only the links containing the word 'test'.
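Building on the list-context extraction above, the filtering and de-duplication can be done in one grep: keep elements matching /test/i and use a %seen hash to drop repeats while preserving order. A minimal sketch; the sample HTML is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample page with single, double and unquoted href values.
my $edocument = <<'HTML';
<a href="/index.html">Index</a>
<a href='/test/a.html'>A</a>
<a href="/test/a.html">A again</a>
<a href=/docs/test-b.html>B</a>
HTML

# Same list-context extraction as in the question: one capture per <a> tag.
my @links = $edocument =~ m/<a[^>]+href\s*=\s*["']?([^"'> ]+)/ig;

# Keep only URLs containing "test" (case-insensitively); %seen drops
# duplicates while keeping first-seen order.
my %seen;
my @test_links = grep { /test/i && !$seen{$_}++ } @links;

print "$_\n" for @test_links;
```

Taking the first element of @test_links also covers the one-match-per-document case.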


Solution

  • What about the following regexp:

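    while ($edocument =~ m/href\s*=\s*"([^"\s]*test[^"\s]*)"/gi) {
        # do stuff with $1
    }

    This regex matches only URLs that contain the substring test (case-insensitively, because of the /i flag). Using * instead of + on both sides of test also lets it match URLs that start or end with test. If you only need the first such link per $edocument, drop the /g and use if instead of while.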
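A runnable sketch of the loop above that pushes every matching URL into an array, as the question asks. The sample HTML is hypothetical and chosen to show why the quantifiers around test matter:

```perl
use strict;
use warnings;

# Hypothetical sample page; the last two URLs begin or end with "test".
my $edocument = <<'HTML';
<a href="http://example.com/">Home</a>
<a href="http://example.com/unit-test.html">Unit</a>
<a href="test">Bare</a>
<a href="http://example.com/docs/test">Docs</a>
HTML

my @links;
# [^"\s]* (zero or more) on both sides of "test" accepts URLs that begin
# or end with "test"; with + instead, the last two links would be skipped.
while ($edocument =~ m/href\s*=\s*"([^"\s]*test[^"\s]*)"/gi) {
    push @links, $1;
}

print "$_\n" for @links;
```

Like the original answer, this only handles double-quoted href values.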