
WWW::Mechanize: Download a specific image


I'm trying to download expression data for DNA sequences. On the page, the graph (a PNG image) is always the 6th, 7th, or 8th image, but I don't want to download two extra images every time.

Inspecting the image on the page shows <img src="../trash/hgc/gtexGene_genome_6d0b_5d5220.png" border="1">, although the last few characters of the file name change every time.

In my code, I have

my $image = $mech1->find_image( alt_regex => qr/gtexGene/i );
$mech1->get($image->URI);
$mech1->save_content("exp.png");

which is not working.

How can I download the image when I only know part of its URL?


Solution

  • You are using alt_regex, which matches against the alt attribute. Your img tag has no alt attribute at all; the text you want to match is in the src attribute, so you need url_regex instead.

    url => 'string', and url_regex => qr/regex/,

    Matches the URL of the image against string or regex, as appropriate. The URL may be a relative URL, like foo/bar.html, depending on how it's coded on the page.

    So your code should read like this.

    my $image = $mech1->find_image( url_regex => qr/gtexGene/i );

    Only use the /i modifier for case-insensitivity if you really want it to be case-insensitive.
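A fuller sketch of the whole download, assuming the expression page has already been loaded into a WWW::Mechanize object. The helper name save_matching_image is hypothetical, not part of WWW::Mechanize. Because url_regex is matched against the src exactly as written on the page (here the relative path ../trash/hgc/gtexGene_genome_6d0b_5d5220.png), the sketch fetches $image->url_abs, which resolves that path against the page's base URL. save_content writes non-text content such as a PNG in binary mode, and back returns to the HTML page in case more lookups are needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: download the first image on the current page whose
# src matches $regex, save it to $file, and return the absolute URL fetched.
sub save_matching_image {
    my ( $mech, $regex, $file ) = @_;

    # url_regex tests the src attribute as written in the HTML,
    # which may be relative (e.g. ../trash/hgc/...png)
    my $image = $mech->find_image( url_regex => $regex )
        or die "No image with a src matching $regex on this page\n";

    # url_abs resolves the (possibly relative) src against the page's base URL
    my $abs = $image->url_abs;

    $mech->get($abs);             # autocheck is on by default, so HTTP errors die
    $mech->save_content($file);   # non-text content is written in binary mode
    $mech->back;                  # return to the HTML page for further lookups
    return $abs;
}

# Usage (the page URL is not given in the question; $page_url is a placeholder):
#   my $mech = WWW::Mechanize->new;
#   $mech->get($page_url);
#   save_matching_image( $mech, qr/gtexGene/, 'exp.png' );
```

Matching only the stable gtexGene part of the file name means the changing suffix does not matter, and only that one image is downloaded.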