I'm trying to download expression data for DNA sequences. On the page, the graph (a png image) is always the 6th, 7th, or 8th image on the page, but I do not want to download 2 extra images every time.
Inspecting the image on the page yields <img src="../trash/hgc/gtexGene_genome_6d0b_5d5220.png" border="1">, though those last few numbers in the link to the image change every time.
In my code, I have
my $image = $mech1->find_image( alt_regex => qr/gtexGene/i );;
$mech1->get($image -> URI);
$mech1->save_content("exp.png");
which is not working.
How can I download the image given only what some of the contents of its link are?
You are using alt_regex, which is doing a pattern match on the alt attribute. What you want is the src attribute, so you need to use url_regex instead.
url => 'string', and url_regex => qr/regex/,
Matches the URL of the image against string or regex, as appropriate. The URL may be a relative URL, like foo/bar.html, depending on how it's coded on the page.
So your code should read like this.
my $image = $mech1->find_image( url_regex => qr/gtexGene/i );
Only use the /i modifier for case-insensitivity if you really want it to be case-insensitive.
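Putting the pieces together, here is a minimal sketch of the whole download step. The helper name save_matching_image is made up for illustration, and the page URL in the usage comment is a placeholder. It also swaps in two things that are not in the question's code: url_abs, to resolve the relative ../trash/... path against the page URL, and LWP's ':content_file' option, to write the image bytes straight to disk instead of calling save_content.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: find the first image on the page currently loaded
# in $mech whose src matches $pattern, download it, and save it to $file.
# Returns the absolute image URL as a string, or undef if nothing matched.
sub save_matching_image {
    my ( $mech, $pattern, $file ) = @_;

    # url_regex is matched against the src attribute as written in the HTML
    my $image = $mech->find_image( url_regex => $pattern )
        or return;

    # url_abs resolves a relative src against the URL of the page
    my $url = $image->url_abs;

    # ':content_file' tells LWP to write the response body directly to $file
    $mech->get( $url, ':content_file' => $file );

    return "$url";
}

# Usage (the page URL is a placeholder, not the real one):
#   my $mech1 = WWW::Mechanize->new;
#   $mech1->get('https://example.org/gene-page');
#   save_matching_image( $mech1, qr/gtexGene/, 'exp.png' )
#       or warn "No image with gtexGene in its src\n";
```

Note that after the call, $mech's current page is the image itself, so call back() or get() the original page again before looking for anything else on it.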