I am trying to fetch and parse various NCBI pages, such as
http://www.ncbi.nlm.nih.gov/nuccore/NM_000036
However, when I use the 'get' function from Perl's LWP::Simple, the output is not the same as what I get when I save the page manually (with Firefox's 'save as html' option). The output of 'get' lacks the data I need.
Am I doing something wrong? Should I use another tool?
My script is:
use strict;
use warnings;
use LWP::Simple;
my $input_name = 'GENES.txt';
open(INPUT, '<', $input_name) || die "unable to open $input_name";
open(OUTPUT, '>', 'Selected_Genes') || die;
my $line;
while ($line = <INPUT>)
{
    chomp $line;
    print OUTPUT '>' . $line . "\n";
    my $URL = 'http://www.ncbi.nlm.nih.gov/nuccore/' . $line;
    # e.g.:
    # $URL = http://www.ncbi.nlm.nih.gov/nuccore/NM_000036
    my $text = get($URL);    # fetch the page with LWP::Simple's get()
    print $text . "\n";
    $text =~ m!\r?\n\r?\s+\/translation="((?:(?:[^"])\r?\n?\r?)*)"!;
    print OUTPUT $1 . "\n";
}
Thanks in advance!
The content you are looking for is generated by JavaScript, so it is not in the HTML that 'get' returns. You need to parse the HTML from the first response and find the ID of the record you want:
<meta name="ncbi_uidlist" content="289547499" />
Next, make a second request to a URL of the form: http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=ID_YOU_HAVE
Something like this (untested!):
my $URL = 'http://www.ncbi.nlm.nih.gov/nuccore/' . $line;
my $html = get($URL);
my ($id) = $html =~ m{name="ncbi_uidlist" \s+ content="([^"]+)"}xi;
if ($id) {
    # second request: fetch the record itself by its ID
    my $text = get("http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=" . $id);
    if ($text =~ m!\r?\n\r?\s+\/translation="((?:(?:[^"])\r?\n?\r?)*)"!) {
        print OUTPUT $1 . "\n";
    }
}
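If you want to check the two extraction steps without hitting the network, here is a minimal sketch that runs them against hard-coded strings. The helper names extract_uid and extract_translation are hypothetical, and the sample record is made up for illustration (it is not real NCBI output); only the meta tag line is taken from the page above. It also assumes the second response contains a /translation="..." qualifier that may wrap across lines.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the record ID out of the
# <meta name="ncbi_uidlist" content="..."> tag of the first response.
sub extract_uid {
    my ($html) = @_;
    my ($id) = $html =~ m{name="ncbi_uidlist"\s+content="([^"]+)"}i;
    return $id;    # undef when the tag is missing
}

# Hypothetical helper: pull the /translation="..." qualifier out of the
# second response and remove the line breaks and indentation inside it.
sub extract_translation {
    my ($record) = @_;
    my ($protein) = $record =~ m{/translation="([^"]*)"};
    return unless defined $protein;
    $protein =~ s/\s+//g;
    return $protein;
}

# Made-up sample inputs, for illustration only.
my $sample_html   = '<meta name="ncbi_uidlist" content="289547499" />';
my $sample_record = join "\n",
    '     CDS             1..30',
    '                     /translation="MKTAYIAK',
    '                     QRQISFVK"',
    '';

print extract_uid($sample_html), "\n";
print extract_translation($sample_record), "\n";
```

Once both helpers behave as expected on the samples, the same calls can be applied to the strings returned by get() inside the loop, skipping any record for which either helper returns nothing.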