I am writing some code that searches a website and returns the number of search results, much like "Number of Google Results from Excel", but the site I am using is sciencedirect.com.
ScienceDirect is a bit odd in that the URL of the results page does not contain the search term, so working out which URL to send the search term to is more complex. I have been reading the source of the advanced search page, and this is the relevant part:
<form name="Form1" method="get" action="/science">
<input type="hidden" name="_ob" value="MiamiSearchURL">
<input type="hidden" name="_method" value="submitForm">
<input type="hidden" name="_acct" value="C000053194">
<input type="hidden" name="_temp" value="all_search.tmpl">
<input type="hidden" name="md5" value="9e299e9289462d7805ab0a5dcc9cff5c">
<input type="hidden" name="test_alid" value="">
<div class="contentMain" style="margin:1px 0 0 1px;"><div class="contentShadow"><div class="contentBorders">
<div class="searchFormBg">
<div style="text-align:right;">
<a href="/science?_ob=HelpURL&_file=stadv_main_all.htm&_acct=C000053194&_version=1&_urlVersion=0&_userid=1495569&md5=77d715b68200140e79de4e6c4228507e" target="sdhelp" onClick="var helpWin; helpWin=window.open('/science?_ob=HelpURL&_file=stadv_main_all.htm&_modifyAlert=Y&_acct=C000053194&_version=1&_urlVersion=0&_userid=1495569&md5=5aa55cdf2635b42c8858e3379a022f8d','sdhelp','scrollbars=yes,resizable=yes,directories=no,toolbar=no,menubar=no,status=no,width=760,height=570'); helpWin.focus()" class="noul icon_qmarkHelpsci_dir">Search tips</a>
</div>
<div>
<a name="Skip Search"></a><label class="searchFormLabel" for="SearchText">Search </label>
<input type="text" class="inputBox" name="SearchText" id="SearchText" value="" size="60" maxlength="256"> in
<select name="keywordOpt" id="keywordOpt" size="1">
<option value="ALL" selected >All Fields</option>
<option value="TITLE-ABSTR-KEY" >Abstract, Title, Keywords</option>
<option value="AUTHORS" >Authors</option>
<option value="SPECIFIC-AUTHOR" >Specific Author</option>
<option value="SRCTITLEPLUS" >Source Title</option>
<option value="TITLE" >Title</option>
<option value="KEYWORDS" >Keywords</option>
<option value="ABSTRACT" >Abstract</option>
<option value="REFERENCES" >References</option>
<option value="ISSN" >ISSN</option>
<option value="ISBN" >ISBN</option>
<option value="AFFILIATION" >Affiliation</option>
<option value="FULL-TEXT" >Full Text</option>
</select>
</div>
<div class="searchFormField">
<select name="addTerm" id="addTerm" size="1">
<option value="0" selected > AND
<option value="1" > OR
<option value="2" > AND NOT
</select>
</div>
<div>
<input type="text" class="inputBox" name="addSearchText" id="addSearchText" value="" size="60" maxlength="256"> in
<select name="addkeywordOpt" id="addkeywordOpt" size="1">
<option value="ALL" selected >All Fields</option>
<option value="TITLE-ABSTR-KEY" >Abstract, Title, Keywords</option>
<option value="AUTHORS" >Authors</option>
<option value="SPECIFIC-AUTHOR" >Specific Author</option>
<option value="SRCTITLEPLUS" >Source Title</option>
<option value="TITLE" >Title</option>
<option value="KEYWORDS" >Keywords</option>
<option value="ABSTRACT" >Abstract</option>
<option value="REFERENCES" >References</option>
<option value="ISSN" >ISSN</option>
<option value="ISBN" >ISBN</option>
<option value="AFFILIATION" >Affiliation</option>
<option value="FULL-TEXT" >Full Text</option>
</select>
</div>
<div style="margin:5px 0;">
<label class="searchFormLabel" for="source">Include</label>
<div style="float:left;"><input style="cursor: pointer;" type="checkbox" id="journals" name="source" value="srcJrl" CHECKED></div><div style="float:left;padding-top:2px;margin-right:5px;" class="astPad"><label for="journals">Journals</label></div>
<div style="float:left;"><input style="cursor: pointer;" type="checkbox" id="books" name="source" value="srcBk" CHECKED></div><div style="float:left;padding-top:2px;margin-right:5px;" class="astPad"><label for="books">All Books</label></div>
</div>
<div style="clear:both;"></div>
<div>
<label class="searchFormLabel" for="Subscribed">Source</label>
<select name="Subscribed" id="Subscribed" size="1" onChange="checkFavoriteJournals(this, 'sources','Y', '');" style="width:200px;">
<option value="0" SELECTED>All sources</option>
<option value="1" >Subscribed sources</option>
<option value="2" >My Favorite sources</option>
</select>
</div>
<div style="clear:both;"></div>
<div>
<label class="searchFormLabel" for="Subject">Subject <span class="SDtxtNoteSmall">(select one or more)</span></label></div>
<div>
<div style="margin-right:10px;display:inline;float:left;"><SELECT Name="srcSel" Multiple Size = "4"><OPTION VALUE="1" SELECTED > - All Sciences -<OPTION VALUE="5"> Agricultural and Biological Sciences<OPTION VALUE="6"> Arts and Humanities<OPTION VALUE="18"> Biochemistry, Genetics and Molecular Biology<OPTION VALUE="7"> Business, Management and Accounting<OPTION VALUE="8"> Chemical Engineering<OPTION VALUE="9"> Chemistry<OPTION VALUE="11"> Computer Science<OPTION VALUE="12"> Decision Sciences<OPTION VALUE="13"> Earth and Planetary Sciences<OPTION VALUE="14"> Economics, Econometrics and Finance<OPTION VALUE="15"> Energy<OPTION VALUE="16"> Engineering<OPTION VALUE="17"> Environmental Science<OPTION VALUE="220"> Immunology and Microbiology<OPTION VALUE="19"> Materials Science<OPTION VALUE="20"> Mathematics<OPTION VALUE="21"> Medicine and Dentistry<OPTION VALUE="22"> Neuroscience<OPTION VALUE="466"> Nursing and Health Professions<OPTION VALUE="23"> Pharmacology, Toxicology and Pharmaceutical Science<OPTION VALUE="24"> Physics and Astronomy<OPTION VALUE="25"> Psychology<OPTION VALUE="26"> Social Sciences<OPTION VALUE="487"> Veterinary Science and Veterinary Medicine</SELECT></div>
<div class="txtSmall" style="display:inline;">Hold down the Ctrl key (or Apple Key) <br>to select multiple entries.</div>
</div>
<div style="clear:both;"></div>
From this I have constructed this URL: http://www.sciencedirect.com/science?_ob=MiamiSearchURL&_method=submitForm&_acct=C000053194&_temp=all_search.tmpl&md5=9e299e9289462d7805ab0a5dcc9cff5c&test_alid=&keywordOpt=TITLE-ABSTR-KEY&source=srcJrl=1&source=srcBk=1&Subscribed=0&onchange=Y&srcSel=1&DateOpt=2&SearchText=test
It is built on a &[inputname]=[value]& basis and should search for "test" using my criteria.
Unfortunately it returns the error "A source must be selected". The source is set with the Subscribed=0 part of the URL and seems to be working, because changing the value changes the source.
The difference between the source and other fields is that source is that it uses not
So my question is: how do I change the URL so that it returns results?
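For reference, a browser submitting a GET form sends each field as a name=value pair, and a checked checkbox contributes its name together with its value attribute (so source=srcJrl rather than source=srcJrl=1). Below is a minimal Python sketch of that encoding, using a subset of the field names from the form above. The build_search_url helper is hypothetical, and whether ScienceDirect accepts the resulting URL has not been verified here.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.sciencedirect.com/science"


def build_search_url(search_text, md5):
    """Encode form fields the way a browser would for method="get".

    Hypothetical helper: field names are copied from the form source,
    but the server's response to this URL is an untested assumption.
    """
    fields = [
        ("_ob", "MiamiSearchURL"),
        ("_method", "submitForm"),
        ("_acct", "C000053194"),
        ("_temp", "all_search.tmpl"),
        ("md5", md5),
        ("test_alid", ""),
        ("SearchText", search_text),
        ("keywordOpt", "TITLE-ABSTR-KEY"),
        # Each checked checkbox is sent as its own name=value pair.
        ("source", "srcJrl"),
        ("source", "srcBk"),
        ("Subscribed", "0"),
        ("srcSel", "1"),
    ]
    # A list of tuples keeps field order and allows the repeated "source" name.
    return BASE_URL + "?" + urlencode(fields)


url = build_search_url("test", "9e299e9289462d7805ab0a5dcc9cff5c")
print(url)
```

Passing a list of pairs rather than a dict is deliberate: a dict cannot hold the repeated source key that the two checkboxes produce.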
It looks like ScienceDirect uses a variable that has a unique value for each search (md5=9e299e9289462d7805ab0a5dcc9cff5c in your example). This value changes even if you repeat the same search.
I haven't investigated in detail, but if they run checks on their side (such as checking cookies, IP address, etc.), this can be difficult, if not impossible, to do.
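If the md5 value really is issued per page load, one thing to try is reading the hidden fields from a freshly fetched copy of the search form instead of hard-coding them. The sketch below only shows the parsing step, run against a shortened copy of the form from the question. The HiddenInputParser class is a hypothetical name, fetching the page is left out, and nothing here confirms that the server would accept the extracted values.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without "=...".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


# Shortened copy of the form source quoted in the question.
sample = """<form name="Form1" method="get" action="/science">
<input type="hidden" name="_ob" value="MiamiSearchURL">
<input type="hidden" name="md5" value="9e299e9289462d7805ab0a5dcc9cff5c">
<input type="hidden" name="test_alid" value="">
<input type="text" class="inputBox" name="SearchText" id="SearchText" value="">
</form>"""

parser = HiddenInputParser()
parser.feed(sample)
hidden_fields = parser.fields
print(hidden_fields["md5"])
```

The extracted dictionary can then be merged with the visible fields (SearchText, keywordOpt and so on) before encoding the query string. If the site also ties the md5 to cookies or a session, the form page and the search would have to be requested within the same session.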