web-scraping post coldfusion cfhttp

Please help: How can I scrape this web page?


There's a site that offers a search service. You enter a number, submit the form, and it returns results. I want to run that search programmatically from ColdFusion instead of going to the site and searching manually.

This is what the form in the web page I'd like to read/scrape looks like (as seen when viewing the page source):

<form id="frmNumID" name="frmNum" action="" method="post">

    <TABLE border=0 cellPadding=0 cellSpacing=0>
     <TR>
      <TD align="center">
         <label class="NumLabel" for="Num" ACCESSKEY="1">ENTER NUM:</label>
        <input class="NumInput" id="Num" name="inputNum"  onfocusin="select()"  title="Num Input" tabindex="1" type="text" value=""  size ="29" maxlength="17" >&nbsp;&nbsp;

      </TD>

      <TD align="center">
         <input class="NumInput" title="Submit Num" tabindex="2" type="image" src="/include/pics/SubmitBtn.jpg" value="submit" ACCESSKEY="2">
      </TD>
     </TR>
     </TABLE>

     <TABLE border=0 cellPadding=0 cellSpacing=0>
     <TR>    
      <TD colspan="2" align="center">

        <input type="radio" name="displayType" value="NONE"   Checked  />No Pictures&nbsp;&nbsp;                          
        <input type="radio" name="displayType" value="STUFF"    /> Other Stuff&nbsp;&nbsp;                
        <input type="radio" name="displayType" value="MORESTUFF"    /> More Other Stuff  
      </TD>
     </TR>

    </TABLE>
    <div id="NUMMsg"></div>

  </form>

The only field I really care about is the Num input field. I want to post a value to that field, run the search, and get the results back in my ColdFusion code. This is what I have so far:

<cfhttp url="http://www.someurl.com/"
        method="POST">
    <cfhttpparam name="Num" type="FormField" value="123456789123456" />
</cfhttp>
<cfdump var="#cfhttp.filecontent#" />

But when I load the page, the dump just says "Connection Failure". What am I doing wrong?


Solution

  • OK, this blog post suggested a solution: http://australiansearchengine.wordpress.com/2009/09/28/cfhttp-connection-failure/

    It recommends adding the following cfhttpparam tags inside the cfhttp tag:

    <cfhttpparam type="header" name="accept-encoding" value="deflate;q=0">
    <cfhttpparam type="header" name="te" value="deflate;q=0"> 
    

    Now I no longer get a connection failure :)
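
    For anyone landing here later, this is roughly how the whole request might look with those headers in place. It is only a sketch and has not been tested against the actual site. The URL is the placeholder from the question, and searchResult is just a variable name picked for the example. One thing worth noting: form posts are keyed by the input's name attribute, and in the form above the text input has id="Num" but name="inputNum", so the sketch posts inputNum. Sending displayType with the form's default value is an assumption about what the server expects.

```cfml
<!--- Sketch only: placeholder URL from the question; field names taken
      from the form markup shown in the question. --->
<cfhttp url="http://www.someurl.com/" method="POST" result="searchResult" timeout="30">
    <!--- The two headers from the linked post, which ask the server
          not to send a deflate-compressed response. --->
    <cfhttpparam type="header" name="accept-encoding" value="deflate;q=0">
    <cfhttpparam type="header" name="te" value="deflate;q=0">

    <!--- Form fields are submitted by name, not id:
          the text input is id="Num" but name="inputNum". --->
    <cfhttpparam type="formfield" name="inputNum" value="123456789123456">

    <!--- Assumption: the server expects the radio group too, so send
          the option that is checked by default in the form. --->
    <cfhttpparam type="formfield" name="displayType" value="NONE">
</cfhttp>

<!--- statusCode is a string such as "200 OK". Dump the body on success,
      otherwise dump the whole result struct to see what went wrong. --->
<cfif left(searchResult.statusCode, 3) EQ "200">
    <cfdump var="#searchResult.fileContent#">
<cfelse>
    <cfdump var="#searchResult#">
</cfif>
```

    Using the result attribute keeps the response in its own variable instead of the shared cfhttp scope. Dumping the full struct on failure also shows statusCode and errorDetail, which usually say more than "Connection Failure" alone.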