I'm using the PHPCrawl class to spider websites and build a list of links. It all works well, if slowly, and I then use the links to perform other tasks.
I'm running into a problem where the script sometimes completes without returning any results, then works as expected when I run it again. This happens on roughly 30% of runs.
I thought at first that this was a network or workstation issue, but the same problem occurs on a different machine in a different location using a different ISP.
Has anybody else used this class and encountered the same problem?
After extensive testing I've found that it seems to be related to the streamTimeout setting.
The catch is that setting it too high results in a very slow crawl. Tinkering with connectionTimeout seems to mitigate this a little.
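
For anyone who wants to experiment with the same two settings, here is a minimal sketch of how they could be set. It assumes PHPCrawl 0.8x, where the class is PHPCrawler and results arrive through handleDocumentInfo(). The include path, the LinkCollector class name, the example URL and the timeout values are all placeholders of mine, not recommendations.

```php
<?php
// Assumed install path; adjust to wherever PHPCrawl lives on your system.
require_once 'libs/PHPCrawler.class.php';

// Hypothetical subclass that collects URLs and records per-request errors.
class LinkCollector extends PHPCrawler
{
    public $links  = array();
    public $errors = array();

    public function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
    {
        if ($DocInfo->error_occured) {
            // Keep the error text so empty runs can be diagnosed afterwards.
            $this->errors[$DocInfo->url] = $DocInfo->error_string;
        } else {
            $this->links[] = $DocInfo->url;
        }
    }
}

$crawler = new LinkCollector();
$crawler->setURL('http://www.example.com/'); // placeholder start URL

// Seconds to wait while establishing a connection (illustrative value).
$crawler->setConnectionTimeout(5);

// Seconds to wait for data on an open connection (illustrative value).
// Higher values tolerate slow servers but slow the whole crawl down.
$crawler->setStreamTimeout(10);

$crawler->go();

printf("%d links collected, %d requests failed\n",
    count($crawler->links), count($crawler->errors));

foreach ($crawler->errors as $url => $message) {
    echo $url . ': ' . $message . "\n";
}
```

Logging error_string for failed requests should show whether the empty runs really are timeouts on the start URL, which would make it easier to raise streamTimeout only as far as needed.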