web-crawler, elasticsearch-5, stormcrawler

Can I configure StormCrawler to prepend the host to relative URLs during crawling?


I want to crawl URLs like the one below, which do not have the host in front of the path.

<div class=pro-info>
    <a href="/being-human-mens-solid-polo-t-shirt/p-202971521">
</div>

Can I add the host part in front of these URLs through a configuration file in StormCrawler?


Solution

  • Relative URLs are made absolute during parsing, by resolving them against the URL of the page they were found on. You shouldn't need to do anything special to get the full URLs.
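
To illustrate what "made absolute" means here, the following is a minimal sketch of standard relative-URL resolution using only the JDK's `java.net.URI`. It is not StormCrawler's internal code, and the class name `RelativeUrlDemo`, the method `resolve`, and the page URL `https://www.example.com/category/shirts` are hypothetical names chosen for the example.

```java
import java.net.URI;

class RelativeUrlDemo {

    // Resolve an href found on a page against that page's own URL.
    // An href starting with "/" replaces the whole path of the base URL
    // while keeping its scheme and host.
    static String resolve(String pageUrl, String href) {
        URI base = URI.create(pageUrl);
        return base.resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL; the href is the one from the question.
        String pageUrl = "https://www.example.com/category/shirts";
        String href = "/being-human-mens-solid-polo-t-shirt/p-202971521";

        // Prints https://www.example.com/being-human-mens-solid-polo-t-shirt/p-202971521
        System.out.println(resolve(pageUrl, href));
    }
}
```

Because the host comes from the page being parsed, no host value has to be supplied in a configuration file for links of this kind.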