We had an issue yesterday that prevented the GSA crawler from logging in to our website to crawl. Because of this, many of the URLs were indexed as the login page. I see a lot of results on the search page titled "Please log in" (the title of the login page). Also, when I check Index Diagnostics, the crawl status for these URLs is "Retrying URL: Connection reset by peer during fetch."
Now the login problem is resolved, and once a page is re-crawled its crawl status changes to successful, the page content is picked up, and the search results show the proper title. But since I cannot control what is being crawled, there are pages that still have not been re-crawled and still have the problem.
There is no uniform URL pattern I can use to force a re-crawl. Hence my question:
Is there a way to force a re-crawl based on the crawl status ("Retrying URL: Connection reset by peer during fetch.")? If that is too specific, how about a re-crawl based on crawl status type (Errors/Successful/Excluded)?
Export all the error URLs as a CSV file using "Index > Diagnostics > Index Diagnostics".
Open the CSV, apply a filter on the crawl status column, and get the URLs with the error you are looking for (see the short script after these steps if you prefer to do this programmatically).
Copy those URLs, go to "Content Sources > Web Crawl > Freshness Tuning > Recrawl these URL Patterns", paste them in, and click Recrawl.
That's it. You are done!
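If the exported CSV is large, filtering it by hand in a spreadsheet can be tedious. Here is a minimal sketch of the filtering step in Python. The column headers ("URL", "Crawl Status") and the file names are assumptions; check the headers in your own Index Diagnostics export and adjust them to match.

```python
import csv

# Assumed column names from the Index Diagnostics CSV export;
# change these to match the headers in your actual export file.
URL_COLUMN = "URL"
STATUS_COLUMN = "Crawl Status"
TARGET_STATUS = "Retrying URL: Connection reset by peer during fetch."

# Read the export and collect URLs whose crawl status matches the error.
with open("index_diagnostics_export.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    matching_urls = [
        row[URL_COLUMN]
        for row in reader
        if row.get(STATUS_COLUMN, "").strip() == TARGET_STATUS
    ]

# Write one URL per line, ready to paste into
# "Content Sources > Web Crawl > Freshness Tuning > Recrawl these URL Patterns".
with open("recrawl_patterns.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(matching_urls) + "\n")

print(f"{len(matching_urls)} URLs written to recrawl_patterns.txt")
```

Then open recrawl_patterns.txt, copy its contents, and paste them into the Recrawl box as described above.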
PS: If there are a lot of error URLs (more than 10,000, if I am not wrong), you may not be able to get all of them in a single CSV file. In that case you can do it in batches.
Regards,
Mohan