
Find broken internal links in all authored content (component inputs and plain HTML) in Adobe CQ 5.5 SP2.1


We ran into an issue after installing SP2.1 on CQ 5.5 that affects reference updates for all pages under a page that has been renamed using the Websites console of CQ5. The issue is described here:

http://blogs.adobe.com/dmcmahon/2012/12/13/cq5-5-sp2-1-linksreferences-are-not-updated-following-moverename/

The hotfix fixes future page renames: references in all other pages are updated, whether the links are authored directly as HTML or through input widgets such as pathfields.

However, we discovered this bug quite late, and many pages had already been renamed in the meantime. This left broken links on existing pages, particularly where authors use pathfield widgets in component dialogs to refer to other pages. I would like to write some custom code using the LinkChecker API in the com.day.cq.rewriter.linkchecker package, but I cannot find any sample of the code CQ5 itself uses to perform the reference updates on page renames that I could use as a starting point.

Based on your experience, is the LinkChecker API the best way forward, or is there some other API for checking all authored links and generating a report of which links or pathfields are broken?

Help appreciated.

I have already checked the External Link Checker tool, which does report broken links, but only for links to other (external) domains, so it is not useful in our case.


Solution

  • LinkChecker is a Sling rewriter. Rewriters are strictly tied to the request: they operate on the HTML generated by CQ before it is returned to the client. If I understand correctly, you want to find broken internal links across the whole site, so the LinkChecker won't be very useful here.

    Consider using the Groovy console to crawl over /content/your_site looking for strings starting with /content. Then use the resourceResolver to check whether each found path exists. A sample script implementing this algorithm can be found here.
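The core of the crawl-and-check approach can be sketched in plain Java. This is not the linked script; it is a minimal, self-contained illustration under stated assumptions. It assumes the string properties have already been collected while traversing /content/your_site into a map keyed by a hypothetical "nodePath@propertyName" label, and it takes the existence check as a `Predicate<String>`. Inside CQ that predicate would plausibly be something like `path -> resourceResolver.getResource(path) != null`; here a `Set` stands in for the repository. The class and method names are hypothetical. Only a trailing `.html` is stripped, so links with selectors (e.g. `page.print.html`), vanity URLs, or paths not starting with /content/ are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrokenLinkScanner {

    // A "/content/..." path, ending at a quote, whitespace, '<', '>', '?' or '#'.
    private static final Pattern CONTENT_PATH = Pattern.compile("/content/[^\"'\\s<>?#]+");

    /** Finds every /content/... path in a value (a plain pathfield value or an HTML fragment). */
    public static List<String> extractCandidatePaths(String value) {
        List<String> paths = new ArrayList<>();
        Matcher m = CONTENT_PATH.matcher(value);
        while (m.find()) {
            paths.add(stripHtmlExtension(m.group()));
        }
        return paths;
    }

    /** Turns "/content/site/page.html" into "/content/site/page" so it can be looked up as a resource path. */
    static String stripHtmlExtension(String path) {
        return path.endsWith(".html")
                ? path.substring(0, path.length() - ".html".length())
                : path;
    }

    /** Maps each property label to the paths in its value that fail the existence check. */
    public static Map<String, List<String>> findBrokenLinks(
            Map<String, String> properties, Predicate<String> exists) {
        Map<String, List<String>> report = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            List<String> broken = new ArrayList<>();
            for (String path : extractCandidatePaths(e.getValue())) {
                if (!exists.test(path)) {
                    broken.add(path);
                }
            }
            if (!broken.isEmpty()) {
                report.put(e.getKey(), broken);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical repository: "old-name" was renamed to "new-name", but one pathfield still points at the old path.
        Set<String> existing = Set.of("/content/site/en", "/content/site/en/new-name");
        Map<String, String> props = new LinkedHashMap<>();
        props.put("/content/site/en/jcr:content/par/teaser@linkURL", "/content/site/en/old-name");
        props.put("/content/site/en/jcr:content/par/text@text",
                "<p>See <a href=\"/content/site/en/new-name.html\">this page</a>.</p>");
        findBrokenLinks(props, existing::contains)
                .forEach((where, paths) -> System.out.println(where + " -> " + paths));
    }
}
```

Running `main` reports only the teaser's pathfield, because the rich-text link already points at the renamed page. Keeping the existence check as a parameter keeps the extraction and reporting logic testable outside CQ; only the traversal and the `getResource` lookup need the repository.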