No longer maintained - see the newer version for Umbraco 7
Diplo LinkChecker (Umbraco 4)
Note: This is an obsolete version for Umbraco 4.
Please check out https://our.umbraco.org/projects/backoffice-extensions/diplo-link-checker for a completely new version for Umbraco 7.
Diplo LinkChecker is a simple HTTP-based link checker for Umbraco that adds a new dashboard section to the Content tab. It can be used to find broken links across an entire Umbraco website. It has been tested against Umbraco 4.8 and up and is built against .NET Framework 4.0.
Note: This version (1.5) is for Umbraco 4.8 and above. For Umbraco 4.7.2 and earlier, install an earlier version of the package.
Features
LinkChecker will check both internal and external links
LinkChecker checks the entire page contents - not just Rich Text areas
LinkChecker will also check all assets, such as the paths to images, included CSS files, JavaScript etc.
Links within each page are checked asynchronously for speed
Once a link has been checked it won’t be checked again if it occurs elsewhere in the site
Pages are batched into manageable chunks (by default 5) so that users are not left waiting for ages for the entire site to be checked
Quick link to edit any page with broken links in it
You can now select the node to start checking from
Advanced options can be configured via an included XML .config file
How It Works
Diplo LinkChecker works in the following manner:
It generates a list of every page in your Umbraco site
It then breaks these down into small batches (by default 5 pages)
It then makes an HTTP request to each page in the batch in turn
It then extracts all the links, images and included assets from the page and makes an HTTP HEAD request to each one in turn
LinkChecker then examines the status code returned by the server to determine whether the link is OK (200), Not Found (404 etc.) or requires a Warning (other status codes)
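The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the package's actual code (which is built on .NET). The names batches, classify and check_links are hypothetical, the head argument stands in for whatever function performs the HTTP HEAD request, and treating 410 as Not Found alongside 404 is an assumption about what "404 etc." covers.

```python
from typing import Callable, Dict, Iterator, List


def batches(pages: List[str], size: int = 5) -> Iterator[List[str]]:
    """Split the full page list into small batches (5 pages by default)."""
    for start in range(0, len(pages), size):
        yield pages[start:start + size]


def classify(status: int) -> str:
    """Map an HTTP status code to a result label."""
    if status == 200:
        return "OK"
    if status in (404, 410):  # assumption: which codes count as "Not Found"
        return "Not Found"
    return "Warning"  # every other status code


def check_links(
    links: List[str],
    head: Callable[[str], int],
    cache: Dict[str, str],
) -> Dict[str, str]:
    """Check each link once; reuse the cached result if it was seen before."""
    results: Dict[str, str] = {}
    for url in links:
        if url not in cache:
            # head(url) is assumed to return the status code of a HEAD request
            cache[url] = classify(head(url))
        results[url] = cache[url]
    return results
```

Sharing one cache dictionary across all batches is what would stop a link that appears on many pages from being requested more than once.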
Caveats
Currently LinkChecker needs to be run manually (i.e. you have to push a button to start it) - it is not automated.
Note: No dinosaurs were harmed in the production of this code.
Updates:
1.1 contains a fix where invalid protocols caused a "URI prefix not recognised" exception. This is due to the way the underlying .NET Framework deals with HTTP web-requests.
1.2 deals with non-standard protocols such as "file://" and "ftp://" by showing a warning for those links.
1.2.1 adds extra robustness to deal with unexpected Umbraco errors. Check the umbracoLog for any errors trapped.
1.3 adds an (experimental) feature to select the node to start checking from.
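The handling of non-standard protocols described for 1.1 and 1.2 can be pictured with a small sketch. This is an illustration in Python rather than the package's own .NET code. The function name non_http_warning is hypothetical, and it assumes that links without a scheme are relative URLs that get resolved against the page before being checked.

```python
from typing import Optional
from urllib.parse import urlsplit


def non_http_warning(url: str) -> Optional[str]:
    """Return "Warning" for links that cannot be checked over HTTP.

    Returns None when the link should go on to the normal HTTP check.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme in ("", "http", "https"):
        # An empty scheme is assumed to mean a relative link
        return None
    # file://, ftp:// and similar: flag instead of raising an exception
    return "Warning"
```

Filtering on the scheme before any request is made avoids handing an unsupported URI to the HTTP client in the first place.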