Link Checking - Keeping it all together
The problem with an ever-changing web site is that errors creep in and affect the user experience. The most common are broken links, which may mean that images don’t display or, worse still, that a page doesn’t display at all. Using a content management system helps significantly here, as it will typically look after all your internal links, but there is always the possibility that manually managed links, e.g. to other websites, become broken and give end users a negative impression of your site.
I use a wonderful little tool called Xenu Link Sleuth for checking links on sites - both as a quality check prior to deployment and as an occasional health check. You just tell it the URL of your site and off it goes and quickly (and I mean quickly) checks all the links on all the pages for dud links. You can then either view and filter the results in tabular format, or create a report which tells you which links are broken and, importantly, which page contains the link to that resource.
Visit the Xenu Link Sleuth page and download it.
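To make the idea concrete, here is a minimal Python sketch of what a link checker of this kind does: pull the links out of each page, test each one, and report the broken ones together with the page that contains them. It is not how Xenu itself works internally. The names `extract_links`, `http_status` and `broken_links` are made up for this illustration, and the pages are assumed to have been fetched already (a real tool would also crawl from page to page).

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collects the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(page_url, html):
    """Return absolute http(s) URLs linked from the page, fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = (urldefrag(urljoin(page_url, link)).url for link in parser.links)
    return [url for url in absolute if urlparse(url).scheme in ("http", "https")]


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached.

    Uses a HEAD request to avoid downloading bodies; some servers reject
    HEAD, so a fuller checker might fall back to GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def broken_links(pages, status_of=http_status):
    """pages maps page URL -> HTML text.

    Returns a list of (page_url, link, status) for every link that failed,
    so the report says which page holds the dud link.
    """
    report = []
    seen = {}  # each distinct link is only requested once
    for page_url, html in pages.items():
        for link in extract_links(page_url, html):
            if link not in seen:
                seen[link] = status_of(link)
            status = seen[link]
            if status is None or status >= 400:
                report.append((page_url, link, status))
    return report
```

The `status_of` parameter is there so the reporting logic can be tried out without touching the network, by passing in a function that returns canned status codes.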
Automated link checking is a great way of ensuring that your links are up to date. It is not a complete substitute for human checking, though: if you have a link on your site that says “contact us” but goes to your investor relations section, an automated link checker is unlikely to flag it as an issue. Beware also of lapsed domain names when linking to external sites. If web site owners don’t keep up their domain name (this happens quite often with small sites), they will typically lose control of it, and the site either reverts to the domain registration company or is snapped up by a domain holding company. Under such circumstances an automated link checker probably wouldn’t notice the change, as a page will still be served; only a manual check will catch it.






Using a link trawler is OK, but a much better way of handling broken links is to ensure that every link is served through a template first. The template can then check the link before serving it and, if there is a problem, tell the user. This way the user can be given more information about the error, e.g. a classic 404 (missing page) or a server under too much load, by examining the response. I do this on all my websites, and this check can be implemented in just about any development language.
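The comment above gives no implementation details, so the following is only a rough Python sketch of the idea: check an outbound link at click time, then either redirect to it or explain the problem. The function names `explain` and `serve_link`, the wording of the messages and the choice of status codes treated as "overloaded" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def explain(status):
    """Turn a status code into a message for the user, or None if all is well."""
    if status is None:
        return "The site could not be reached."
    if status == 404:
        return "The page no longer exists (404)."
    if status in (502, 503, 504):
        return "The server is busy or temporarily unavailable."
    if status >= 400:
        return f"The server reported an error ({status})."
    return None


def serve_link(url, status_of=http_status):
    """Decide what to send back when a visitor clicks an outbound link.

    Returns (status_line, headers, body): a redirect when the target
    responds normally, otherwise a plain-text page describing the problem.
    """
    problem = explain(status_of(url))
    if problem is None:
        return ("302 Found", [("Location", url)], "")
    body = f"Sorry, {url} is not available. {problem}"
    return ("200 OK", [("Content-Type", "text/plain; charset=utf-8")], body)
```

One trade-off worth noting is that every click now waits for an extra request to the target site, which is why the sketch uses a short timeout.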
I also run a daily scheduled task that looks at all my external links (which are created through a custom-built CMS), removes any that have a problem and informs the administrator. All common sense stuff really…
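A daily sweep of this kind might look something like the Python sketch below. The custom-built CMS is not described, so the list of links, the `notify` callback and the function name `sweep_external_links` are placeholders, and the scheduling itself (cron, Windows Task Scheduler or similar) is left out.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def sweep_external_links(links, status_of=http_status, notify=print):
    """Check every external link once.

    Returns the links that are still good. Links that failed are dropped
    and passed to notify as a list of (url, status) pairs, so that the
    administrator hears about them.
    """
    kept, removed = [], []
    for url in links:
        status = status_of(url)
        if status is None or status >= 400:
            removed.append((url, status))
        else:
            kept.append(url)
    if removed:
        notify(removed)
    return kept
```

In practice `notify` would probably send an email, and since a site can be down for only a few minutes, it may be safer to remove a link only after it has failed on several consecutive runs.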