By Melissa Eaden
Since the dawn of the internet, we've had broken links. The 404s of the internet and their like are all too easy to come across in everyday searches. In software testing, it can be a struggle to stay ahead of this kind of issue when more important things are pending in the testing queue. This guide aims to give some direction on how to set up tests for various types of broken links and how to gauge how vulnerable your app or website is to them.
Broken Link Definitions
These are some terms and definitions you'll find throughout this article. You might know these already, but they set the foundation for the examples later.
The 404 error is one of the most recognizable errors on the web. To understand where 404 came from, and why it's associated with broken links, we only have to look at the history of the internet. HTTP status codes were created to make it easier to figure out why a page was failing to load elements, scripts or even the whole page. It's the same idea as in an application: relevant error messages give someone a general idea of what the problem might be. For fun, check out this page of 404 error messages to see how they have been customized.
A hyperlink is a link created in one file that points to another resource. The target could be nearly anything, such as a document or an image file.(1) Hyperlinks are common in websites, emails and even apps. They can be updated by updating the link itself, or by updating the content the link points to. With some hyperlinks, you can tell they are broken by a small icon that can look like this:
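To make the idea of status codes a little more concrete, here is a minimal sketch that sorts a code into the rough buckets a link check usually cares about. The function name `classify_status` and the bucket labels are my own assumptions for illustration; only the numeric ranges come from the HTTP standard.

```python
from http import HTTPStatus


def classify_status(code):
    """Sort an HTTP status code into a rough bucket for link checking."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if code in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        return "broken"  # 404 and 410: the content is not there
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


for code in (200, 301, 404, 410, 403, 503):
    print(code, classify_status(code))
```

Treating 410 (Gone) the same as 404 is a judgment call in this sketch; your team may want to report the two separately.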
Other representations can be found as well. These kinds of links can display this icon when broken or blocked by a security protocol. A right click on the icon lets you know for sure which one it could be.
Another kind of link often used in applications is the reference link. Developers use these links to source libraries of code without having to import the whole library into the application. Many open source projects are libraries that developers link to this way on a regular basis. Examples would be Google Fonts, Twitter Bootstrap, and node package manager, or npm.
Dead Link, Broken Link and Orphan Pages
A dead link or broken link normally happens when content is deleted, moved, renamed or mistyped(2). Orphan links or pages are those that may contain valid content, but the user can't get to them because nothing links to them or the link was mistyped(2). These are the two most common ailments affecting links, and the most common reasons users are unable to find content.
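The difference between the two can be sketched with a tiny link map. This is only an illustration: the site map, the page paths and the function names `find_orphans` and `find_dead_links` are all hypothetical, and a real site would need a crawler to build the map first.

```python
from collections import deque


def find_orphans(site_links, home):
    """Return pages that exist but cannot be reached by following links from home."""
    reachable = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site_links.get(page, []):
            # Only follow links to pages that actually exist on the site.
            if target in site_links and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return set(site_links) - reachable


def find_dead_links(site_links):
    """Return (page, target) pairs where the target page does not exist."""
    return [
        (page, target)
        for page, targets in site_links.items()
        for target in targets
        if target not in site_links
    ]


# Hypothetical site map: each key is an existing page, each value its outgoing links.
site = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/old-widget"],  # target was deleted: dead link
    "/press-2016": ["/"],                   # nothing links here: orphan
}

print(find_orphans(site, "/"))
print(find_dead_links(site))
```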
Link rot is a condition, also referred to as link death, link breaking or reference rot, in which resources the links are pointing to become permanently unavailable. It can also describe the effects of failing to update web pages with valid links in a timely manner.(3)
A more prevalent kind of error, and worth mentioning, is when a website or app is unavailable due to an outage. This is assumed to be a temporary condition caused by events happening at the company or by unforeseen acts of nature outside of the company's control. Once the event has passed, the website or app usually becomes available again. While an outage could be due to many things, even broken reference links, this article won't focus on outages.
Why Things Break
Often, users assume that things on the internet work by magic. The internet fairy keeps everything together 24/7 and nothing ever really breaks. When in fact, stuff is breaking all the time. We, the development teams, are the stressed out maniacs that try to stay one step ahead of issues that could bring the website down.
Here are some everyday happenings that could result in broken links.
Moving a single line can cause problems, so think about what happens when you move a whole page or pages of content. Things break. A lot. The process is easier now, but in the early days of the internet, some sites were just abandoned rather than moved because it was just too complicated. With the prevalence of service architecture, much of this has been simplified. However, broken links and orphan pages still happen. They can make your site appear unprofessional or tarnish your company's reputation, but they are usually easy to identify and fix.
With the current push to use HTTPS on even non-sensitive internet pages, pages that weren't already using the secure protocol can find themselves orphaned or broken, depending on how the page was set up to resolve HTTPS. Some linked content may inadvertently become unavailable because it no longer uses the same protocol as the site it's linked to.
Another common issue when dealing with website migrations is routing. A good methodology and good planning will allow users to get to the correct page using the previous web address. This kind of redirect allows website and network managers to maintain the level of traffic from the old URL while sending users to a new site or content. It is often used when broken links are found as well. Instead of giving customers a 404 page while the content is being replaced or deprecated, websites can recognize a 404 and redirect the user to the homepage, with a simple message saying the page they were trying to reach is unavailable.
Style and content changes could be anything from changing logos to changing the layout of a whole page while trying to maintain some of the previous content. Depending on how the site is set up or built, content could be referenced like a library or it could be stylized and built in the application. If a previous style guide is present and is deprecated in favor of a new style guide, it could lead to style fragmentation, whereby the old style guide is still asserting properties because the newer version hasn't been propagated to all of the web application's pages. This can often happen when web apps use a great many microservices.
Depending on the organizational structure, separate teams could be responsible for their own web pages. When another team controls the style guide, much like a microservice, and other pages are not built to handle updates or cannot accept the updates due to dependencies, the styles can diverge. This could cause content issues or even make the page unrenderable. Hopefully, the page is designed so it can fall back to the previous style guide or content, but if not, it can appear broken and unusable.
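One way to spot content that is still referenced over the old protocol is to scan a page's markup for `http://` addresses. The sketch below uses Python's standard `html.parser`; the class name, function name and sample markup are assumptions for illustration. It flags every `src` and `href`, including ordinary anchors, so treat the output as a list of candidates to review rather than confirmed failures.

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collect src/href values that still use plain http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html):
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.insecure


# Hypothetical page fragment served from an https:// site.
page = """
<link href="http://example.com/style.css" rel="stylesheet">
<script src="https://example.com/app.js"></script>
<img src="http://example.com/logo.png">
"""

print(find_insecure_resources(page))
```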
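The routing idea described here can be sketched as a small lookup. Everything in this example is hypothetical: the paths, the `REDIRECTS` table and the `resolve` function are stand-ins, and a real site would do this in its web server or framework routing rather than in a plain function.

```python
HOME = "/"

# Hypothetical redirect table: old path -> new path.
REDIRECTS = {
    "/blog/2015/broken-links": "/articles/broken-links",
    "/old-about": "/about",
}

# Hypothetical set of pages that currently exist.
LIVE_PAGES = {"/", "/about", "/articles/broken-links"}


def resolve(path):
    """Return (status, location, message) for a requested path."""
    if path in LIVE_PAGES:
        return 200, path, ""
    if path in REDIRECTS:
        # 301 tells browsers and search engines the move is permanent.
        return 301, REDIRECTS[path], ""
    # Unknown path: send the user home with a short explanation instead of a bare 404.
    return 302, HOME, "The page you were trying to reach is unavailable."


print(resolve("/old-about"))
print(resolve("/does-not-exist"))
```

Using a temporary redirect (302) for the homepage fallback is a design choice in this sketch, so that the missing address is not treated as permanently moved to the homepage.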
As glorious as open source projects are, they can and will throw curve balls into the mix. Most open source projects are self-policing and have commit standards the code goes through before it's added to the project. While that might be fine for the project itself, these changes can cause dependency issues in web and mobile applications that use one or more open source libraries.
One of the most recent examples was felt far and wide by developers and their users when a developer unpublished code from npm that many other projects depended on. npm rolled back the change, but not before its effects were felt across the internet. Many companies using the open source library took steps to obtain a previous version for their applications and have the apps reference that version in case the most current version becomes unreliable. While this can be easy for some companies to do, others don't have the staff or storage space to house a private copy of an open source library. They are at the mercy of the project.
In some instances, even using a stable version of an open source project can cause other issues if the version control is not managed and updated in a timely manner. If, for instance, updating to the latest version of Angular or React breaks other dependencies, you could be left with older versions of the open source library that might not be supported.
Other external dependencies can include but are not limited to: browsers, OS packages, mobile OS packages, browser extensions and plug-ins, and in some cases even hardware. It's a good idea to have a test plan dealing with version control. Version changes are probably the most predictable issue any web or mobile app can have, yet how a version change will cause havoc can be the least predictable part.
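If your project keeps its dependencies in an npm-style `package.json`, one small check is to list which ones use a version range instead of an exact version. This is a rough sketch: the function name, the regular expression and the sample manifest (including `some-lib`) are assumptions, and it only looks at the `dependencies` section.

```python
import json
import re

# Matches an exact version such as 1.2.3 or 1.2.3-beta.1, but not ranges like ^1.2.3.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def unpinned_dependencies(package_json_text):
    """Return names of dependencies whose version is a range rather than an exact version."""
    manifest = json.loads(package_json_text)
    deps = manifest.get("dependencies", {})
    return sorted(name for name, spec in deps.items() if not EXACT_VERSION.match(spec))


# Hypothetical manifest for illustration.
example = '{"dependencies": {"react": "^16.0.0", "angular": "~1.6.0", "some-lib": "1.1.3"}}'
print(unpinned_dependencies(example))
```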
How To Break Linked Things
From a testing perspective, figuring out what your application does when something breaks can be really valuable knowledge. It tells you what the user would see if a hyperlink goes dead or a reference link becomes unreliable.
- Does it present an error?
- Is it an ugly error or a neat looking 404 page?
- Does the site use routing and does that routing go where it's supposed to when a link fails to resolve?
- Does the site or app have a fail safe or fall back if a library is deprecated or unstable?
- Does the site have disaster recovery options for server outages?
These are some of the questions which could be answered in your exploratory testing.
Warning: Before you try any of these things at your workplace, make sure you are either using a local copy of your app or pairing with a programmer to test these methods in a sandbox environment. If you have your own website to play with, then trying these ideas there is also a good option.
Altering a URL in a Hyperlink
I'm using my own web page as an example of what happens when you do some of these exercises.
Step 1: Find a hyperlink and the corresponding code block.
Step 2: Alter some part of the link in the code block by removing or adding to the URL.
Play with this a little bit and see where you can remove or add to parts of the URL and still resolve it. Record what you find out for later use and see if you can come up with a plan to keep the issue from occurring.
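If you want a list of altered URLs to try rather than editing by hand, a sketch like the one below generates a copy of the URL with one path segment removed at a time. The function name `url_variants` and the example address are assumptions for illustration; it only builds the strings and does not request them.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url):
    """Yield copies of the URL with one path segment removed at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    for i in range(len(segments)):
        remaining = segments[:i] + segments[i + 1:]
        new_path = "/" + "/".join(remaining)
        yield urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


for variant in url_variants("https://example.com/wiki/Some_Article"):
    print(variant)
```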
As an example, if I didn't want to rely on the link to Wikipedia, I could find the original location of the content and use that as the alternate reference of the hyperlink. I could also find the original content, download it, set up my own repository and reference the repository. Codepen.io, where I have this webpage, actually allows you to upload files if you pay for the premium version. As this page is mostly practice for me, I opted for the free version.
This is what happened when I took the word "House" out of the hyperlink:
As you can see, Wikipedia routes its users to a default page, with the article name it tried to look for in the header. While this isn't exactly what I would want my users to see, this shows an example of a graceful fail. This is also a great example of a redirect or a 301 code.
A slightly less graceful failure comes from removing one of the routing parts of the URL. In this example, I removed this part of the URL: "/wiki/"
The 404 page is very clean and helpfully points me to what I might have been looking for. It's much better than a plain text error which contains no useful information for the user.
Comment Out a Reference Link
Reference links are generally located at the top of any page of code. However, some could be embedded further down the page. It depends on what the application is trying to do with the reference link, if the app has some lazy loading functionality to optimize resources, or maybe the library reference isn't needed until an action occurs on the page.
Example of a reference link:
<link href="https://fonts.googleapis.com/css?family=Denk+One" rel="stylesheet">
This one is specifically for the Google Fonts library. I can add the link, then apply the font-family to my style element. This propagates the font-family either to all of the page, or to specific parts depending on where I reference it.
When I comment out the font style reference link, the site's font-style changes:
Codepen.io is nice to the newbie developer and includes a default font style called "Lato". In doing this exercise I realized my original font link didn't work either, because I forgot to reference it in my style element. When I commented out the original font I thought I was using, my site font-style didn't change. If that happens on a bigger project, it could be a good indication that some refactoring needs to happen.
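Before commenting anything out, it can help to list which reference links a page actually loads. Here is a minimal sketch using Python's standard `html.parser`. The class and function names are my own, and the second and third lines of the sample markup are made up; only the Google Fonts link comes from the example above. A reference that has been commented out does not show up in the result, which is a quick way to confirm the comment took effect.

```python
from html.parser import HTMLParser


class ReferenceLinkFinder(HTMLParser):
    """Collect the addresses of <link href> and <script src> references."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.references.append(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.references.append(attrs["src"])


def find_reference_links(html):
    finder = ReferenceLinkFinder()
    finder.feed(html)
    return finder.references


page = """
<link href="https://fonts.googleapis.com/css?family=Denk+One" rel="stylesheet">
<!-- <link href="https://example.com/old-theme.css" rel="stylesheet"> -->
<script src="https://example.com/app.js"></script>
"""

print(find_reference_links(page))
```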
While the two exercises above can cover a lot of ground, there are a few other things you could do, with permission, to test how your application handles other kinds of broken links.
Other examples are:
- Create an orphaned page which has links to the website or app but isn't linked from the actual website or app, and see what happens.
- Link one page to another and then take one page offline.
- Check the hyperlinks generated in auto-response emails.
- Remove page access to a child or parent directory.
Pair up with another tester or a developer type and generate some ideas. Vulnerabilities could be exposed through this kind of testing and it's worth exploring to see what might break.
Methods of Finding Broken Links
One of the easiest ways to find broken links is by enlisting the help of a tool or two to scan a page. The tool usually creates a report and then lets you know what is erroring based on the calls it monitors. There are a lot of tools out there, and some can even be integrated into your Continuous Integration system. There is a list of those tools in the next section.
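To give a feel for what such a tool does under the hood, here is a minimal sketch of a single-page link check using only the Python standard library. It is not how any particular tool works. The names `check_links`, `fetch_status` and `AnchorFinder` are my own, and the demo at the bottom swaps in a fake fetch function with made-up URLs so it runs without touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class AnchorFinder(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_status(url):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return None


def check_links(page_url, html, fetch=fetch_status):
    """Return (url, status) for every http(s) link on the page that looks broken."""
    finder = AnchorFinder()
    finder.feed(html)
    report = []
    for href in finder.hrefs:
        target = urljoin(page_url, href)
        if not target.startswith(("http://", "https://")):
            continue  # skip mailto:, tel:, javascript: and similar
        status = fetch(target)
        if status is None or status >= 400:
            report.append((target, status))
    return report


# Offline demo with a fake fetch function and hypothetical URLs.
fake_statuses = {"https://example.com/ok": 200, "https://example.com/gone": 404}
demo_html = '<a href="/ok">ok</a> <a href="/gone">gone</a> <a href="mailto:me@example.com">mail</a>'
print(check_links("https://example.com/", demo_html, fetch=fake_statuses.get))
```

Passing the fetch function in as a parameter is a deliberate choice in this sketch: it keeps the example testable offline, and the default `fetch_status` can be used when you are ready to hit a real site. Some servers reject HEAD requests, so a real checker may need to fall back to GET.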
Don't underestimate the power of an error log. You might have some error checking built into your website or app already. If you have access to those logs, or someone makes copies for teams to look at, then you already have half the job done for you.
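As a small illustration of mining a log for broken links, the sketch below counts 404 responses per path. It assumes the log is in Common Log Format, which may not match your server, and the sample lines are invented. The function name `count_404s` is also my own.

```python
import re
from collections import Counter

# Assumes Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Count how many times each requested path returned a 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


# Invented sample lines for illustration.
sample = [
    '203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 512',
    '203.0.113.5 - - [10/Oct/2017:13:55:40 +0000] "GET / HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2017:14:02:11 +0000] "GET /old-page HTTP/1.1" 404 512',
]

print(count_404s(sample).most_common())
```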
There are other methods and tools to find broken links which require more setup and monitoring. It could be using a tool like Splunk, which can be a favorite for a lot of DevOps organizations. Or you could have something more home grown. Ask your nearest DevOps person to find out what it might be. Another option could be Google Analytics(8). GA is a marvel at tracking nearly anything happening with a website once it's wired up correctly(10/11). These kinds of error logging tools can also be set up and monitored by either the development team or DevOps.
Knowing when you are having outages on your site and being able to effectively manage those outages is one of the prime directives of the DevOps team. If you don't have a team that specifically handles these things, then the next person to ask, besides a senior developer, might be your database administrator (DBA). If you have an extensive data structure, it's very likely they have monitoring active to detect issues. They usually record network errors, such as 404s, in an error log table in the database. The table can be queried for the most recent information, and that can inform you how to proceed.
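What such a query might look like depends entirely on your database, so treat the following as a sketch under assumptions. The table name `error_log` and its columns are hypothetical, and it uses an in-memory SQLite database with invented rows so the example is self-contained. Your DBA will know the real table and schema.

```python
import sqlite3

# In-memory database with a hypothetical error_log table and invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE error_log (logged_at TEXT, status INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO error_log VALUES (?, ?, ?)",
    [
        ("2017-10-10 13:55:36", 404, "/old-page"),
        ("2017-10-10 14:02:11", 404, "/old-page"),
        ("2017-10-10 14:05:00", 500, "/checkout"),
        ("2017-10-11 09:00:00", 404, "/missing.png"),
    ],
)


def recent_404s(connection, limit=10):
    """Return (url, hits, last_seen) for 404s, most recently seen first."""
    rows = connection.execute(
        "SELECT url, COUNT(*) AS hits, MAX(logged_at) AS last_seen "
        "FROM error_log WHERE status = 404 "
        "GROUP BY url ORDER BY last_seen DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


for row in recent_404s(conn):
    print(row)
```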
Below are some demos for tools that check broken links. Included in this section is a table rating different tools that are available and how well they fit into your daily testing cycles.
This is an online website checker. You input your URL and click check. It also asks for a small validation code, to prevent bots or folks from using it for large scale automation.
I found a broken link, but I'm not sure what it's linked to. It did bring up the Codepen.io 404 page.
This online web tool has a URL checker, and it also has a free account available that runs faster and can also be used online. It has a version which can be used internally with a CI model via their API, and a version that can be used in Slack. It uses a curl command for the CI/Slack calls.
When I submitted my URL on the homepage, I found it took a long time to get a "monkey" for a test report. I'm guessing a large amount of processing happens at night, so their API is probably being hit with high volumes, or it's just that slow, at least on the homepage. Once I signed up for a free account, the testing process seemed to run much faster.
Check My Links - Chrome Extension
This is a really easy one to find in the Chrome extension store. I had it loaded up in less than 10 seconds. The first thing it does is direct you to an options page. You can look it over, but unless you are ready to get into the weeds with this particular application, it's probably best to leave everything at its defaults.
This extension is really handy, sitting in the top right corner of the browser chrome. You load up your site or app and click the button. The best part is that it appears to be very friendly to use behind a firewall. I ran it on the sandbox site at work, and it gave me a pretty quick response. It might not be able to hit everything though. It doesn't give a detailed report, only an aggregated number and highlighted parts of the webpage in the various indicator colors.
Check My Link Example:
Other Available Tools
Below are more tools, including the ones I've already reviewed, which I've taken a little bit of time to look at and comment on. This is in no way a list of all the tools available, nor is it a comprehensive test analysis of any of them. I picked several criteria to help narrow down what might be useful for any testing done with these tools. I defined "Easy to Use" as an app or tool that I could understand and use in less than 30 seconds. "Use Behind Firewall" notes whether I could get an accurate report from an online tool while inside a firewall. If the tool created or offered some kind of report, I based "Easy to Read Report" on how long it took me to figure out where my 404s were, if there were any, and what the tool was trying to tell me. "Can Tool be Automated" is based completely on documentation I found with the tool. I didn't have the time to automate any of these, so if I've noted a tool as not automation friendly and someone has managed to automate it, please leave a comment and share! I added "Export Report" at the last minute, thinking it might be important for people to have this test artifact to stick into a story management board. If I could easily get a report into some kind of file, I noted it here.
| Application | Easy To Use? | Use Behind Firewall? | Easy To Read Report? | Can Tool Be Automated? | Export Report? |
|---|---|---|---|---|---|
| deadlinkchecker.com (online tool) | ✓ | ✗ | ✓ | ✗ | ✓ |
| monkeytest.it (online tool) | ✓ | ✗ | ✓ | ✓ | ✓ |
| Check My Links (Chrome Extension) | ✓ | ✓ | ✗ | ✗ | ✗ |
| Domain Hunter Plus (Chrome Extension) | ✓ | ✗ | ✓ | ✗ | ✓ |
| Screaming Frog (application) | ✓ | ✓ | ✓ | ✗ | ✓ |
| W3C's Link Checker (online tool) | ✓ | ✗ | ✗ | ✗ | ✓ |
| 404checker.com/404-checker (online tool) | ✓ | ✗ | ✗ | ✗ | ✗ |
| error404.atomseo.com (online tool) | ✓ | ✓ | ✓ | ✗ | ✓ |
| Xenu Link Sleuth (download) | ✗ | ✓ | - | - | - |
| Google Webmaster Tools (online tool) | ✗ | - | - | - | - |
Be wary of some tools out there. I did come across one that used dark patterns, and lots of them, before it would let me download the file. Even then, I decided not to use the tool for fear there might be other things lurking in the application.
The Screaming Frog app has a lot of great information about broken links and crawling websites. The dashboard gives you not only information on your links and redirects, it also has a response time logger for links it finds. I was super impressed with this tool. It didn't appear that the free version of the application has automation, but the paid version might. It's still worth visiting their website to read over the info they have there.
I wasn't able to get much information on the last two in the list, but they are worth mentioning. Xenu Link Sleuth might be a useful tool, but the website is from 1995, and it's very hard to read and navigate to useful information. The download instructions are hard to read, and it just doesn't strike me as easy to use since it seems to have a lot of setup involved. Google Webmaster Tools is slightly easier in that you have some more usable choices: it's tied in with a Google account and you can use tagging or Google Analytics. This makes GWT not so easy to use from the outset, but after setup, it could be really nice.
I Found Them, Now What Can I Do With Them?
Depending on your organizational structure you might already have ways of reporting issues like broken links. You might even have priority levels depending on the content the link is referencing. That's great!
If you or your group is looking at some of these things for the first time and building a strategy for testing and reporting broken links, the following are some helpful tips to track and resolve broken link issues.
Severity and Reporting
First, determine the severity based on the content. If it's a link to the checkout cart, that might be a higher severity than a link to a static page explaining the org structure of your company. If you have a list of all the different kinds of hyperlinks in your website or app, grouping them by severity will be helpful for future reference.
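One lightweight way to keep that grouping around is a small rule table. The sketch below is only an illustration: the path prefixes, the severity labels and the function names are all hypothetical, and your team would pick its own rules.

```python
# Hypothetical rules: the first matching prefix wins.
SEVERITY_RULES = [
    ("/checkout", "high"),
    ("/cart", "high"),
    ("/account", "medium"),
    ("/about", "low"),
]
DEFAULT_SEVERITY = "medium"


def severity_for(path):
    """Return the severity label for a link path."""
    for prefix, severity in SEVERITY_RULES:
        if path.startswith(prefix):
            return severity
    return DEFAULT_SEVERITY


def group_by_severity(paths):
    """Group link paths into a dict keyed by severity label."""
    groups = {}
    for path in paths:
        groups.setdefault(severity_for(path), []).append(path)
    return groups


print(group_by_severity(["/checkout", "/about/org-structure", "/help"]))
```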
If at some point you are faced with reporting a large number of broken links, the best thing to do is to group them by page. If several are located on the same page, create one defect for that page and list the links. Making separate defects for each link can hamper progress and annoy your development team. Note the error the link returns when it does not resolve. Depending on what's happening with the website or app, the error message can point more accurately to where the link is failing.
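If your link checker gives you a flat list of results, the grouping can be sketched in a few lines. The input shape of (page, link, error) tuples and the function name `defects_by_page` are assumptions for illustration, as is the sample data.

```python
from collections import defaultdict


def defects_by_page(broken_links):
    """Turn (page, link, error) tuples into one defect summary per page."""
    grouped = defaultdict(list)
    for page, link, error in broken_links:
        grouped[page].append((link, error))
    defects = []
    for page in sorted(grouped):
        lines = ["Broken links on " + page]
        lines += ["  {} -> {}".format(link, error) for link, error in grouped[page]]
        defects.append("\n".join(lines))
    return defects


# Invented results for illustration.
results = [
    ("/home", "/promo-2016", "404"),
    ("/home", "/img/banner.png", "404"),
    ("/about", "/team/old-bio", "500"),
]

for defect in defects_by_page(results):
    print(defect)
    print()
```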
Mel's Story Corner
Recently, during a release cycle, a large number of assets were erroring out when clicked. These assets were showing our default error page, with the error message. They also logged the user completely out of the application. It happens if the error is bad enough. I ended up testing large groups of the same assets to figure out which ones were failing.
The error itself pointed to a duplication of sub-assets in the database, which was causing the main asset to fail to resolve. When the sub-assets were updated, some were duplicated by accident. To me, it seemed very random, based on what was failing, hence the large amount of asset testing I had to do. Once the duplicate sub-assets were discovered, the developers determined that removing the duplicates should resolve the issue with the parent assets.
In this instance, the duplicate assets were causing broken links because the parent assets couldn't resolve completely and load without all of the sub-assets resolving as well. The error message helped uncover the issue. Knowing the dependencies of your application is important!
In the story above, I was working closely with the developers. I did create a defect, but I only created one reference defect while testing other assets. Since the error message was the same for all the assets, it would have been a waste of time to create multiple defects.
If you don't have a set policy for handling broken links, ask your team. Collaborate with them. If it's a sprint cycle or a release, the team might handle what you find differently depending on the situation.
Don't Be Afraid Of The Code
If it's a simple fix, and you have access to the code, write a ticket, and then submit a pull request! Don't be afraid of trying to fix something. A broken link is a good first step into learning the pull request and code review process. Plus, it can help dedicated coders work on more critical parts of the application or site, while you pick off the easier tasks. You might be surprised just how easy it is to fix broken links.
References
- What's the difference between an orphan link and a broken link?
- Link Rot
- Website Management - Broken Links
- Jargon - Broken Link
- Different Types of Hyperlinks
- Forthea.com - Use of Google Analytics to detect 404s
- Volume Nine Blog - 404 Tools
- Tag Management With GA
About Melissa Eaden
One fateful day in November of 2015, Melissa Eaden attended her first Ministry of Testing TestBash in New York. She won a conference ticket by submitting an essay to Richard Bradshaw. She didn't know then that she would later become Rosie's assistant minion in the ever-growing cause of taking software quality to whole new levels! By day, she is a mild-mannered dog owner living in Austin, Texas. By night, she happily waits for her next mission of quality excellence.