Error 404, or Not Found, is a standard HTTP status code. It indicates that the client was able to communicate with the server, but that the server could not find what was requested or has been configured not to fulfill the request. Error 404 should not be confused with “server not found” or similar errors, in which a connection to the server cannot be established at all.
Overview
When communicating over HTTP, a server must respond to a request, such as a Web browser request for a Web page, with a numeric response code and an optional, required, or disallowed message (based on the status code). In code 404, the first digit indicates a client error, such as an incorrectly typed Uniform Resource Locator (URL). The following two digits indicate the specific error encountered. HTTP’s use of three-digit codes is similar to the use of such codes in earlier protocols such as FTP and NNTP. At the HTTP level, a 404 response code is followed by a human-readable “reason phrase”. The HTTP specification suggests the phrase “Not Found”, and many web servers by default emit an HTML page that includes both the 404 code and the phrase “Not Found”.
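The structure of a status code described above can be sketched in a few lines of Python. This is only an illustration: the helper name describe_status and the STATUS_CLASSES table are made up for this example, while HTTPStatus comes from Python's standard library.

```python
from http import HTTPStatus

# Names of the five status-code classes, keyed by the first digit.
# (Illustrative table; the wording of the class names is our own.)
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)  # raises ValueError for codes the library does not know
    status_class = STATUS_CLASSES[code // 100]  # the first digit selects the class
    return f"{code} {status.phrase} ({status_class})"
```

Calling describe_status(404) combines the numeric code, the standard reason phrase, and the class implied by the leading 4.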
A 404 error is often returned when pages have been moved or deleted. In the first case, it is better to use URL mapping or URL redirection by returning a 301 Moved Permanently response, which can be configured in most server configuration files, or by rewriting the URL; in the second case, a 410 Gone should be returned. Because these two options require special server configuration, most websites don’t use them.
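The choice between these responses can be expressed as a small decision function. The sketch below is a simplified model rather than real server configuration: the function name status_for and its three lookup arguments are assumptions made for the example.

```python
def status_for(path, pages, moved, gone):
    """Pick a status code and response headers for a requested path.

    pages: set of paths that currently exist
    moved: dict mapping old paths to their new locations
    gone:  set of paths that were deliberately removed
    (All three are hypothetical inputs for this illustration.)
    """
    if path in pages:
        return 200, {}
    if path in moved:
        # Moved Permanently: tell the client where the page lives now.
        return 301, {"Location": moved[path]}
    if path in gone:
        # Gone: the page existed once and was intentionally removed.
        return 410, {}
    # Nothing is known about this path.
    return 404, {}
```

A site that keeps no record of moved or deleted pages passes empty moved and gone collections, and every missing page falls through to a plain 404.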
404 errors should not be confused with DNS errors, which appear when the URL provided refers to a server name that does not exist. A 404 error indicates that the server itself was found, but that the server was unable to retrieve the requested page.
Soft 404 Errors
Some websites report a “not found” error by returning a standard web page with a “200 OK” response code, incorrectly reporting that the page has loaded successfully; this is known as a soft 404. The term “soft 404” was introduced in 2004 by Ziv Bar-Yossef et al.
Soft 404s are problematic for automated methods of discovering whether a link is broken. Some search engines, such as Yahoo and Google, use automated processes to detect soft 404s. Soft 404s can occur as a result of configuration errors in certain HTTP server software; with Apache HTTP Server, for example, this happens when the 404 error document (specified in an .htaccess file) is given as an absolute URL rather than a local path (/error.html). This can also be done on purpose to force some browsers (such as the now-obsolete Internet Explorer) to display a customized 404 error message rather than replacing what is served with a browser-specific “friendly” error message (in Internet Explorer, this behavior was triggered when a 404 was served and the received HTML was shorter than a certain length, and it could be manually disabled by the user).
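One simple heuristic for spotting a site that serves soft 404s is to request a path that almost certainly does not exist and see whether the answer is still 200. The sketch below illustrates that idea only; it is not a description of how any particular search engine works. The function name serves_soft_404s is made up, and fetch_status is a caller-supplied function (an assumption of this example) that returns the status code for a URL.

```python
import uuid
from urllib.parse import urljoin


def serves_soft_404s(base_url, fetch_status):
    """Guess whether a site answers requests for missing pages with 200 OK.

    fetch_status is a hypothetical callable supplied by the caller: it takes
    a URL and returns the HTTP status code of the response as an int.
    """
    # A random 32-character hexadecimal path is very unlikely to exist.
    probe_url = urljoin(base_url, "/" + uuid.uuid4().hex)
    return fetch_status(probe_url) == 200
```

Passing the fetcher in as an argument keeps the check independent of any HTTP library and makes it easy to exercise with a stub.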
There are also “soft 3XX” errors where content is returned with a status of 200 but comes from a redirected page, such as when missing pages are redirected to the domain’s home page/root.
Proxy server
Some proxy servers generate a 404 error when a 500-range error code would be more correct. If the proxy server is unable to fulfill a request for a page because of a problem with the remote host (such as host name resolution failures or refused TCP connections), this should be reported as a 5xx server error, but some proxies return a 404 instead. This can confuse programs that expect and act on specific responses, since they can no longer easily distinguish between an absent web server and a missing web page on a web server that is present.
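As a rough illustration of the more correct behavior, a proxy could translate upstream connection failures into 5xx codes along the following lines. The function name proxy_status_for is hypothetical, and the choice of 502 Bad Gateway and 504 Gateway Timeout is one reasonable mapping rather than a rule taken from the text above.

```python
import socket


def proxy_status_for(error):
    """Map an upstream connection failure to a 5xx status code."""
    if isinstance(error, TimeoutError):
        # Gateway Timeout: the remote host did not answer in time.
        return 504
    if isinstance(error, (socket.gaierror, ConnectionError)):
        # Bad Gateway: name resolution failed, or the connection was
        # refused or reset by the remote host.
        return 502
    # Internal Server Error: anything else that went wrong inside the proxy.
    return 500
```

None of these branches produces a 404, so a client can still tell a missing page apart from an unreachable server.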
Intentional 404s
In July 2004, British telecom provider BT Group implemented the Cleanfeed content blocking system, which returns a 404 error to any request for content identified as potentially illegal by the Internet Watch Foundation. Other ISPs return an HTTP 403 “forbidden” error under the same circumstances. The practice of using fake 404 errors as a means of hiding censorship has also been reported in Thailand and Tunisia. In Tunisia, where censorship was severe before the 2011 revolution, people became aware of the nature of the fake 404 errors and created a fictional character called “Ammar 404” who represents “the invisible censor”.
Microsoft Internet Server 404 substatus error codes
The web server software developed by Microsoft, Internet Information Services (IIS), returns a set of substatus codes with its 404 responses. The substatus codes take the form of decimal numbers appended to the 404 status code. The substatus codes are not officially recognized by IANA and are not returned by non-Microsoft servers.
Substatus codes
Microsoft’s IIS 7.0, IIS 7.5, and IIS 8.0 servers define the following HTTP substatus codes to indicate a more specific cause of a 404 error:
- 404.0 – Not found.
- 404.1 – Site not found.
- 404.2 – ISAPI or CGI restriction.
- 404.3 – MIME type restriction.
- 404.4 – No handler configured.
- 404.5 – Denied by request filtering configuration.
- 404.6 – Verb denied.
- 404.7 – File extension denied.
- 404.8 – Hidden namespace.
- 404.9 – File attribute hidden.
- 404.10 – Request header too long.
- 404.11 – Request contains double escape sequence.
- 404.12 – Request contains high-bit characters.
- 404.13 – Content length too large.
- 404.14 – Request URL too long.
- 404.15 – Query string too long.
- 404.16 – DAV request sent to the static file handler.
- 404.17 – Dynamic content mapped to the static file handler via a wildcard MIME mapping.
- 404.18 – Query string sequence denied.
- 404.19 – Denied by filtering rule.
- 404.20 – Too many URL segments.
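When these secondary (substatus) codes are processed programmatically, it helps to treat the part after the dot as a separate integer rather than as a decimal fraction, since 404.1 and 404.10 are different codes. A minimal sketch follows; the function names and the abbreviated lookup table are made up for the example.

```python
def split_iis_status(text):
    """Split a string such as '404.13' into (status, substatus) integers."""
    status, _, substatus = text.partition(".")
    # A bare '404' has no substatus part; treat it as substatus 0.
    return int(status), int(substatus or 0)


# A few entries from the list above; a real table would hold all of them.
SUBSTATUS_404 = {
    0: "Not found",
    6: "Verb denied",
    7: "File extension denied",
    13: "Content length too large",
}


def explain(text):
    """Return a description for a 404 substatus string, if it is in the table."""
    status, substatus = split_iis_status(text)
    if status != 404:
        return "not a 404 response"
    return SUBSTATUS_404.get(substatus, "unlisted substatus")
```

Parsing "404.10" this way yields (404, 10), whereas converting the string to a floating-point number would make it indistinguishable from 404.1.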
Custom error pages
Web servers can typically be configured to display a custom 404 error page, including a more natural description, the branding of the parent site, and sometimes a site map, search form, or 404-page widget.
The protocol-level phrase, which is hidden from the user, is rarely customized. Internet Explorer, however, did not display custom pages unless they were larger than 512 bytes, choosing instead to display a “friendly” error page. Another problem arises when a site does not provide a favicon but does have a separate custom 404 page: each browser request for the missing favicon returns the full custom page, generating additional traffic and longer load times on every page view.
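A custom error page that clears the 512-byte threshold can be generated as follows. This is a minimal sketch: the function name, the markup, the placeholder site name, and the padding technique are all assumptions made for illustration.

```python
from html import escape

MIN_BODY_BYTES = 513  # just over the 512-byte threshold mentioned above


def build_not_found_page(path, site_name="Example Site"):
    """Return a UTF-8 encoded HTML body for a custom 404 page."""
    html = (
        "<!DOCTYPE html>\n"
        "<html><head><title>Page not found</title></head><body>\n"
        f"<h1>{escape(site_name)}: page not found</h1>\n"
        f"<p>Nothing was found at <code>{escape(path)}</code>.</p>\n"
        '<p><a href="/">Return to the home page</a></p>\n'
        "</body></html>\n"
    )
    body = html.encode("utf-8")
    if len(body) < MIN_BODY_BYTES:
        # Pad with an HTML comment so the page clears the size threshold.
        padding = MIN_BODY_BYTES - len(body)
        body += b"<!-- " + b"." * padding + b" -->\n"
    return body
```

The body must still be sent with a 404 status code; serving it with 200 OK would turn the custom page into a soft 404. The requested path is escaped before being echoed back so that it cannot inject markup into the page.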
Many organizations use 404 error pages as an opportunity to inject humor into what might otherwise be a serious website. For example, Metro UK shows a polar bear on a skateboard, and web development agency Left Logic has a simple drawing program. During the 2015 British election campaign, major political parties used their 404 pages to target political opponents or show relevant policies to potential supporters.
While many websites send additional information in a 404 error message, such as a link to a website’s home page or a search box, some also try to find the correct web page that the user wants. To do this, extensions are available for some content management systems (CMS).
Charitable initiatives
NotFound.org, which collects reports of missing children in the European Union in collaboration with Telefono Azzurro, Missing Children Europe, Famous and Amazon, has launched an initiative to customize the 404 error page and put it to charitable use. A site installs an application that embeds an iframe in its 404 error page, which then shows a banner with the photo and description of a missing child.
Monitoring 404 Errors
There are a number of tools that crawl a website to find pages that return 404 status codes. These tools can be useful for finding broken links within a particular website. Their limitation is that they only find links within that website and ignore the 404s that result from links on other websites. As a result, these tools miss 83% of the 404s on websites. One way to work around this problem is to find 404 errors by analyzing external links.
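The core of such a tool can be sketched with Python's standard HTML parser. The class and function names below are made up, and fetch_status is a caller-supplied function (an assumption of this example) that returns the status code for a URL; a real crawler would also follow links recursively and limit its request rate.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_broken_links(page_url, html, fetch_status):
    """Return the absolute URLs linked from html that respond with 404."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative links against the URL of the page that contains them.
    absolute = [urljoin(page_url, href) for href in collector.hrefs]
    return [url for url in absolute if fetch_status(url) == 404]
```

Because it only sees the links in the pages it is given, this approach has exactly the limitation described above: it cannot find 404s caused by links on other websites.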
One of the most effective ways to discover 404 errors is to use Google Search Console, Google Analytics, or crawling software.
Another common method is tracking traffic to 404 pages using log file analysis. This can be useful for understanding which 404s users have encountered on the site. Another way to monitor traffic to 404 pages is to use JavaScript-based traffic monitoring tools.
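Log file analysis of this kind can be sketched as follows, assuming the server writes its access log in the widely used Common Log Format. The regular expression, the function name, and the sample line in the comment are illustrative assumptions; other log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line, e.g.
# 203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /old HTTP/1.1" 404 153
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_404s(lines):
    """Count 404 responses per requested path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts
```

Calling most_common() on the returned Counter lists the paths with the most 404 responses first, which helps decide which missing pages are most worth redirecting or restoring.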