Hidden Services in Tor
Tor is a service run by a network of volunteers that allows people to use the internet anonymously. It is normally used to browse the web without being tracked or identified.
One lesser-known feature of Tor is the ability to provide what it calls hidden services. Hidden services are essentially servers that offer their services through the Tor network. When you think about Tor, the first thing that comes to mind is anonymous web browsing. However, for hacktivists and dissidents it is very useful not only to browse the web without being identified, but also to publish web pages in a way that cannot easily be tracked or shut down.
In the Tor network there are thousands of ‘hidden services’ accessible only to people using Tor, providing access to forbidden information on a wide range of topics. These sites have a hidden address under the .onion TLD, for example example.onion. Sites ending in .onion cannot easily be tracked or shut down, and their owners cannot easily be identified.
One of the most complex parts of setting up a hidden service is configuring the web server so that it doesn’t leak information about the real IP address of the server, the country it is located in, and so on. The more complex the site, the harder it becomes to set up a truly hidden service that doesn’t leak information in any way.
In recent years, the F.B.I. has been able to identify and shut down certain hidden services using social engineering, information leaks and browser vulnerabilities. The most famous example is the Silk Road, a well-known black market hidden inside Tor and used for selling drugs and similar goods.
Of course, the administrators behind hidden services try their best not to leak any information about the physical location of the server providing the service, or any other information that could lead to the identification of the owner of the hidden service.
Leaking the timezone
The HTTP protocol allows the client to inform the server about its compression capabilities through the Accept-Encoding request header. If the client and server share support for a specific compression format, the server can decide to compress the HTTP response in order to save bandwidth and time. All major web servers and browsers support compression. The most common formats used for HTTP compression are gzip and deflate.
Gzip is a compression format that allows relatively fast data compression with decent compression ratios.
As a compression format, gzip specifies a header to be included in the resulting compressed data. This header includes information about the compressed data, the operating system that compressed it and, most importantly, the date when the data was compressed, in theory in universal time (UTC).
The header begins as follows, as documented in the Forensics Wiki (offset, size in bytes, value, description):
|0||2||0x1f 0x8b||Magic number to identify gzip streams|
So, if this header is present in any gzip-compressed data, we can send a request that accepts gzip to any web server, wait for the gzip-compressed response, check that its bytes start with 0x1f 0x8b, and read the compression date (the MTIME field) to learn the exact date configured on the server that served the page.
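To make the header layout concrete, here is a minimal sketch in Python that compresses a small payload with a chosen timestamp and then reads the fixed header fields back. The payload and the timestamp are hypothetical values picked for illustration. The field positions follow the gzip format: 2 magic bytes, 1 byte for the compression method, 1 byte of flags, then a 4-byte little-endian MTIME, and the operating system code at offset 9.

```python
import gzip
import struct

payload = b"hello hidden service"   # hypothetical data
stamp = 1456089681                  # hypothetical Unix time (2016-02-21 21:21:21 UTC)

# Compress and ask the gzip module to store our timestamp in the header.
blob = gzip.compress(payload, mtime=stamp)

magic = blob[0:2]                          # 0x1f 0x8b identifies a gzip stream
method = blob[2]                           # 8 means DEFLATE
flags = blob[3]                            # optional-field flags
(mtime,) = struct.unpack("<I", blob[4:8])  # 32-bit little-endian Unix time
os_code = blob[9]                          # OS that produced the stream

print(magic.hex(), method, mtime, os_code)
```

Whoever receives these bytes can read the timestamp without decompressing anything, since it sits in the first ten bytes of the stream.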
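The procedure above can be sketched in Python roughly as follows. This is not the proof of concept described later in the article, only an illustration under some assumptions: the function names are made up for this example, the request goes out directly with urllib (reaching a hidden service would additionally require routing through Tor, which is left out here), and a MTIME of zero is treated as "no timestamp sent".

```python
import struct
import urllib.request
from datetime import datetime, timezone

GZIP_MAGIC = b"\x1f\x8b"

def gzip_mtime(body: bytes):
    """Return the raw MTIME value of a gzip stream, or None if unavailable."""
    if len(body) < 10 or body[:2] != GZIP_MAGIC:
        return None  # too short, or not a gzip stream
    (mtime,) = struct.unpack("<I", body[4:8])
    return mtime or None  # zero means the server did not fill in a date

def fetch_gzip_mtime(url: str, timeout: float = 10.0):
    """Request `url` advertising gzip support and read MTIME from the raw body."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # urllib does not decompress the body, so the gzip header is intact;
        # the 10-byte fixed header is all that is needed.
        return gzip_mtime(resp.read(10))

def as_text(mtime: int) -> str:
    """Format the header value as a date, interpreting it as Unix time."""
    return datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

A caller would run something like `fetch_gzip_mtime("http://example.com/")` and pass a non-None result to `as_text`. If the server stored its local time instead of UTC, the printed date will be shifted by the server's timezone offset.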
With normal web servers, this is useful only in very limited scenarios, because the geographical position of the server is not hidden in any way and can easily be found from the server’s IP address, which is not hidden at all. However, for a hidden service, information about the server’s timezone can be very useful for narrowing down the countries where the server might be running.
The GZIP specification clearly states that universal time should be used instead of local time for the MTIME header field. However, I have found lots of sites sending local times instead of universal times. It seems that maybe the flaw is in Microsoft Windows, but further investigation is needed to clarify which implementations are not following the specification and are leaking the local time.
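To illustrate how a local-time MTIME turns into a timezone guess, here is a small Python sketch. It assumes the observer has a trustworthy UTC clock and that the server's clock is otherwise roughly correct; the function name and the 15-minute rounding step are choices made for this example, not part of any specification.

```python
from datetime import timedelta

def estimate_utc_offset(header_mtime: int, observed_utc: int,
                        step_minutes: int = 15) -> timedelta:
    """Round (header time - observer's UTC time) to the nearest timezone step.

    Both arguments are seconds since the Unix epoch. A result of zero means
    the server is most likely sending universal time as the spec requires.
    """
    step = step_minutes * 60
    diff = header_mtime - observed_utc
    # Network latency and small clock errors are absorbed by the rounding.
    return timedelta(seconds=round(diff / step) * step)
```

For instance, a header value that reads about eight hours behind the observer's UTC clock would yield `timedelta(hours=-8)`, suggesting a server configured for UTC-8. The offset alone does not identify a country, but it narrows the candidates.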
This, of course, is NOT Tor’s fault: it is not a bug in the Tor protocol, and it IS NOT a problem with the GZIP spec, but with certain implementations. It is just an obscure feature of the gzip format that has been wrongly implemented by some vendors, and that most web servers expose through the HTTP protocol by default.
The good news is that lots of web servers are preconfigured to fill the date field of the gzip header with zeros, maybe for performance reasons, who knows. After some research, I found that around 10% of web servers leak the remote date when compressing HTTP responses with gzip, and only some of the servers that include the remote date in the header fail to use UTC instead of local time.
Clock Skew identification
Even correct implementations, that is, those that do not fill the MTIME with zeros but send the proper universal time instead of the local time, are prone to identification through clock skew attacks, as described in the previous work by Murdoch, 2006.
In this scenario, however, the universal time provided by correct gzip implementations is just another side channel for mounting that attack.
Proof Of Concept
I have developed a small PHP script that uses curl (command line) to get the remote server date, if available, from the gzip-compressed HTTP response. It will only work with web servers that allow compression of HTTP responses and fill the ‘date’ field of the gzip header with the actual date instead of zeros.
I have tested it with some servers. Examples of servers that send a date in the gzip header are instagram.com, reddit.com and bing.com. In this example reddit.com and instagram.com are sending universal times, as the specification states, while bing.com is sending local times.
Of course, because of privacy concerns, I’m not going to provide information on which hidden services are leaking the remote date.
Examples of use:
user@localhost:~$ php time.php bing.com
The server that processed the request on: bing.com has local date set to: Sunday 21st of February 2016 01:21:21 PM
user@localhost:~$ php time.php reddit.com
The server that processed the request on: reddit.com has local date set to: Sunday 21st of February 2016 09:21:25 PM
user@localhost:~$ php time.php instagram.com
The server that processed the request on: instagram.com has local date set to: Sunday 21st of February 2016 09:21:30 PM
user@localhost:~$
In this example all three servers are including times in the gzip headers, but reddit.com and instagram.com are providing universal times, while bing.com is providing local times.
The Proof of concept is available here:
GZIP in tor itself
The Tor protocol itself uses gzip for some of its communications. However, this issue was already known and taken into account when Tor was developed, as stated by Tim Wilson-Brown on the tor-onions mailing list.
TOR itself does not suffer from this issue, even though it uses gzip compression internally to compress directory documents. Hidden services and clients do not produce or recompress directory documents, so they could never be affected. And tor authorities use deflateInit2 to initialise compression for votes and consensuses, which zeroes the gzip header. From the deflateInit2 documentation in zlib.h:
"windowBits can also be greater than 15 for optional gzip encoding. Add 16 to windowBits to write a simple gzip header and trailer around the compressed data instead of a zlib wrapper. The gzip header will have no file name, no extra data, no comment, no modification time (set to zero), no header crc, and the operating system will be set to 255 (unknown). If a gzip stream is being written, strm->adler is a crc32 instead of an adler32."
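The behaviour quoted above can be observed from Python, whose zlib module wraps the same library. The sketch below assumes that `zlib.compressobj` initialises the stream through deflateInit2 without setting a custom gzip header, and the input bytes are a placeholder rather than a real directory document. The operating system byte is not checked because its value may differ between zlib versions.

```python
import struct
import zlib

def gzip_via_windowbits(data: bytes) -> bytes:
    """Compress `data` with a gzip wrapper by adding 16 to windowBits."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 15 + 16)
    return comp.compress(data) + comp.flush()

data = b"consensus document placeholder"  # hypothetical input
blob = gzip_via_windowbits(data)

# Bytes 4..7 hold MTIME as a little-endian 32-bit integer.
(mtime,) = struct.unpack("<I", blob[4:8])
print(blob[:2].hex(), mtime)
```

The stream still starts with the gzip magic bytes, but the MTIME field is zero, so there is no timestamp for an observer to read.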
You can see the entire conversation about this on the tor-onions mailing list.
From the moment I found this potential issue, I was afraid that it could affect the privacy of Tor users, even in remote ways. It has been somewhat complicated to understand why this was happening and why, although the gzip specification clearly states that the time should be universal, some servers were sending local times instead. Even with the confusion caused by sharing these findings early, I believe it has been more constructive to discuss this potential issue openly than to keep it secret while I tried to better understand the impact. I believe that the most responsible thing was to contact the tor-onions mailing list, as I did, and to circulate this article to raise awareness and get help understanding whether this could be an issue.
Thanks to HDM, brlewis and Henryk Plotz for joining the discussion, providing additional information regarding the issue and helping clarify the potential impact it could have.
Last updated at: 2/22/2016 8:50:16 PM UTC. Corrected some mistakes and added more information provided in the comments