Author: Marc Liberatore
Email: liberato@cs.umass.edu
Date: 15 August 2011

(This is a [markdown] formatted file, best rendered with [multimarkdown].)

[markdown]: http://daringfireball.net/projects/markdown/
[multimarkdown]: http://fletcherpenney.net/multimarkdown/

# Overview

This file describes the measurement code and results:

    sha1sum                                   filename
    47776cca495dec1e592168df2d9aa72a137f3a13  optack-code.tar.bz2
    457175985caacfa7726162e12458154686468841  optack-traces.tar.xz

as referenced in the following paper:

    @inproceedings{Prusty:2011,
      Author = {Swagatika Prusty and Brian Neil Levine and Marc Liberatore},
      Booktitle = {Proc. ACM Conference on Computer & Communications Security (CCS)},
      Keywords = {forensics; peer-to-peer; anonymity},
      Month = {October},
      Sponsors = {CNS-1018615},
      Title = {{Forensic Investigation of the OneSwarm Anonymous Filesharing System}},
      Url = {http://forensics.umass.edu/pubs/prusty.ccs.2011.pdf},
      Year = {2011}
    }

This file should contain sufficient information (along with the associated programs and scripts) to reproduce the process of these measurements.

If you use these data, programs, or scripts in your own published research, please refer to them by citing the above paper.

# Measurement Parameters

There are three main independent variables in our measurements: the type of path (direct or proxied); the path itself (that is, which hosts are along it); and the type of download performed (a well-behaved download through `wget`, or through our implementation of an optimistically ACKing client).

The terminology we use to refer to hosts is as follows:

- The *initiator* host, which runs `wget` and our attack program to retrieve a file (not at the same time). The initiator host corresponds to the investigator, who aims to determine if another host possesses a file or is merely relaying it.
- The *adjacent* host, which answers requests for a file either through a locally running web server, or by proxying the request to a third host.
  The adjacent host corresponds to a OneSwarm peer being investigated as the source of a file.
- The *distant* host, which answers requests for a file through a web server. The distant host corresponds to a OneSwarm peer-of-the-peer being investigated.

We use as our target file a 10MB file from the CentOS distribution, located at `centos/5.6/isos/x86_64` in the mirror's hierarchy:

    ls -l
    -rw-r--r-- 1 guest guest 10921984 2011-05-04 08:30 CentOS-5.6-x86_64-netinstall.iso

    md5sum CentOS-5.6-x86_64-netinstall.iso
    02cf3a5e32aaa5eed27af775ad292beb  CentOS-5.6-x86_64-netinstall.iso

There are also dependent variables we measure:

- RTTs (through `ping`) between all pairs of hosts;
- Paths (through `traceroute`, `tcptraceroute`, or `tracepath`) between initiator and adjacent hosts, and adjacent and distant hosts (where possible, depending upon permissions at hosts); and
- Throughput (through logs and tcpdumped packet header traces collected at the initiator).

# Requirements / one-time setup

## On the initiator host

The measurement script requires that you have a recent version of Python 2 installed. The script uses pexpect 2.4, which is included.

You must be able to compile the attack program, which requires a C++ compiler and libpcap. Use `make` to build the program.

Additionally, you must have root access in order to run the attack program, to run `tcpdump`, and to configure `iptables` on a Linux-based initiator. If you are using a Mac OS X host as the initiator, be sure to turn "stealth mode" on in the firewall configuration; the scripts handle the appropriate `iptables` rules on Linux. You may also need to tweak the scripts to use `traceroute` rather than `tracepath`. Other OSs will require similar tweaks.

You must have `tracepath` (part of `iputils`) installed, or else you won't get route information. You must also have `tcpdump` installed.

## On the adjacent host

The adjacent host must also be reachable by `ssh`. Also, you must have `tracepath` (part of `iputils`) installed, as above.
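
Before going further, you may want to confirm that the command-line tools named so far are actually on the `PATH` of each host. The following is a minimal sketch, not part of the distributed scripts: it assumes Python 3 (the measurement script itself targets Python 2), and the helper name `missing_tools` and the per-role tool lists are hypothetical groupings made up for illustration.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Hypothetical groupings, based only on the tools this README names.
INITIATOR_TOOLS = ["wget", "ping", "tracepath", "tcpdump", "make"]
ADJACENT_TOOLS = ["tracepath"]

if __name__ == "__main__":
    for role, tools in (("initiator", INITIATOR_TOOLS),
                        ("adjacent", ADJACENT_TOOLS)):
        missing = missing_tools(tools)
        if missing:
            print("%s host is missing: %s" % (role, ", ".join(missing)))
        else:
            print("%s host: all listed tools found" % role)
```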
Then, there are two steps you need to take on the adjacent host to prepare it for measurement: setting up a proxy, and setting up a web server.

### Proxy setup

First, verify you can proxy data using netcat (`nc`). You'll use `nc` and a named pipe to act as a transparent TCP proxy. The named pipe is necessary to carry responses back to the originator, as shell pipes are unidirectional. See for details.

First make the named pipe. This step need only be done once.

    mkfifo backpipe

Then start the netcat proxy. This process will exit when the remote host closes its connections.

    nc -l 8081 0<backpipe | nc mirror.mojohost.com 80 1>backpipe

Now, from another machine, attempt to retrieve a file from `mirror.mojohost.com` using the adjacent host as your proxy. For example, if the adjacent host was `boston.cs.ucsb.edu`, you could try:

    wget http://boston.cs.ucsb.edu:8081/centos/timestamp.txt

If the file is retrieved successfully, and the `nc` process exits, then the proxying setup was successful.

Note that there are several different implementations of `nc` floating around; you may need to tweak the scripts to deal with the command-line options of your particular version of `nc`.

### Web server setup

You can use [lighttpd], pronounced "lighty", to serve files; the scripts are written assuming you go this route.

[lighttpd]: http://www.lighttpd.net/

To compile, you may need to disable certain features depending upon the host's setup. We're only using it to serve static files, so a minimal configuration works fine.

    ./configure --without-pcre --without-bzip2 --prefix=$HOME/lighttpd-install && make -j2 && make install

worked on our hosts. The `lighttpd` binary will be in `$HOME/lighttpd-install/sbin` in this case.

Next, create a directory `httpd/centos/5.6/isos/x86_64` somewhere appropriate, and put the target file (`CentOS-5.6-x86_64-netinstall.iso`) there.

Then create a simple configuration file `lighttpd.conf`.
The following template should work (modify the document-root appropriately):

    server.document-root = "/path/to/created/httpd/directory/"

    server.max-keep-alive-idle = 15
    server.max-read-idle = 15
    server.max-write-idle = 15

    server.port = 8080

    mimetype.assign = (
      ".html" => "text/html",
      ".txt" => "text/plain",
      ".jpg" => "image/jpeg",
      ".png" => "image/png",
      ".iso" => "application/octet-stream"
    )

*NOTE*: You've now hardcoded the web server to be on port 8080; the measurement script will not change this regardless of the value set in the script itself.

Verify your configuration file is syntactically valid:

    lighttpd -t -f lighttpd.conf

Start `lighttpd` with the no-background flag `-D` so that it can be cleanly killed:

    lighttpd -D -f lighttpd.conf

Verify you can retrieve the file (again, assuming you're using `boston.cs.ucsb.edu`) using `wget` from a different machine:

    wget http://boston.cs.ucsb.edu:8080/centos/5.6/isos/x86_64/CentOS-5.6-x86_64-netinstall.iso

Then kill lighttpd with SIGINT (CTRL-C at the terminal). You've finished web server setup.

## The distant host

Presumably the distant host is a third-party web server. Make sure you can retrieve the target file from this host, and you're done.

# Configuring and Running the Experiments

In our experiments, we used three sets of hosts, as follows:

Valid initiators:

* prismslab.cs.umass.edu
* isectestbed.uta.edu
* kurtz.cs.wesleyan.edu

Valid adjacent hosts:

* prismslab.cs.umass.edu
* isectestbed.uta.edu
* kurtz.cs.wesleyan.edu
* boston.cs.ucsb.edu

Valid distant hosts:

* mirror.hmc.edu
* mirror.mojohost.com
* mirrors.cmich.edu

Most of the work is done by the sole class, `MeasurementConfiguration`, defined in the measurement script. It is a fairly straightforward translation of a manual measurement process into pexpect. The `all_measurements()` function performs the measurements.

You'll need your own hosts, accounts, and passwords to actually run the script.
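
To get a feel for how the independent variables combine, the sketch below enumerates every (initiator, adjacent, distant, download type) tuple for the host lists above. It is an illustration only and does not reproduce the logic of `all_measurements()`: the function name `measurement_plan`, the use of `None` to mark a direct path, the `"wget"`/`"attack"` labels, and the rule that a host never acts as both initiator and adjacent host are all assumptions. It is written for Python 3.

```python
from itertools import product

# Host lists as given in this README.
INITIATORS = ["prismslab.cs.umass.edu", "isectestbed.uta.edu",
              "kurtz.cs.wesleyan.edu"]
ADJACENT_HOSTS = ["prismslab.cs.umass.edu", "isectestbed.uta.edu",
                  "kurtz.cs.wesleyan.edu", "boston.cs.ucsb.edu"]
DISTANT_HOSTS = ["mirror.hmc.edu", "mirror.mojohost.com", "mirrors.cmich.edu"]
DOWNLOADS = ["wget", "attack"]  # hypothetical labels for the two clients


def measurement_plan(initiators, adjacent_hosts, distant_hosts, downloads):
    """Yield (initiator, adjacent, distant, download) tuples.

    `distant` is None for a direct path, where the adjacent host serves
    the file itself; otherwise the adjacent host proxies to `distant`.
    """
    for initiator, adjacent in product(initiators, adjacent_hosts):
        if initiator == adjacent:
            continue  # assumption: a host does not measure itself
        for download in downloads:
            yield (initiator, adjacent, None, download)  # direct path
            for distant in distant_hosts:
                yield (initiator, adjacent, distant, download)  # proxied path


if __name__ == "__main__":
    plan = list(measurement_plan(INITIATORS, ADJACENT_HOSTS,
                                 DISTANT_HOSTS, DOWNLOADS))
    print("%d candidate measurements" % len(plan))
```

Under these assumptions there are 9 initiator/adjacent pairs, each with one direct and three proxied paths and two download types. Whether every such combination was actually measured is not stated here.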
# Measurement Results

Results are captured into local directories, named

    ---

Within these directories, files with the following names are created:

    ---[-]-.

Along with clearly-named ping and tracepath output, these files are the output of the measurement processes: the `.pcap` files are tcpdump-created packet header logs, and the `.log` files are the output from `wget` or the "attack" (that is, optimistic ACK) program.

## Interpreting the Results

Generally, the output is self-explanatory. Notably, the measurement script measures the total execution time of each subprocess and appends it to the `.log` files. It also appends `TIMEOUT` when the entire subprocess takes longer than a preset time (120s by default) to complete.

When preparing the manuscript, we used the maximum bandwidth the attack program reported generating at the sender. For `wget`, we used the bandwidth it reported. The exception is a few of the cases where isectestbed.uta.edu was the initiator and `wget` bandwidth was quite low; in these cases we took the maximum bandwidth reported before `wget` timed out.
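
If you want a quick sanity check on a run, the sketch below flags logs carrying the `TIMEOUT` marker and converts a transfer size and elapsed time into a mean rate. It is a hedged illustration for Python 3, not the procedure used for the manuscript: the function names are hypothetical, the exact format in which the elapsed time is appended to the `.log` files is not assumed (you pass the seconds in yourself), and a mean over the whole run is a different quantity from the maximum bandwidth figures described above.

```python
TARGET_SIZE_BYTES = 10921984  # size of the target file, from `ls -l` above
TIMEOUT_MARKER = "TIMEOUT"    # marker the measurement script appends


def timed_out(log_text):
    """Return True if the log text contains the TIMEOUT marker."""
    return TIMEOUT_MARKER in log_text


def mean_throughput_mbps(num_bytes, seconds):
    """Mean throughput in megabits per second (10**6 bits per second)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    # Example: the whole target file transferred in 10 seconds.
    print("%.2f Mb/s" % mean_throughput_mbps(TARGET_SIZE_BYTES, 10.0))
```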