CPE 401/601 Computer Communication Networks

Spring 2009

Network Lab 2 : Proxy HTTP Server

Due on Tuesday, Mar 3 at 12:00 pm


Your assignment is to write a proxy HTTP server in C or C++ that is capable of some simple filtering based on server domain name. Your proxy server must be able to handle GET, HEAD and POST requests (it can refuse to process any request that specifies any other HTTP request method) sent by a client speaking HTTP version 1.0 or 1.1. Your proxy should work with any HTTP client (browser). We will be testing it with a custom HTTP client written just to torture your proxy server.
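As a rough illustration of the method and version check described above, the sketch below parses a request-line and classifies the method. The names parse_request_line, struct request_line and the 2048-byte URL limit are assumptions made for this example, not part of the assignment. The parsing is deliberately lenient (sscanf does not reject trailing garbage after the version).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical types for this sketch only. */
enum method { M_GET, M_HEAD, M_POST, M_UNSUPPORTED };

struct request_line {
    enum method method;
    char url[2048];   /* assumed upper bound on URL length */
    int minor;        /* 0 for HTTP/1.0, 1 for HTTP/1.1 */
};

/* Parse "METHOD URL HTTP/1.x". Returns 0 on success, -1 if the line
 * is malformed or the version is neither 1.0 nor 1.1. */
int parse_request_line(const char *line, struct request_line *out)
{
    char method[16];
    int major;

    if (sscanf(line, "%15s %2047s HTTP/%d.%d",
               method, out->url, &major, &out->minor) != 4)
        return -1;
    if (major != 1 || (out->minor != 0 && out->minor != 1))
        return -1;

    /* HTTP method names are case-sensitive, so strcmp is used. */
    if (strcmp(method, "GET") == 0)
        out->method = M_GET;
    else if (strcmp(method, "HEAD") == 0)
        out->method = M_HEAD;
    else if (strcmp(method, "POST") == 0)
        out->method = M_POST;
    else
        out->method = M_UNSUPPORTED;   /* caller may refuse these */
    return 0;
}
```

A caller could answer M_UNSUPPORTED with an error status such as 501 (Not Implemented); that choice is a suggestion here, since the assignment only says such requests may be refused.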

NOTE: It is not acceptable to submit the code for a proxy server you find on the WWW (or anywhere else). It is acceptable to look at any code and borrow ideas, but the code you submit must be written by you. If you borrow specific ideas from code not written by you, you must acknowledge this in your source code and in your README file. You may not share code (in any form) with anyone else in the class, or anyone who has taken the class in previous years.

Filtering

Your proxy must be capable of filtering out requests to web servers within some DNS domains. For example, your proxy could be told to filter out any request made to a server whose name ends in "doubleclick.com". When your server detects a request that should be filtered, it should return an HTTP 403 (Forbidden) error. This means you need to send back an HTTP status line that indicates the error.
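A minimal sketch of building such a 403 response is shown below. The function name build_forbidden_response, the body text, and the extra headers are assumptions for illustration. Only the status line is required by the text above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a minimal 403 response into buf.
 * include_body should be 0 when answering a HEAD request, because a
 * response to HEAD carries headers only.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_forbidden_response(char *buf, size_t buflen, int include_body)
{
    static const char body[] = "403 Forbidden: blocked by proxy filter\r\n";
    int n = snprintf(buf, buflen,
                     "HTTP/1.0 403 Forbidden\r\n"   /* the status line */
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"                          /* end of headers */
                     "%s",
                     strlen(body), include_body ? body : "");

    if (n < 0 || (size_t)n >= buflen)
        return -1;   /* output was truncated or formatting failed */
    return n;
}
```

The returned length is what would be passed to write() or send() on the client socket.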

Your proxy will get a list of domains that should be filtered on the command line. The first command line argument will be the port number you should use to receive connections. The remaining arguments (if any) are the domains that should be filtered. Below is an example of a command line that could be used to run your proxy on port 1234 and to filter out requests to doubleclick.com and yimg.com:

./proxy 1234 doubleclick.com yimg.com
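One way to read this command line is sketched below. The names parse_port, parse_args and struct proxy_config are hypothetical, and the 1 to 65535 port range check is an assumption of this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse text as a TCP port number.
 * Returns the port (1..65535) or -1 if text is not a valid port. */
int parse_port(const char *text)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;               /* not a number, or trailing junk */
    if (value < 1 || value > 65535)
        return -1;
    return (int)value;
}

/* Hypothetical container for the parsed command line. */
struct proxy_config {
    int port;
    int domain_count;
    char **domains;              /* points into argv, nothing is copied */
};

/* Fill cfg from argv. Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, struct proxy_config *cfg)
{
    if (argc < 2)
        return -1;               /* the port argument is mandatory */
    cfg->port = parse_port(argv[1]);
    if (cfg->port < 0)
        return -1;
    cfg->domain_count = argc - 2;   /* may be zero: no filtering */
    cfg->domains = argv + 2;
    return 0;
}
```

From main(), this would be called as parse_args(argc, argv, &cfg), printing a usage message and exiting when it returns -1.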

The filtering is all based on domain names. You do not need to worry about IP address filtering: if you get requests that specify an IP address instead of a domain name, you do not need to filter them. For example, if you get the request line:

GET http://128.213.1.1/foo HTTP/1.1

You don't have to worry about whether 128.213.1.1 is actually in any of the domains you have been told to filter.
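The domain check could be sketched as follows. The names extract_host, host_in_domain and is_filtered are hypothetical. Two details are assumptions of this example rather than requirements: the comparison ignores case, and a match must fall on a label boundary (so "notyimg.com" is not treated as part of "yimg.com"). A plain "ends in" comparison would be the more literal reading. An IP address such as 128.213.1.1 simply never matches a domain name, so it needs no special case.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

/* Copy the host part of an absolute "http://" URL into host.
 * Returns 0 on success, -1 if the URL is not absolute or too long. */
int extract_host(const char *url, char *host, size_t hostlen)
{
    static const char scheme[] = "http://";
    size_t n;

    if (strncasecmp(url, scheme, sizeof scheme - 1) != 0)
        return -1;
    url += sizeof scheme - 1;
    /* The host ends at a port, path, query, or whitespace. */
    n = strcspn(url, ":/? \t\r\n");
    if (n == 0 || n >= hostlen)
        return -1;
    memcpy(host, url, n);
    host[n] = '\0';
    return 0;
}

/* Returns 1 if host equals domain or ends in "." followed by domain. */
int host_in_domain(const char *host, const char *domain)
{
    size_t hl = strlen(host), dl = strlen(domain);

    if (dl == 0 || dl > hl)
        return 0;
    if (strcasecmp(host + (hl - dl), domain) != 0)
        return 0;
    return hl == dl || host[hl - dl - 1] == '.';
}

/* Returns 1 if host falls inside any of the count filtered domains. */
int is_filtered(const char *host, char **domains, int count)
{
    int i;

    for (i = 0; i < count; i++)
        if (host_in_domain(host, domains[i]))
            return 1;
    return 0;
}
```

With the example command line above, "ad.doubleclick.com" would be filtered while "www.cse.unr.edu" and "128.213.1.1" would pass through.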


What is and is not required:


Server Output

To keep track of all requests, your server should print one line (to standard output) for each request serviced. The line should include the host name or IP address of the client and the original request-line sent by the client (not any of the headers that accompanied the request). You should print one line for each filtered request as well (indicating that your proxy did not process the request). For example, the following might be the output generated by your server if it received some requests from a client running on amele-2.cse.unr.edu:

> lab2 1234 doubleclick.com slashdot.org
amele-2.cse.unr.edu:  GET http://www.cse.unr.edu/
amele-2.cse.unr.edu:  GET http://www.cse.unr.edu/images/layout/stack_noslogan_rescaled.png
amele-2.cse.unr.edu:  GET http://www.cse.unr.edu/images/homepage_rotation/programmingcontestlowerdivisionawards2005.jpg
amele-2.cse.unr.edu:  GET http://www.cse.unr.edu/gminor/images/ghost.jpg
amele-2.cse.unr.edu:  GET http://www.cse.unr.edu/images/icalicon.png
amele-2.cse.unr.edu:  GET http://www.cse.unr.edu/images/linkicon.png
amele-2.cse.unr.edu:  GET http://www.google.com/intl/en/logos/powered_by_google_135x35.gif
amele-2.cse.unr.edu:  GET http://www.w3.org/Icons/valid-xhtml10
amele-2.cse.unr.edu:  FILTERED GET http://www.slashdot.org/foo/blah 
amele-2.cse.unr.edu:  FILTERED HEAD http://www.slashdot.org/
amele-2.cse.unr.edu:  POST http://www.fbi.gov/insecuresubmission.cgi

Note that it is not necessary to include the HTTP version number in the output (but feel free to do so if you want).
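A sketch of producing one such log line is given below. The helper name format_log_line is hypothetical, and dropping the trailing HTTP version is just one of the two permitted choices. The client name is assumed to have been obtained already, for example with getnameinfo() on the peer address.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one log line into out.
 * request_line is the raw request-line, possibly ending in CRLF.
 * Returns the line length, or -1 if out is too small. */
int format_log_line(char *out, size_t outlen, const char *client,
                    const char *request_line, int filtered)
{
    size_t len = strcspn(request_line, "\r\n");   /* strip the CRLF */
    const char *p = request_line + len;
    int n;

    /* Find the last space-separated token; drop it if it is the
     * HTTP version, so the output resembles the sample above. */
    while (p > request_line && p[-1] != ' ')
        p--;
    if (p > request_line && strncmp(p, "HTTP/", 5) == 0)
        len = (size_t)(p - 1 - request_line);

    n = snprintf(out, outlen, "%s:  %s%.*s\n", client,
                 filtered ? "FILTERED " : "", (int)len, request_line);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

After formatting, the line can be written with fputs(line, stdout) followed by fflush(stdout), so that output is not lost in a buffer if the process is killed or forks.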


Persistence

HTTP 1.1 uses persistent connections by default. Feel free to have your proxy handle persistent connections, but this is not required for this project. However, implementing concurrency earns a 10% bonus. If you choose not to deal with persistence, you will probably want to do something like the following (or your proxy will not work well with some clients/servers):

NOTE: There are special rules for proxies when it comes to handling persistence. See sections 8.1 and 14.10 of RFC 2616 (HTTP 1.1).
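One common approach for a proxy that does not keep connections open, offered here as an assumption and not as the assignment's own recipe, is to drop the connection-related headers it receives and send "Connection: close" in every message it forwards. The sketch below rewrites a header block that way. The names header_is, is_connection_header and rewrite_headers are hypothetical, and Proxy-Connection is a non-standard header that some browsers send.

```c
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Returns 1 if line starts with the field name followed by ':'. */
static int header_is(const char *line, const char *name)
{
    size_t n = strlen(name);
    return strncasecmp(line, name, n) == 0 && line[n] == ':';
}

/* Returns 1 for headers that describe only the current connection. */
int is_connection_header(const char *line)
{
    return header_is(line, "Connection") ||
           header_is(line, "Proxy-Connection") ||
           header_is(line, "Keep-Alive");
}

/* Copy the CRLF-terminated header lines from in to out, skipping
 * connection headers, then append "Connection: close" and the blank
 * line that ends the header block. in must not include the
 * request-line. Returns the output length, or -1 on malformed input
 * or if out is too small. */
int rewrite_headers(const char *in, char *out, size_t outlen)
{
    static const char close_hdr[] = "Connection: close\r\n\r\n";
    size_t used = 0;

    while (*in != '\0') {
        const char *eol = strstr(in, "\r\n");
        size_t len;

        if (eol == NULL)
            return -1;                 /* line without CRLF */
        len = (size_t)(eol - in) + 2;
        if (len == 2)
            break;                     /* blank line: end of headers */
        if (!is_connection_header(in)) {
            if (used + len >= outlen)
                return -1;
            memcpy(out + used, in, len);
            used += len;
        }
        in += len;
    }
    if (used + sizeof close_hdr > outlen)
        return -1;
    memcpy(out + used, close_hdr, sizeof close_hdr);  /* copies the NUL too */
    return (int)(used + sizeof close_hdr - 1);
}
```

After forwarding the response, the proxy would then close both sockets, so each request uses a fresh connection.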



Deliverables

You must submit all the source code necessary for us to build and test your proxy server. You must also include a Makefile that can be used to build your server on the ECC workstations. If you don't know how to use make or create a Makefile, refer to the sample TCP client and server code.

You must also include in your submission a file named README that includes your name and a brief description of your submission, including the name of each file submitted along with a one line description of what is in the file. If your code is not complete, tell us what works and what doesn't. If you are submitting code that does not compile, please tell us that as well. If any of your code was written by someone else, you are required to tell us about it (this must also be documented in the code itself). Finally, feel free to include a description of any problems you had or anything else you think might be helpful to us.



Grading

Your project will be tested to make sure it works properly. A custom HTTP client (not a browser) will be used to test the basic functionality of your server and also to make life tough by sending nonsense requests, long requests, slow requests (that arrive 1 character per second), rude requests that drop the connection before even completing the request, etc. We will also send valid requests to servers that misbehave. You are not required to protect the client from misbehaving servers, but your proxy must not crash or become unstable no matter what the origin server sends back.
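To illustrate one defensive way of reading a request under these conditions, the sketch below reads the request head with a size limit and a per-read timeout. The function name read_request_head, its return codes, and the use of poll() are assumptions of this example. Note that the timeout applies to each wait, not to the whole request, so a client sending 1 character per second still succeeds with a timeout above one second. Embedded NUL bytes in a nonsense request hide the terminator from strstr(), so such a request ends in a timeout, a closed connection, or the "too long" result.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Read from fd until the blank line ending the request head is seen.
 * Returns the number of bytes stored (buf is NUL-terminated and may
 * already contain the start of a POST body), -1 on timeout, error,
 * or a connection closed early, and -2 if the head does not fit. */
int read_request_head(int fd, char *buf, size_t buflen, int timeout_ms)
{
    size_t used = 0;

    if (buflen == 0)
        return -2;
    buf[0] = '\0';
    while (strstr(buf, "\r\n\r\n") == NULL) {
        struct pollfd pfd;
        ssize_t n;
        int ready;

        if (used + 1 >= buflen)
            return -2;                 /* request head too long */
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            return -1;                 /* client went silent */
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        n = read(fd, buf + used, buflen - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;                 /* rude client: closed early */
        used += (size_t)n;
        buf[used] = '\0';
    }
    return (int)used;
}
```

A server built this way would typically also ignore SIGPIPE, so that writing to a connection the peer has already dropped returns an error instead of terminating the process.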

Here is a rough breakdown of the grading:

Basic Functionality:
GET and HEAD requests (polite client/server) 30%
POST requests (polite client/server) 20%
Dealing with impolite clients/servers 25%
Error handling, Style/Code structure, etc. 25%
BONUS: Concurrency 10%
BONUS: Persistent connections 10%

NOTE: 25% of your homework grade depends on how well your code is written. These points include the following:

IMPORTANT: It is not acceptable to write this project in a single function (or even just a couple of functions).


Submitting your files

Submission of your homework is via WebCT. You must submit a single tar or zip file containing all the required files.

Acknowledgement: This assignment is adapted from one by Dave Hollinger.