Your assignment is to write a proxy HTTP server in C or C++ that is capable of some simple filtering based on server domain name. Your proxy server must be able to handle GET, HEAD and POST requests (it can refuse to process any request that specifies any other HTTP request method) sent by a client speaking HTTP version 1.0 or 1.1. Your proxy should work with any HTTP client (browser); we will be testing it with a custom HTTP client written just to torture your proxy server.
NOTE: It is not acceptable to submit the code for a proxy server you find on the WWW (or anywhere). It is acceptable to look at any code and borrow ideas, but the code you submit must be written by you. If you borrow specific ideas from code not written by you, you must acknowledge this in your source code and in your README file. You may not share code (in any form) with anyone else in the class, or anyone who has taken the class in previous years.
Your proxy must be capable of filtering out requests to web servers
within some DNS domains. For example, your proxy could be told to
filter out any request made to a server whose name ends in
"doubleclick.com". When your server detects a request
that should be filtered, it should return an HTTP error 403
(Forbidden); this means you need to send back an HTTP status line that
indicates an error.
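The suffix test described above can be sketched as follows. This is only an illustration, not a required design: the function names host_in_domain and is_filtered are made up for this example, and the label-boundary check (so that "notdoubleclick.com" is not treated as part of "doubleclick.com") is an assumption that goes slightly beyond the plain "ends in" wording.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: returns 1 if host is the domain itself or ends in
 * "." followed by the domain, compared case-insensitively; 0 otherwise. */
int host_in_domain(const char *host, const char *domain)
{
    size_t hlen = strlen(host);
    size_t dlen = strlen(domain);

    if (dlen == 0 || hlen < dlen)
        return 0;
    if (strcasecmp(host + (hlen - dlen), domain) != 0)
        return 0;
    /* Assumption: require a label boundary before the matched suffix. */
    return hlen == dlen || host[hlen - dlen - 1] == '.';
}

/* Hypothetical helper: returns 1 if host falls in any listed domain. */
int is_filtered(const char *host, char *const domains[], int ndomains)
{
    int i;

    for (i = 0; i < ndomains; i++)
        if (host_in_domain(host, domains[i]))
            return 1;
    return 0;
}
```

When is_filtered returns 1, the proxy would answer with a 403 status line instead of contacting the server.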
Your proxy will get a list of domains that should be filtered on the command line. The first command line argument will be the port number you should use to receive connections; the remaining arguments (if any) are the domains that should be filtered. Below is an example of a command line that could be used to run your proxy on port 1234 and to filter out requests to doubleclick.com and yimg.com:
./proxy 1234 doubleclick.com yimg.com
The filtering is all based on domain names; you do not need to worry about IP address filtering. If a request specifies an IP address instead of a domain name, you may forward it without filtering. For example, if you get the request line:
GET http://128.213.1.1/foo HTTP/1.1
You don't have to worry about whether 128.213.1.1 is actually in any of the domains you have been told to filter.
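One way to pull the host out of a request target like the one above, and to recognize a dotted-quad address, is sketched below. The names extract_host and is_ipv4_literal are hypothetical, and the sketch assumes an absolute "http://" URI with no userinfo part and no bracketed IPv6 literal.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>      /* strncasecmp (POSIX) */
#include <sys/socket.h>   /* AF_INET */
#include <arpa/inet.h>    /* inet_pton */

/* Hypothetical helper: copies the host part of "http://host[:port]/path"
 * into out. Returns 0 on success, -1 if the URI does not start with
 * "http://" or the host is empty or does not fit in out. */
int extract_host(const char *uri, char *out, size_t outsz)
{
    static const char scheme[] = "http://";
    size_t n;

    if (strncasecmp(uri, scheme, sizeof scheme - 1) != 0)
        return -1;
    uri += sizeof scheme - 1;
    n = strcspn(uri, ":/? \t\r\n");   /* host ends at port, path, query or space */
    if (n == 0 || n >= outsz)
        return -1;
    memcpy(out, uri, n);
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: returns 1 if host is an IPv4 dotted-quad literal. */
int is_ipv4_literal(const char *host)
{
    struct in_addr addr;

    return inet_pton(AF_INET, host, &addr) == 1;
}
```

A proxy built this way could skip the domain check whenever is_ipv4_literal returns 1.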
Your server must use a port number specified as the first command line parameter (from argv[1]). Your program will be run by an automated system that requires that your server understand that it should use the port number specified on the command line!
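A minimal sketch of validating the port argument is shown here, assuming a helper named parse_port (a hypothetical name) that rejects anything that is not a whole decimal number in the range 1 to 65535.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns the port (1..65535) parsed from s,
 * or -1 if s is empty, has trailing junk, or is out of range. */
int parse_port(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}
```

In main, the call would be parse_port(argv[1]) after checking that argc is at least 2; the filtered domains are then argv[2] through argv[argc - 1].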
Your server should do filtering based on domain names as described above. All command line arguments following the port number are domains that should be filtered; there can be zero or more of these.
Your server must handle GET, HEAD and POST request methods.
Your server should refuse to process any HTTP request method other than GET, HEAD and POST. You should send back an HTTP status code of 405 (Method not allowed) if you receive any other request method.
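The method check can be as small as the sketch below. The names method_allowed and response_405 are hypothetical, and the exact response text is only one reasonable choice; the Allow header is included because RFC 2616 requires it in a 405 response.

```c
#include <string.h>

/* Hypothetical helper: returns 1 for GET, HEAD and POST, 0 otherwise.
 * HTTP method names are case-sensitive, so "get" is rejected. */
int method_allowed(const char *method)
{
    return strcmp(method, "GET") == 0 ||
           strcmp(method, "HEAD") == 0 ||
           strcmp(method, "POST") == 0;
}

/* Hypothetical helper: a complete 405 response with no body. */
const char *response_405(void)
{
    return "HTTP/1.0 405 Method Not Allowed\r\n"
           "Allow: GET, HEAD, POST\r\n"
           "Connection: close\r\n"
           "Content-Length: 0\r\n"
           "\r\n";
}
```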
Your server must forward the appropriate HTTP request headers to the server and response headers back to the client.
Your server does not need to be concurrent (although concurrency is not hard to add and will make your proxy much more usable). Implementing concurrency is worth a 10% bonus.
If you decide to write a concurrent server using fork(), you need to make sure you don't leave zombie processes around.
We must not be able to kill your server simply by sending an invalid request.
We must not be able to kill your server by stopping or killing the client (this includes pressing the STOP button in Netscape or IE).
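A client that disappears mid-response is a typical way for a server to die: by default, writing to a closed socket raises SIGPIPE, which terminates the process. A minimal sketch of one defense, using the hypothetical names ignore_sigpipe and write_all, is to ignore the signal and treat the resulting EPIPE error like any other failed write.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE so a write to a closed connection
 * fails with EPIPE instead of killing the process. Returns 0 or -1. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: writes all len bytes, retrying after partial
 * writes and EINTR. Returns 0 on success, -1 on error (errno is set). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

When write_all returns -1, the proxy would close both connections for that request and go back to accepting new ones.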
Your server cannot get larger (use more memory) every time it processes a request (no memory leaks!).
Your server does not need to support all the fancy features of HTTP version 1.1, we are looking for basic functionality (by fancy, I mean things like persistent connections, pipelining, etc.). See the note below about HTTP persistence.
To keep track of all requests, your server should print one line (to standard output) for each request serviced. The line should include the host name or IP address of the client, and the original request-line sent by the client (not any of headers that accompanied the request). You should print one line for each filtered request as well (indicating that your proxy did not process the request). For example, the following might be the output generated by your server if it received some requests from a client running on amele-2.cse.unr.edu:
> lab2 1234 doubleclick.com slashdot.org
amele-2.cse.unr.edu: GET http://www.cse.unr.edu/
amele-2.cse.unr.edu: GET http://www.cse.unr.edu/images/layout/stack_noslogan_rescaled.png
amele-2.cse.unr.edu: GET http://www.cse.unr.edu/images/homepage_rotation/programmingcontestlowerdivisionawards2005.jpg
amele-2.cse.unr.edu: GET http://www.cse.unr.edu/gminor/images/ghost.jpg
amele-2.cse.unr.edu: GET http://www.cse.unr.edu/images/icalicon.png
amele-2.cse.unr.edu: GET http://www.cse.unr.edu/images/linkicon.png
amele-2.cse.unr.edu: GET http://www.google.com/intl/en/logos/powered_by_google_135x35.gif
amele-2.cse.unr.edu: GET http://www.w3.org/Icons/valid-xhtml10
amele-2.cse.unr.edu: FILTERED GET http://www.slashdot.org/foo/blah
amele-2.cse.unr.edu: FILTERED HEAD http://www.slashdot.org/
amele-2.cse.unr.edu: POST http://www.fbi.gov/insecuresubmission.cgi
Note that it is not necessary to include the HTTP version number in the output (but feel free to do so if you want).
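The log format shown above could be produced by something like the following sketch. The name format_log_line is hypothetical; the sketch keeps the HTTP version from the request line (which is optional) and cuts the line at the first CR or LF so that headers never reach the log.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "client: [FILTERED ]request-line" into out,
 * stopping at the first CR or LF of request_line.
 * Returns 0 on success, -1 if out is too small. */
int format_log_line(char *out, size_t outsz, const char *client,
                    const char *request_line, int filtered)
{
    size_t n = strcspn(request_line, "\r\n");
    int w = snprintf(out, outsz, "%s: %s%.*s", client,
                     filtered ? "FILTERED " : "", (int)n, request_line);

    return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}
```

After formatting, the line would be printed with puts or printf; calling fflush(stdout) afterwards helps the output appear promptly when standard output is redirected.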
HTTP 1.1 supports persistent connections by default. Feel free to have your proxy deal with persistent connections, but this is not required for this project. However, implementing persistent connections is worth a 10% bonus (see the grading breakdown). If you choose not to deal with persistence, you will probably want to do something like the following (or your proxy will not work well with some clients/servers):
Remove any Proxy-Connection or Connection request headers.
Add the header Connection: close to all requests you send to servers. This tells the server that you don't want persistence, so the server should close the connection once the response is complete.
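The two header changes above can be sketched as a single pass over the request headers. The names is_connection_header and rewrite_headers are hypothetical. The sketch assumes the input holds only CRLF-terminated header lines (no request line, no final blank line) and does not handle folded continuation lines or other headers named inside a Connection header.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: returns 1 if line is a Connection or
 * Proxy-Connection header (header names are case-insensitive). */
int is_connection_header(const char *line)
{
    static const char *const drop[] = { "Connection", "Proxy-Connection" };
    size_t i;

    for (i = 0; i < sizeof drop / sizeof drop[0]; i++) {
        size_t n = strlen(drop[i]);

        if (strncasecmp(line, drop[i], n) == 0 && line[n] == ':')
            return 1;
    }
    return 0;
}

/* Hypothetical helper: copies header lines from in to out, dropping the
 * two headers above, then appends "Connection: close".
 * Returns 0 on success, -1 if out is too small. */
int rewrite_headers(const char *in, char *out, size_t outsz)
{
    static const char add[] = "Connection: close\r\n";
    size_t used = 0;

    while (*in != '\0') {
        const char *eol = strstr(in, "\r\n");
        size_t len = eol ? (size_t)(eol - in) + 2 : strlen(in);

        if (!is_connection_header(in)) {
            if (used + len >= outsz)
                return -1;
            memcpy(out + used, in, len);
            used += len;
        }
        in += len;
    }
    if (used + sizeof add > outsz)
        return -1;
    memcpy(out + used, add, sizeof add);   /* copies the terminating NUL too */
    return 0;
}
```

The caller would send the request line, then the rewritten headers, then the blank line that ends the header section.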
NOTE: There are special rules for proxies when it comes to handling persistence; see sections 8.1 and 14.10 of RFC 2616 (HTTP 1.1).
You must submit all the source code necessary for us to build and test your proxy server. You must also include a Makefile that can be used to build your server on the ECC workstations. If you don't know how to use make or create a Makefile, refer to the sample TCP client and server code.
You must also include in your submission a file named
README that includes your name and a brief description of
your submission, including the name of each file submitted along with
a one line description of what is in the file. If your code is not
complete, tell us what works and what doesn't. If you are submitting
code that does not compile, please tell us that as well.
If any of your code was written by someone else, you are required to
tell us about it (this must also be documented in the code itself).
Finally, feel free to include a description of any problems you had
or anything else you think might be helpful to us.
Your project will be tested to make sure it works properly - a custom HTTP client will be used to test the basic functionality of your server (not a browser) and to also make life tough by sending nonsense requests, long requests, slow requests (that arrive 1 character per second), rude requests that drop the connection before even completing the request, etc. We will also send valid requests to servers that misbehave. You are not required to protect the client from misbehaving servers, but your server must not crash or become unstable no matter what the server sends back.
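Slow, oversized and abandoned requests can all be handled by reading into a fixed-size buffer until the blank line that ends the headers arrives. The sketch below uses a hypothetical function name, read_request_head, and a caller-chosen buffer size. It does not add a timeout, and a request containing a NUL byte will simply never match and will be rejected once the buffer fills.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd into buf until "\r\n\r\n" is seen.
 * Returns the number of bytes read (> 0) once the request line and
 * headers are complete, 0 if the peer closed the connection first,
 * and -1 on a read error or if the request does not fit in buf. */
ssize_t read_request_head(int fd, char *buf, size_t bufsz)
{
    size_t used = 0;

    while (used + 1 < bufsz) {
        ssize_t n = read(fd, buf + used, bufsz - 1 - used);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;              /* rude client: dropped the connection */
        used += (size_t)n;
        buf[used] = '\0';          /* keep buf a valid C string for strstr */
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)used;
    }
    return -1;                     /* request too long for the buffer */
}
```

Because the loop accepts whatever each read returns, a request that arrives one character at a time is assembled the same way as one that arrives in a single packet. Any bytes after the blank line (the start of a POST body, for example) are already in buf and must not be discarded.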
Here is a rough breakdown of the grading:
| Basic Functionality: GET and HEAD requests (polite client/server) | 30% |
| POST requests (polite client/server) | 20% |
| Dealing with impolite clients/servers | 25% |
| Error handling, Style/Code structure, etc. | 25% |
| BONUS: Concurrency | 10% |
| BONUS: Persistent connections | 10% |
NOTE: 25% of your homework grade depends on how well your code is written. These points include the following:
Error handling (check every system call for an error!).
Safe code (avoiding buffer overflow, etc).
How well we can understand your code. There is no required format for your code, and there is no requirement like "you must have one comment for every 2.35 lines of code". Feel free to provide whatever level of commenting you believe is appropriate to make sure that other competent programmers could easily understand and make changes to your code.
IMPORTANT: It is not acceptable to write this project in a single function (or even just a couple of functions).
Submission of your homework is via WebCT. You must submit all the required files in a single tar or zip file containing all the files for your submission.
Acknowledgement: This assignment is adapted from one by Dave Hollinger.