Overview:
MyWebServer Checklist
Firefox Browser tools (Quick: Ctrl-Shift-E to raise console. Network / Inspector tabs | drag top up for larger console window.)
All MyWebserver programs MUST communicate with the Firefox browser.
In this program you will work through the steps of capturing the HTTP stream between existing clients and servers, and then write a web server that supports this same protocol. It builds on the JokeServer application, which does much of the same work. While the text of the assignment is quite long, the application itself is quite straightforward, and you might be surprised at how easily it can be written.
There are four+ phases in the development process:
1. Capture the HTTP protocol first-hand by developing some hacking / debugging skills (hacking in the good sense).
2. Return simple, static files on request from a browser client.
3. Return dynamically created HTML (build a directory HTML page dynamically)
4. Accept FORM input from the user and do back-end processing on the server to return computed values in (simple!) dynamically-created HTML.
5. Add features of your own choosing, if you like.
See the MyWebServer Tips file for some suggestions once you get coding.
Run at port 2540 in the server directory!
In all cases the following specifications take precedence: The web server must run at port 2540 (http://localhost:2540). It must, by default, serve files from the directory in which the web server is started, including dog.txt and cat.html. The source code should be contained in a single, stand-alone file named MyWebServer.java, ready to compile and run. Subdirectories should be recursively traversed from the default directory in which the server is started.
Grading procedure:
1. Run our various plagiarism checkers on your submission.
2. Extract your zip file into a directory, and run a script file that:
1. Executes > javac MyWebServer.java
2. Populates the new directory with .txt files, .html files and .java files such as dog.txt, cat.html, MyWebServer.java and the file addnums.html (with an action statement that points to port 2540 on localhost), then creates subdirectories and populates those with .txt files and .html files.
3. Executes "> java MyWebServer" to start your webserver at port 2540.
3. In firefox read your directory listing for the directory where the server is running, using port 2540.
4. Select checklist-mywebserver.html from your listing and read it.
5. Browse the .txt .java (treated like .txt) and .html files with which we have populated your directory.
6. Select the addnums.html file and submit data through it.
7. Select http-streams.txt and read it.
8. Select serverlog.txt and read it.
9. Select MyWebServer.java, read your source code, and look at the comments. Note: you should display .java files the same as .txt files by sending the data as text/plain.
10. Navigate to the subdirectories and read .txt, .java and .html files there.
Special Security Note:
I expect that you will find that in its most basic form this is not a particularly difficult assignment. If so, you will soon have a viable, running webserver of your own creation. If you are developing on a machine that is also connected to the Internet this means that you might well expose all of the files on your local machine (or any remote machine where you might be running) to evil hackers from around the world who are anxious to steal information from your files. In the worst case this information would allow them write access to your disk, and/or put financial/personal information in their hands. So—be careful. Hard-code into your server that you only return files from your root server directory of unimportant files, keep your firewall on, etc. Be careful about the "../.." form of URLs, which would allow someone to retrieve files from above your server's directory. For particularly sensitive machines you can always simply unplug your Internet connection while running your server.
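As one way to guard against the "../.." form of URLs, here is a minimal sketch, assuming you resolve every requested path against the server's root directory before opening any file. The class name SafePath and the method resolveInsideRoot are hypothetical names chosen for illustration. The sketch does not decode URL escapes such as %2e and does not follow symbolic links, so treat it as a starting point rather than a complete security policy.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class SafePath {
    // Hypothetical helper: resolves a request path against the server root and
    // returns null when the result would land outside the root (e.g. "../..").
    static Path resolveInsideRoot(Path root, String requestPath) {
        Path base = root.toAbsolutePath().normalize();
        // Strip leading slashes so resolve() treats the request as relative to the root.
        String relative = requestPath.replaceFirst("^/+", "");
        // normalize() collapses "." and ".." segments before the containment check.
        Path candidate = base.resolve(relative).normalize();
        return candidate.startsWith(base) ? candidate : null;
    }

    public static void main(String[] args) {
        Path root = Paths.get(".");
        System.out.println(resolveInsideRoot(root, "/dog.txt"));          // a path under the root
        System.out.println(resolveInsideRoot(root, "/../../etc/passwd")); // null: escapes the root
    }
}
```

A server built this way would send back an error page whenever the helper returns null, instead of opening the file.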
Server Directories
For this assignment your server must serve files from the directory where the server is started. Place all of your submission files in this same directory.
Administration:
• Submission files: MyWebServer.java, http-streams.txt, serverlog.txt, checklist-mywebserver.html You MUST use these exact names.
• Copy the checklist for this programming assignment. Fill in the blanks. Update it as you make progress. NEVER change "no" to "yes" unless you have completed the work. Turn it in to D2L along with your assignment.
• Zip your files into one flat directory (no subdirectories!) and submit to D2L. Verify that your submission has not been corrupted.
• Concatenate MyWebServer.java, http-streams.txt, serverlog.txt into a single text file and submit to MyWebserverTII at D2L
• "javac *.java" must work to compile your source code.
• Make sure that you are familiar with the assignment submission rules (see assignment one, which covers this in detail). Programs that do not precisely conform to the rules will not be graded. Please do not ask for an exception to this policy.
• Your webserver must, by default, serve directories, and files, from the directory in which it runs so that we can test it. If you also want to implement something more sophisticated, such as a default webserver directory, then pass a flag as an argument to your webserver, but keep the default as the current directory.
• Refer to the InetServer PDF document, and the lecture, along with your JokeServer if you have completed it, for the basic program on which you build. Most of you will have completed this assignment, and extended it, well in advance of the MyWebServer program.
Capturing HTTP:
• Goal: Be hackers in the good sense... See what a Web browser, and a webserver are saying to one another for simple browser requests, so that you can later copy that functionality into your own server program.
• Note that you can use WireShark (see the labs) to capture these streams, as an alternative to the hacking methods that follow. You probably can also capture the streaming data directly in the Firefox Browser (search on inspection tools). Any method is valid.
• IF YOU WANT TO DO (PART OF) THIS YOURSELF USING JAVA:
• Use the given MyListener.java code, based on Inet. Modify, and simplify, the code as desired so that it runs at port 2540 and on the console it simply displays everything sent to it, and optionally writes it to a log file as well. If you want, have it send back a valid text/plain response to the client, acknowledging receipt of the "request" ( but note that this is just some minor elegance, not really needed).
That is, if some simple client were to send the message, "ABC Hello there in Server land!
You now have a simple "listener" program which echoes all input on the server console.
If you want to be fancy, your MyListener program can, in addition to the console display, also send all of the information back to the client as HTML-formatted (or plain text) data. This is not required but could be generally useful as an echo-server showing the full format of requests. Note that you will have to send back the correct MIME type for HTML: "Content-Type: text/html [cr/lf] [cr/lf]" (see below).
• Start MyListener and connect to it with Firefox as follows: Make valid webserver requests of MyListener by entering URLS such as http://localhost:2540/dog.txt, and http://localhost:2540/cat.html. Notice, and record, what the browser sends your MyListener program in each case (it is displayed on the server console). This is the HTTP stream that the browser sends when it is requesting files from a web server. You have now hacked it.
• Capture the console output from MyListener into some file as well (or simply copy it from the console window and paste into a file), for submission as part of the assignment.
• For example, following the above procedure, while running my listener at port 2540, I get the following information for a request of dog.txt in the root web server directory.
C:\dp\435\java>java MyListener
Clark Elliott's Port listener running at 2540.
GET /dog.txt HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Host: localhost:2540
Connection: Keep-Alive
(Note: you may wish to experiment with "Connection: close" with your
webserver if you are having buffering problems.)
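The capture above can be reproduced with very little code. The following is a minimal sketch, not the given MyListener.java: it assumes that all you want is the request head, that is, every line up to the blank line that ends the headers. The class name RequestHead and its methods are hypothetical. Running it with no arguments just replays a canned request. Running it with the argument "listen" accepts one browser connection at port 2540, prints what arrives, and closes the socket without answering, so the browser will show an error.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RequestHead {
    // Reads lines up to (not including) the blank line that ends the request head.
    static List<String> read(BufferedReader in) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Accepts a single connection, dumps the request head to the console, then closes.
    static void listenOnce(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.ISO_8859_1))) {
            for (String line : read(in)) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0 && args[0].equals("listen")) {
            listenOnce(2540);
            return;
        }
        String canned = "GET /dog.txt HTTP/1.1\r\nHost: localhost:2540\r\nConnection: Keep-Alive\r\n\r\n";
        for (String line : read(new BufferedReader(new StringReader(canned)))) {
            System.out.println(line);
        }
    }
}
```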
• Put this captured output into your http-streams.txt files for submission with the assignment. Copy and paste from the console is fine. We ONLY need the data from the http streams you've captured.
• We are now going to use the HTTP stream we have just captured to manually retrieve files from a web server. As an example, you can retrieve files from my faculty account at:
condor.depaul.edu/elliott/dog.txt and condor.depaul.edu/elliott/cat.html (But note: Tech support regularly moves my directories around. If elliott does not work, try it with a tilde [~elliott]. Or, if you have a webserver on your PC you can just use that. Or you can install the SourceForge Uniform Server and run that, which is a version of the Apache server that runs on every unix machine. Or you can start the apache web server that runs on your Mac (sudo apachectl start?). But in all cases be careful because you are now serving files from your file system to the network!)
• Either use Wireshark, or use/modify a MyTelnetClient.java program by modifying your InetClient, or JokeClient, so that it allows you to type in an arbitrary text string, and send this (via port 80) to some webserver. Note that while telnet is disabled on Windows by default it is still there and can be activated.
• Use your MyTelnetClient program to manually enter into a dialog with the condor.depaul.edu (or some other) web server. Write the appropriate input and output for your MyTelnetClient program to a log file (or copy it from your console window, or capture it in Wireshark), for later submission to D2L as part of your http-streams.txt file. I don't need to see your source code for this simple program either.
We are working with condor.depaul.edu for convenience because that is where we put our files. However, we could just as easily manually get files from the web server at www.cnn.com if our files were on that machine.
You will connect at port 80 instead of the default telnet port of 23, because you want to talk to the web server, instead of the telnet server.
Do this by entering the shell command,
MyTelnetClient condor.depaul.edu 80 <-- or whichever server you are using
The condor.depaul.edu web server is now waiting for input from you.
You can use the following static files in the step below, or similar files that you have created on your own webserver:
http://condor.depaul.edu/elliott/cat.html
http://condor.depaul.edu/elliott/dog.txt
• Enter the valid HTTP request stream that you captured using your listener, for retrieving the file dog.txt from a web server. Note that you will have to be careful to include all of the necessary information, including carriage return / linefeeds (cr/lfs), and that you will have to make changes as needed for different servers. You could probably use copy and paste if you are clever, but unless you connect many times it is probably not worth it.
Hint: some of the information, such as "Accept" and "User-Agent", is not needed by the web server, and you can find what you can leave out through experimentation.
• If you enter the HTTP correctly the web server will now send your requested file back to you as a text stream response to your MyTelnetClient program. If you enter it incorrectly you will still usually get some kind of valid response, albeit one containing an error message.
• Here is a sample session, yours will be similar, but may differ in some of the details, depending which webserver you are using, on which machine. (Note: server configurations change, so you may have to vary what you send to get a response. Follow what your browser sends. My account on condor moves all the time and you may only get a (valid!) error message.)
> java MyTelnetClient condor.depaul.edu
Clark Elliott's MyTelnet Client, 1.0.
Using server: condor.depaul.edu, Port: 80
Enter text to send to the server, <stop> to end: GET /elliott/dog.txt HTTP/1.1
Enter text to send to the server, <stop> to end: Host: condor.depaul.edu:80
Enter text to send to the server, <stop> to end:
Enter text to send to the server, <stop> to end:
Enter text to send to the server, <stop> to end: stop
HTTP/1.1 200 OK
Date: Wed, 03 Oct 2018 20:40:45 GMT
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Wed, 07 Oct 2015 20:29:55 GMT
ETag: "8a1bfc-30-521899bff76c0"
Accept-Ranges: bytes
Content-Length: 48
Content-Type: text/plain
Connection: close
This is Elliott's dog file on condor. Good job!
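The request typed in the session above can also be assembled in code. This is a minimal sketch, assuming the smallest request a typical server will accept: the request line, a Host header, and a blank line, each ended by cr/lf. The class name ManualGet and the added "Connection: close" header are my assumptions, not part of the captured stream. The fetch method uses InputStream.readAllBytes, which needs Java 9 or later, and it is only called when you pass a host and path on the command line.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ManualGet {
    static final String CRLF = "\r\n";

    // Builds the request text; the final empty line tells the server the head is complete.
    static String build(String host, String path) {
        return "GET " + path + " HTTP/1.1" + CRLF
             + "Host: " + host + CRLF
             + "Connection: close" + CRLF
             + CRLF;
    }

    // Sends the request and returns everything the server writes before it closes.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(build(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return new String(socket.getInputStream().readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Example: java ManualGet condor.depaul.edu /elliott/dog.txt
            System.out.println(fetch(args[0], 80, args[1]));
        } else {
            // Show the request with the line endings made visible.
            System.out.println(build("localhost", "/dog.txt").replace("\r\n", "[cr/lf]\n"));
        }
    }
}
```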
• Note: you may get a different response. What we are looking for is SOME HTTP / HTML response from the webserver. For example, if the file has been moved somewhere else, you might get back a well-formed error message. This is fine. In either case you are successfully talking with the webserver.
• Put your captured output into http-streams.txt for submission with the assignment.
• [Note: You can use Wireshark, and also the Firefox browser console to see network traffic. In the past Firefox has allowed you to download and install a plug-in called HTTPFox (tools -> add-ons -> get add-ons). After HTTPFox is installed you'll see a small icon in the bottom right corner of your browser window. With HTTPFox you will be able to see all outgoing traffic from your web browser, as well as all of the server responses coming back. (Similar to Fiddler for IE) (Thanks Arkadiusz)]
• So, in summary: Create the simple files dog.txt, cat.html, in your home web directory (or use my files). Verify that they can be reached from the web. Retrieve your files manually using MyTelnetClient to port 80, or WireShark, or HTTPFox and add these to http-streams.txt along with your MyListener data.
• You have now captured both the request coming from a web client, and the response coming from a web server. Ta-duh.
MIME headers
For this assignment we will use two mime types: Content-Type: text/plain and Content-Type: text/html. These must be followed by two cr/lf and then your data.
MIME types are determined by the server from the file extension of the files that are requested. .html will use text/html, and .txt and .java files will both use text/plain. (This is just a trick so we can view your java source code through your webserver.)
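The extension rule above fits in a few lines. This is a minimal sketch, assuming only the three extensions named in this assignment. The class name MimeTypes is hypothetical, and falling back to text/plain for any other extension is my assumption, not a requirement.

```java
import java.util.Locale;

class MimeTypes {
    // Maps a requested file name to the Content-Type value the server should send.
    static String forFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".html")) {
            return "text/html";
        }
        if (lower.endsWith(".txt") || lower.endsWith(".java")) {
            return "text/plain";
        }
        // Assumption: anything unrecognized is sent as plain text.
        return "text/plain";
    }

    public static void main(String[] args) {
        System.out.println(forFileName("cat.html"));
        System.out.println(forFileName("dog.txt"));
        System.out.println(forFileName("MyWebServer.java"));
    }
}
```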
Modify your MultiThreaded server so that it becomes a simple web server.
Goal: Your web server must correctly return requests for files with extensions of .txt, and .html [and also .java which are treated as the same as .txt]. This means that it must return the correct MIME headers (That is, the Content-type [followed by two cr/lf], and Content-length headers), as well as the data. This is a server that operates on static data.
• Copy your MyListener.java source into a file called MyWebServer.java.
• Copy over your files dog.txt, cat.html to your local machine into the directory where you are developing your web server, for later use.
• Using the manual responses you captured from the web server (see above), which contains ALL of the information that the web server sends back to a client, including, specifically the MIME type information (Content-Type:) and Content-Length:, modify your listener so that it becomes a valid web server by sending back a valid text stream, including headers, to the web client. See HTTP protocol for some hints.
• In practice you need not send back all of the responses. You WILL want to include:
HTTP/1.1 200 OK
Content-Length: 47 [Where 47 is changed to the real length of the data --
but note that you might make initial tests by just setting this value high]
Content-Type: text/plain [Where text/plain might also be: text/html]
[followed by two carriage return / linefeeds (crlf), and then the data.]
Modern browsers handle requests for the mini favicon files (the tiny logo that can appear in the URL window) in different ways. If your Firefox browser sends a request for a favicon, you should write code to ignore it. That is, for this assignment we just want those requests to go away any way we can manage it. If you put a favicon.ico file in your server's root directory it may solve problems for you. See the Wikipedia article on favicons.
The following end of line hints might be useful:
static final byte[] EOL = {(byte) '\r', (byte) '\n'};
or:
outstream.writeBytes("Content-Type: " + ConType + "\r\n\r\n");
or:
outstream.print("\r\n\r\n");
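Putting the three required header lines and the end-of-line hints together, here is a minimal sketch of building a whole response. The class name HttpResponse and the method ok are hypothetical. The sketch assumes the body is already in memory as bytes, and it sets Content-Length to the number of bytes, which can differ from the number of characters when the file holds non-ASCII text.

```java
import java.nio.charset.StandardCharsets;

class HttpResponse {
    // Returns status line + headers + blank line + body as one byte array,
    // ready to be written to the socket's output stream.
    static byte[] ok(String contentType, byte[] body) {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n"; // second cr/lf: the empty line that separates headers from data
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] response = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, response, 0, headBytes.length);
        System.arraycopy(body, 0, response, headBytes.length, body.length);
        return response;
    }

    public static void main(String[] args) {
        byte[] body = "This is a dog file.\n".getBytes(StandardCharsets.US_ASCII);
        System.out.print(new String(ok("text/plain", body), StandardCharsets.US_ASCII));
    }
}
```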
• Configure your server so that it sends back the correct MIME type headers for .txt, and .html files [text/plain, and text/html, respectively].
• Use your MyListener, and the MyTelnetClient tricks, or WireShark, for debugging as needed.
Extend your server to include directories:
Goal: Extend your server so that it sends back dynamically constructed data: in this case the HTML-formatted current contents of a directory. This will now be a server that operates on dynamic data.
[Intermediate step: If you are struggling with this assignment, you might want to first simply create some dynamically created HTML, by sending back a very simple HTML file with dynamic data in it, such as the current time. This way you can at least say you have written back dynamic HTML to the client. Then once you are getting the text/html mime type working with dynamic data, go on to creating a directory listing.]
• Note: Most webservers no longer allow the promiscuous display of a directory's contents. But we will provide it from our server as an exercise.
• See the ReadFiles.java program for hints on how to read the contents of a directory in Java. [Note: a directory is simply a more-or-less regular file that contains the names of other files in it, along with some associated information.]
• Modify your webserver so that it correctly returns a promiscuous display of the server's directory as requested by the client. Note that you may want to include some security here, since you WILL be writing a valid, albeit simple, web server. For example, you might want to restrict access to a certain subdirectory of where the server is running.
1. The first step is to simply send back a plain text listing of the files in the directory, along with a text/plain MIME header, and the length of your data.
2. The second step is to send back some kind of formatted HTML with a text/html MIME header.
3. The third step (really not that hard) is send back the names of the files as hot-link references such that "clicking-on" them in the browser will cause your server to send back the contents of that file.
• Using our MyTelnetClient hack we used to be able to see what a regular server would send back as an html listing of hot-links for files. (For security reasons, most servers no longer give directory listings.) For example, for the condor.depaul.edu request "GET /elliott/435/.xyz/" we used to get back the following:
[...]
<h1>Index of /elliott/435/.xyz</h1>
<pre><img src="/icons/blank.gif" alt="Icon "> <a href="?C=N;O=D">Name</a> <a href="?C=M;O=A">Last modified</a> <a href="?C=S;O=A">Size</a> <a href="?C=D;O=A">Description</a><hr><img src="/icons/back.gif" alt="[DIR]"> <a href="/elliott/435/">Parent Directory</a> -
<img src="/icons/text.gif" alt="[TXT]"> <a href="dog.txt">dog.txt</a> 16-Sep-2005 14:09 39
<img src="/icons/text.gif" alt="[TXT]"> <a href="cat.html">cat.html</a> 16-Sep-2005 14:09 67
<img src="/icons/text.gif" alt="[TXT]"> <a href="MyWebServer.class">MyWebServer.class</a> 16-Sep-2005 14:09 222
<img src="/icons/folder.gif" alt="[DIR]"> <a href="z-directory/">z-directory/</a> 16-Sep-2005 15:08 -
</pre>
Which displays as:
Index of /elliott/435/.xyz
Name Last modified Size Description
Parent Directory -
dog.txt 16-Sep-2005 14:09 39
cat.html 16-Sep-2005 14:09 67
MyWebServer.class 16-Sep-2005 14:09 222
z-directory/ 16-Sep-2005 15:08 -
We can simplify this as follows:
<pre>
<h1>Index of /elliott/435/.xyz</h1>
<a href="/elliott/435/">Parent Directory</a> <br>
<a href="dog.txt">dog.txt</a> <br>
<a href="cat.html">cat.html</a><br>
<a href="MyWebServer.class">MyWebServer.class</a><br>
<a href="z-directory/">z-directory/</a><br>
</pre>
Which displays as:
Index of /elliott/435/.xyz
Parent Directory
dog.txt
cat.html
MyWebServer.class
z-directory/
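A simplified listing like the one above can be generated from the directory contents. This is a minimal sketch with hypothetical names (DirectoryListing, html, fromDisk). It assumes that urlPath ends in a slash so the relative links resolve correctly in the browser, and that file names need no HTML escaping, which is reasonable here only because the grading note promises no spaces in file names. Subdirectories are written with a trailing slash, following the grading convention.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DirectoryListing {
    // Builds the HTML page from names that have already been split into dirs and files.
    static String html(String urlPath, List<String> dirNames, List<String> fileNames) {
        StringBuilder page = new StringBuilder();
        page.append("<html><body>\n");
        page.append("<h1>Index of ").append(urlPath).append("</h1>\n");
        if (!"/".equals(urlPath)) {
            page.append("<a href=\"../\">Parent Directory</a><br>\n");
        }
        for (String name : dirNames) {
            // Trailing slash marks the link as a subdirectory.
            page.append("<a href=\"").append(name).append("/\">").append(name).append("/</a><br>\n");
        }
        for (String name : fileNames) {
            page.append("<a href=\"").append(name).append("\">").append(name).append("</a><br>\n");
        }
        page.append("</body></html>\n");
        return page.toString();
    }

    // Reads one directory from disk and hands the sorted names to html().
    static String fromDisk(String urlPath, File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            entries = new File[0]; // not a directory, or not readable
        }
        Arrays.sort(entries);
        List<String> dirs = new ArrayList<>();
        List<String> files = new ArrayList<>();
        for (File entry : entries) {
            (entry.isDirectory() ? dirs : files).add(entry.getName());
        }
        return html(urlPath, dirs, files);
    }

    public static void main(String[] args) {
        System.out.print(fromDisk("/", new File(".")));
    }
}
```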
• Lastly, modify the return from your server so that it sends back links to subdirectories as subdirectory URL hot links, if you have not already done so. The only hard part is identifying a file as a directory, and typically you can look for a trailing slash ("/"). For grading we will use the convention that if the URL ends in a slash ("/") then the server will look for a subdirectory with that name. Thus, when listing subdirectories, you should send subdirectory hotlinks back to the web client with trailing slashes in your prepared URL.
• For some browsers, and browser settings, you may have some difficulties with the directories—e.g., you might have to send your request twice. We may also have trouble translating between the directory systems of Unix, Mac, and Windows operating systems. So be sure to show us that your directory traversal works in your serverlog.txt file. Also, you might want to experiment with: Connection: Keep-Alive / Connection: close.
• Also, you may want to experiment with the socket.close() method if your browser is not displaying the data but all else is working.
• You should now have a relatively complete, working, web server, that can return correct MIME types for different types of files, recurse subdirectories, and return dynamically-created html. Because it is multi-threaded it should be able to handle many hundreds of requests. Good work!
Server-Side scripting and program execution.
Goal: write simple code to run arbitrary program code on the server processing user input from the web, and send the results back to the web client.
In this section we add back-end programming capability to your server, or at least simulate it. We create a simple addnums web form, accept input from a user, pass this to our webserver, process the information, and return a computed response based on the input.
For those who are more ambitious you might look into java's JNI, which allows us to call native code, by loading it into the virtual machine, and then running it. In this way we might write programs that actually run arbitrary scripts/programs under the web server.
Alternatively, for those writing in C, the "system()" function will execute any executables as subprocesses, making the running of programs and scripts trivial. Note: be very security conscious of running user-input shell commands with the "system()" call, because, e.g., they might have you execute a command to erase all of your files!
Neither method is required. Instead, to keep the programming scope reasonable, we will only simulate the running of back-end scripts.
CGI (the Common Gateway Interface) has been around since the beginning of the web, so there are thousands of references on how to use it.
• Use the given web form that accepts a name and two numbers. On the "action" statement, using the GET method, call a program with the extension ".fake-cgi" with a URL that points to your MyWebServer program. E.g., you might have...
<FORM METHOD="GET" ACTION="http://localhost:2540/cgi/addnums.fake-cgi">
...which would suggest that you have a script in the /cgi subdirectory of your server, named addnums.fake-cgi that will handle the input from the current HTML form.
• Note: Although method = POST is more common, GET is a little simpler, and will suit our needs just fine.
• In your webserver, note that the file extension is .fake-cgi. Instead of looking for a file with this extension as you would for .txt and .html treat the input to the server as input to a script. In a regular server you would follow the rules of the Common Gateway Interface (hence CGI) to call a script or executable, and pass it the input, then the server would return the result on behalf of the back-end script. In our case we will simply handle the input in the server itself.
• Use your MyListener program (or WireShark or the Firefox tools) to see what the browser is sending when the form is submitted. Note where attribute-value pairs (for num1, num2, and person) are located in the string that is being sent to the listening socket. Design your own method for parsing the input from what the browser is sending you. You will have to do a little string processing in your program. For example, for form-input of "Matilda," and the numbers 4 and 5, you would expect the following to show up in your input stream:
GET /cgi/addnums.fake-cgi?person=Matilda&num1=4&num2=5 HTTP/1.1
This kind of string, with attribute/value (A/V) pairs separated by "&", each attribute separated from its value by "=", and all of it preceded by a "?", is called a query string.
• Parse the input according to the rules of CGI for GET.
• Call a method addnums() in your server to handle the input and return the HTML formatted output to the client, using the correct MIME type, Content-Length, etc. Note that you must be careful that the working memory for each invocation of addnums() is distinct from all other invocations. Your multi-threaded server may be processing many requests at once.
• Send back an HTML page that returns the user's name and also the sum of the two numbers. E.g., "Dear Matilda, the sum of 4 and 5 is 9."
• For debugging purposes, always remember that you can send the form to your MyListener program to see exactly what the browser is sending you.
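The parsing and addnums() steps above can be sketched as follows. This is a minimal sketch, not a required design: the class name AddNums, the helper names, and the wording of the error page are all my assumptions. Each call keeps its data in local variables only, so concurrent worker threads do not share working memory. The person value is HTML-escaped before it is echoed back, since it comes straight from the user.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class AddNums {
    // Extracts the A/V pairs from a request line such as
    // "GET /cgi/addnums.fake-cgi?person=Matilda&num1=4&num2=5 HTTP/1.1".
    static Map<String, String> parseQuery(String requestLine) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = requestLine.split(" ");
        if (parts.length < 2) return params;
        int q = parts[1].indexOf('?');
        if (q < 0) return params;
        for (String pair : parts[1].substring(q + 1).split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    // Form encoding sends spaces as "+" and other characters as %XX escapes.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Uses only parameters and locals, so each invocation is independent of the others.
    static String addnums(Map<String, String> params) {
        String person = escapeHtml(params.getOrDefault("person", ""));
        try {
            long a = Long.parseLong(params.getOrDefault("num1", "").trim());
            long b = Long.parseLong(params.getOrDefault("num2", "").trim());
            return "<html><body><p>Dear " + person + ", the sum of " + a + " and " + b
                    + " is " + (a + b) + ".</p></body></html>";
        } catch (NumberFormatException e) {
            return "<html><body><p>Dear " + person + ", please enter two whole numbers.</p></body></html>";
        }
    }

    public static void main(String[] args) {
        String line = "GET /cgi/addnums.fake-cgi?person=Matilda&num1=4&num2=5 HTTP/1.1";
        System.out.println(addnums(parseQuery(line)));
    }
}
```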
Tu-duh! You have now built a multi-threaded web server that can handle files, directory traversals, and server-side scripting after getting input from the user through a web client. Good job!
What you turn in
• Capture the HTTP stream from a client using your MyListener program or WireShark. Capture the HTTP stream from a server using your MyTelnetClient program or WireShark. Concatenate these streams together, adding header comments, or in-line comments as needed or helpful, about what the file contains, and put it in a file called http-streams.txt.
• Produce simple "debugging" console output from your webserver showing a series of connections that have been made, what the request string is (dumping the first, informative, characters of the GET request is fine—no editing needed), and the file names that were returned. Rough output is fine, just showing the general working of your server. We are particularly interested in you showing that you can traverse subdirectories which are sometimes problematic for us to grade. Or, you can produce a log file if you wish with the same information in it. Put the text of this console or file log, along with clearly-delineated explanatory comments as needed or helpful, in a file named serverlog.txt.
• Put all of your source code into a single file named MyWebServer.java. Include the standard header comments, and make sure it compiles and runs at the command line. Your server MUST serve files from the directory in which it is started.
• Do NOT submit either MyListener.java or MyTelnetClient.java. These were for your own utility use, and the worker methods might present a conflict with MyWebServer.java compilation.
• Fill in your checklist-mywebserver.html file representing what you have done. NEVER change a "no" to a "yes" without having completed that portion of the assignment! (See the academic integrity link.)
• Put everything IN ONE DIRECTORY. No subdirectories. Make sure that you do not have a conflict with the worker methods of MyListener, and MyWebServer.
• Collect the four files (possibly more if you have bragging rights) into a .zip file, and submit to D2L before the due date.
• Concatenate all your files except the checklist into a single text file and submit to the D2L TII link for this assignment.
• Good work!
Grading note:
You can assume we will not have any spaces in file names.
We MUST be able to retrieve files from the directory in which your MyWebServer program is running. You can assume we will put a trailing slash if we enter a directory name in the address bar of a browser. Your root directory should display if there is no further information beyond the port number. That is, when the following files are together in the indicated directory:
/users/elliott/students/435/Web/
MyWebServer.class
dog.txt
cat.html
/sub-a
/sub-b
cat.html
We should be able to retrieve your files from:
http://localhost:2540/dog.txt
http://localhost:2540/cat.html
http://localhost:2540/sub-a/sub-b/cat.html
and
http://localhost:2540/ or...
http://localhost:2540
should show us:
addnums.html
checklist-mywebserver.html
dog.txt
http-streams.txt
cat.html
serverlog.txt
sub-a/
MyWebServer.class
MyWebServer.java
...or at least something similar.
As per the grading specifications above, we should be able to retrieve all your files through your webserver from this kind of directory listing.
Bragging rights (not required):
• Store the MIME types in a table of MIME types and file extensions. Read the table in when the server starts, and also again, while the server is running, if a file extension is not recognized. This way, adding a new MIME type is as simple as adding an entry in your table, and putting files with that extension in your directory. Be SURE that your MimeTypes file is included in your submission and note this in your comments.html file and at the top of your MyWebserver.html file.
• Bragging rights: HTTP has components for storing cookies on the client through the browser. Implement this, and write a small application that shows this interaction with your server such that the cookie is sent back to the server by the browser on a later invocation.
• Bragging rights: implement a security policy for your server. This can become major bragging rights, depending on how far you go.
• Major bragging rights: Implement all of the above using HTTPS as well as HTTP. (Note: this is hard.)
• Major bragging rights (not recommended): Implement true, if limited, CGI capability by spawning subprocesses to execute back-end programs in real scripting languages. But note: this is actually quite simple if you write your webserver in a native language like C, or PERL, which supports the direct spawning of shell processes on the local machine.
Side note: Unix (Apache) servers usually serve files from USERACCOUNT/public_html. For example, if I put dog.txt on this unix/Apache machine as /condor/cscfclt/elliott/public_html/dog.txt we would find it on the web as http://condor.depaul.edu/elliott/dog.txt or http://condor.depaul.edu/~elliott/dog.txt.