Sambar Server Documentation

CGI Tutorial


Overview
The CGI (Common Gateway Interface) is an invaluable programming aid for learning how the WWW functions. CGIs provide a way for you to run a program in response to a WWW server request. Perl is the most common language used for CGI programs, but virtually any language can be used (e.g. sh, AppleScript, Visual Basic, Delphi, C, C++). The original CGI/1.1 specification is still available at the NCSA site.

HTML FORMs
Before diving into CGIs, you must understand HTML FORMs. If you've ever filled out a series of fields in a browser and clicked on the "submit" button, you've seen an HTML FORM. The data in the HTML FORMs typically provides the input to server-side programs (i.e. CGIs); the CGIs take the HTML FORM data and perform some action like placing an order with the vendor's purchasing system, or sending mail to a company employee.

The following is a simple HTML FORM which executes a Perl script that displays the FORM contents.

<form method="post" action="/cgi-bin/dumpenv.pl">
Email: <input type="text" name="email" size="25">
Message: <textarea name="message" rows="3" cols="60">
</textarea>
<input type="submit" value="Send message">
</form>

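The HTML FORM action identifies the CGI program that will do something with the data from the form. In the above example, dumpenv.pl is the script that will receive the form data. The "method" tells the browser how to package the content when sending it to the WWW server. There are two basic methods: GET and POST. There is very little functional difference between these two methods; the significant difference is where the data travels. With GET, the browser appends the encoded form data to the URL after a question mark (?), and the CGI program reads it from the QUERY_STRING environment variable. With POST, the browser sends the encoded form data in the body of the request, and the CGI program reads it from stdin.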

Inside a FORM, the INPUT, SELECT, and TEXTAREA tags are used to specify interface elements. Each INPUT field that supplies data must have attributes indicating the "type" (e.g. text for textual input fields) and "name" of the field. There are numerous other INPUT attributes, including "value" and "size", both of which appear in the example above.

When the user clicks on the "submit" button, the browser sends all the data from the input fields to the program designated in the "action" line. Important: Every FORM must end with </form> so that the browser knows where the form ends.

Passing FORM data
When the user clicks on the "submit" button on a form, the browser joins the name/value pairs of field data together into one long string:

http://localhost/cgi-bin/dumpenv.pl?email=foobar&message=This+is+a+test

Note: The above URL would be displayed in the browser if the GET "method" were used (POST transports the data slightly differently, but the idea is the same). The first portion of the URL indicates which server to send the request to: http://localhost. "localhost" is a special name for the local machine. The next portion of the URL indicates the CGI script to execute: /cgi-bin/dumpenv.pl. Finally, the remainder of the URL following the question mark (?) is the concatenated name/value data in an encoded format.
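
The encoding step the browser performs can be sketched in a few lines. Python is used here purely for illustration (the tutorial's own script is Perl); the field values are the ones from the sample URL above, and urlencode is the Python standard-library helper that applies the same "+" and "&" encoding.

```python
from urllib.parse import urlencode

# Field names and values as a user might enter them in the sample FORM.
fields = [("email", "foobar"), ("message", "This is a test")]

# urlencode joins the pairs with "&" and replaces each space with "+".
query = urlencode(fields)
url = "http://localhost/cgi-bin/dumpenv.pl?" + query
print(url)
```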

The server receives the request and first attempts to find the /cgi-bin directory configured for the server. Next, it determines if and how to execute the script dumpenv.pl. Important: By default, many web servers do not permit CGI execution. WWW servers can be configured to recognize CGI programs in different ways. For some, any URL that calls for a file in a certain directory (often, "cgi-bin") indicates that the WWW server should try to run whatever it finds there as a CGI program. Others can be configured to use the file extension (the ".pl" or ".cgi") to indicate that certain files are programs rather than HTML pages, graphics, or other file types. You must understand how the server has been configured to execute CGI programs before you can proceed. For the remainder of this example, we assume that the web server is set up to recognize anything ending in .pl as a Perl CGI program and that there is a "cgi-bin" directory for script execution.

The browser appends a "?" to the end of the URI to indicate that what follows is data for the program to use: http://localhost/cgi-bin/dumpenv.pl?. The WWW server then parses the URL and breaks the request into the URI, http://localhost/cgi-bin/dumpenv.pl, and the name/value pair arguments, email=foobar&message=This+is+a+test. The question mark (?) marks the separation. The "name" attribute of each field in the FORM becomes the name, and whatever the user submits for that field becomes the value. Name/value pairs are separated from one another in the URL by the ampersand (&).
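
The splitting described above can be sketched as follows, again in Python for illustration only. This is a simplified model of what a server does, not Sambar's actual implementation, and the values are left in their encoded form because decoding is the CGI program's job.

```python
url = "http://localhost/cgi-bin/dumpenv.pl?email=foobar&message=This+is+a+test"

# The first "?" separates the script's URI from its arguments.
uri, _, arguments = url.partition("?")

# Each "&" separates one name=value pair from the next.
pairs = []
for pair in arguments.split("&"):
    if pair:
        name, _, value = pair.partition("=")
        pairs.append((name, value))

print(uri)
print(pairs)
```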

Parsing FORM data
The CGI program receives the name/value pair arguments as one long line, either via the QUERY_STRING environment variable (for GET) or via stdin (for POST). The program must then split the name/value pairs apart and decode the strings for use.

For POST or PUT FORM data, the information will be sent to the CGI script via stdin. The server will send CONTENT_LENGTH bytes on this file descriptor. For example, the FORM sample above might send 35 bytes encoded as: email=foobar&message=This+is+a+test. In this case, the server will set the CONTENT_LENGTH environment variable to 35 and set the CONTENT_TYPE environment variable to application/x-www-form-urlencoded. The first byte on the CGI program's standard input will be "e", followed by the rest of the encoded string.
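
As a rough sketch of the reading step (in Python, for illustration only), the hypothetical helper read_post_body below takes the environment and the input stream as parameters so that it can be tried without a web server. The simulated request body is the 35-byte example from the paragraph above.

```python
import io

def read_post_body(environ, stdin):
    """Read exactly CONTENT_LENGTH bytes of POST data from a binary stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # Read only the advertised length rather than waiting for end-of-file.
    return stdin.read(length).decode("ascii", errors="replace")

# Simulated request: the environment and stdin a server would provide.
body = b"email=foobar&message=This+is+a+test"
environ = {
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": str(len(body)),
}
print(environ["CONTENT_LENGTH"])
print(read_post_body(environ, io.BytesIO(body)))
```

In a real CGI program the same helper would be called as read_post_body(os.environ, sys.stdin.buffer).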

Fortunately, there are many packages available to decode CGI arguments into usable form. The CGI program sends its output to stdout. This output can be either a document generated by the program or instructions to the server for retrieving the desired output. The following is a simple Perl script which takes HTML POST form input and displays the name/value pairs to the client:

	#!/usr/local/perl/perl
	use strict;
	use warnings;

	# Send a CGI header so the client displays the output as plain text
	print "Content-type: text/plain\n\n";
	print "CGI Variables\n";

	# Get the FORM content-type and length
	my $content_type = $ENV{'CONTENT_TYPE'};
	my $content_len = $ENV{'CONTENT_LENGTH'} || 0;

	# Buffer the POST content
	my $buffer = '';
	binmode STDIN;
	read(STDIN, $buffer, $content_len);

	# Parse and display the FORM data.
	if ((!$content_type) ||
	    ($content_type eq 'application/x-www-form-urlencoded'))
	{
		# Process the name=value argument pairs
		foreach my $pair (split(/&/, $buffer))
		{
			next if $pair eq '';
			my ($name, $value) = split(/=/, $pair, 2);
			$name = '' unless defined $name;
			$value = '' unless defined $value;

			# Unescape the argument name and value
			foreach ($name, $value)
			{
				tr/+/ /;
				s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
			}

			# Print the name=value pair
			print "$name: $value\n";
		}
	}
	else
	{
		print "Invalid content type (expecting POST data)!\n";
		exit(1);
	}

	# DONE
	exit(0);

Next, see if you can enhance the above script to accept and process FORM data passed via GET.
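
One possible approach to this exercise is sketched below in Python rather than Perl, for illustration only. The helper name form_pairs and the sample POST body are hypothetical; parse_qsl is the Python standard-library function that splits on "&" and "=" and decodes "+" and %xx escapes. The idea carries over to the Perl script: check REQUEST_METHOD, then take the encoded string from QUERY_STRING or from stdin.

```python
import io
from urllib.parse import parse_qsl

def form_pairs(environ, stdin):
    """Return decoded (name, value) pairs for a GET or POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        encoded = stdin.read(length).decode("ascii", errors="replace")
    else:
        # GET: the data arrives in QUERY_STRING instead of on stdin.
        encoded = environ.get("QUERY_STRING", "")
    # parse_qsl turns "+" into spaces and decodes %xx escapes.
    return parse_qsl(encoded, keep_blank_values=True)

get_env = {"REQUEST_METHOD": "GET",
           "QUERY_STRING": "email=foobar&message=This+is+a+test"}
post_body = b"email=a%40b.com&message=hi"
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(post_body))}

print(form_pairs(get_env, io.BytesIO(b"")))
print(form_pairs(post_env, io.BytesIO(post_body)))
```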

Environment Variables
As you can see in the above script, environment variables are used to pass information about the FORM data to the CGI program. The following is a list of some of the standard environment variables available.

Environment Variable Description
SERVER_SOFTWARE is the name and version of the server answering the request.
SERVER_NAME is the server's hostname, DNS alias, or IP address as it would appear in self-referencing URLs.
GATEWAY_INTERFACE is the revision of the CGI specification to which the server complies.
SERVER_PROTOCOL is the name and revision of the protocol this request came in with.
SERVER_PORT specifies the port to which the request was sent.
REQUEST_METHOD is the method with which the request was made: "GET", "POST" etc.
QUERY_STRING is defined as anything following the first '?' in the URL. Typically this data is the encoded results from your GET form. The string is encoded in the standard URL format changing spaces to +, and encoding special characters with %xx hexadecimal encoding.
PATH_INFO is the extra path information, as given by the client.
PATH_TRANSLATED is the translated version of PATH_INFO, which takes the path and applies a virtual-to-physical mapping to it.
SCRIPT_NAME is a virtual path to the script being executed.
REMOTE_HOST is the name of the host making the request. If DNS lookup is turned off, REMOTE_ADDR is set and this variable is left unset.
REMOTE_ADDR is the IP address of the remote host making the request.
CONTENT_LENGTH is the length of any attached information from an HTTP POST.
CONTENT_TYPE is the media type of the posted data (usually application/x-www-form-urlencoded).
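
To see which of these variables a server actually sets, a CGI program can simply print them. The sketch below uses Python for illustration only; the helper name describe_environment and the "(not set)" marker are hypothetical choices, while the variable names are the ones listed above.

```python
import os

CGI_VARIABLES = [
    "SERVER_SOFTWARE", "SERVER_NAME", "GATEWAY_INTERFACE", "SERVER_PROTOCOL",
    "SERVER_PORT", "REQUEST_METHOD", "QUERY_STRING", "PATH_INFO",
    "PATH_TRANSLATED", "SCRIPT_NAME", "REMOTE_HOST", "REMOTE_ADDR",
    "CONTENT_LENGTH", "CONTENT_TYPE",
]

def describe_environment(environ):
    """Return one NAME=value line per CGI variable; unset ones are marked."""
    return [f"{name}={environ.get(name, '(not set)')}" for name in CGI_VARIABLES]

# Outside a web server most of these will be reported as "(not set)".
for line in describe_environment(os.environ):
    print(line)
```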

Returning Data
CGI programs can return content in many different document types (e.g. text, images, audio). They can also return references to other documents. To tell the server what kind of document you are sending back, CGI requires you to place a short header on your output. This header is ASCII text, consisting of lines separated by either linefeeds or carriage returns (or both), followed by a single blank line. The output body then follows in its native format.

If you begin your script output with "HTTP/", the server will send all output to the client exactly as the script has written it. Otherwise, the server will send a default header back (text/html file type) with any data returned from the script. Important: If you choose not to write the entire HTTP header, you should not provide any special headers, as they will appear as part of the body after server processing.

If you begin your script output with a CGI header line such as Content-type: or Location: (both shown in the examples below), the server will prepend the appropriate HTTP response status (200 or 302), followed by the headers and content of your script exactly as received.
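
The server-side rules described in this section can be modelled roughly as follows. This is a Python sketch for illustration only, not Sambar's actual code. The function name prepend_status is hypothetical, the Content-type and Location headers and the status lines are taken from the examples that follow, and both the case-insensitive header matching and the 200 status on the default header are assumptions.

```python
def prepend_status(script_output):
    """Model of the header rules in this section; not Sambar's actual code."""
    first_line = script_output.split("\n", 1)[0].strip()
    if first_line.startswith("HTTP/"):
        # The script wrote the complete response itself: pass it through.
        return script_output
    lowered = first_line.lower()  # assumption: header names match in any case
    if lowered.startswith("location:"):
        return "HTTP/1.0 302 MOVED\n" + script_output
    if lowered.startswith("content-type:"):
        return "HTTP/1.0 200 OK\n" + script_output
    # No recognised header: assume a default text/html header with status 200.
    return "HTTP/1.0 200 OK\nContent-type: text/html\n\n" + script_output

print(prepend_status("Content-type: text/html\n\n<H1>Sample output</H1>\n"), end="")
```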

For example, to send back HTML to the client, your output should read:

        Content-type: text/html

        <HTML><HEAD>
        <TITLE>output of HTML from CGI script</TITLE>
        </HEAD><BODY>
        <H1>Sample output</H1>
        Blah, blah, blah.
        </BODY></HTML>

In the above example, the response prepended is: HTTP/1.0 200 OK
To reference a file on another HTTP server, you would output something like this:

        Location: http://www.sambar.com/
        Content-type: text/html

        <HTML><HEAD>
        <TITLE>Whoops...it moved</TITLE>
        </HEAD><BODY>
        <H1>Content Moved!</H1>
        </BODY></HTML>

In the above example, the response prepended is: HTTP/1.0 302 MOVED
Note: The Location: directive should come prior to the Content-type: directive.
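
Both kinds of output shown above follow the same pattern: header lines, one blank line, then the body. The sketch below (Python, for illustration only) wraps that pattern in a hypothetical helper named cgi_response, which places Location: ahead of Content-type: as the note above recommends.

```python
def cgi_response(body, content_type="text/html", location=None):
    """Build CGI output: header lines, one blank line, then the body."""
    headers = []
    if location is not None:
        # Location comes first, ahead of Content-type.
        headers.append(f"Location: {location}")
    headers.append(f"Content-type: {content_type}")
    return "\n".join(headers) + "\n\n" + body

page = "<HTML><BODY><H1>Sample output</H1></BODY></HTML>\n"
moved = cgi_response("<H1>Content Moved!</H1>\n",
                     location="http://www.sambar.com/")

# A CGI program would write one of these to stdout.
print(cgi_response(page), end="")
```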

© 2000 Sambar Technologies. All rights reserved.