SCOUG OS/2 For You - September 1997
Programming for the Net - CGI Programming Basics
by Terry Warren
Processing the Data
Last month we explored the details of HTML forms and how to set them up to pass data to a CGI program. Now we're ready to start writing our own CGI program using REXX as our development language.
When the webserver receives the request generated by the form, it will start a process and execute the indicated CGI program. A set of environment variables is defined by the webserver to give the CGI program information about the request. Some of these variables can be interrogated to determine how to obtain the form data string, since this must be done differently for GET and POST requests.
For a GET request, the entire form data string is defined in the QUERY_STRING environment variable and can be obtained by accessing that variable.
For a POST request, the form data is not passed in the QUERY_STRING variable. Instead, the CONTENT_LENGTH variable is set to the total length of the string, and the data itself must be read from the STDIN data stream.
The REQUEST_METHOD variable is set to the request method itself (i.e., GET or POST).
Once the parameter string has been obtained, it should be broken down into individual parameters for processing. How that is accomplished depends on the programming language being used. All webservers support CGI programs written in any executable language (e.g., C, C++). In addition, most support the Perl scripting language and Java. For OS/2, REXX command files are also supported, and the IBM Internet Connection Server provides a REXX extension DLL (cgiutils) which simplifies the parameter processing.
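Programming note: on OS/2 (as on DOS and Windows), it doesn't matter whether your program refers to environment variables in lowercase, uppercase, or mixed case, because the functions used to fetch the variables do a case-insensitive compare. On Unix systems, environment variable names are case sensitive, so use the uppercase names shown here.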
Here we have a sample REXX code fragment which determines the type of request, fetches the parameter data and builds two stem variables, one containing the parameter names and the other their values:
/*----------------------------------------------*/
/* procedure to create forms list               */
/*----------------------------------------------*/
getForms:
  env = "OS2ENVIRONMENT"
  formName.0 = 0
  formValue.0 = 0
  ix = 0
  /* get request type */
  reqtype = VALUE("REQUEST_METHOD",,env)
  IF (reqtype = "POST") THEN     /* handle POST request */
    DO
      cl = VALUE("CONTENT_LENGTH",,env)
      qs = CHARIN(,,cl)          /* read exactly cl characters from STDIN */
    END
  ELSE                           /* handle GET request */
    DO
      cl = 0
      qs = VALUE("QUERY_STRING",,env)
    END
  IF (qs = "") THEN
    RETURN
  /* split "name=value&name=value..." into the two stems */
  DO WHILE qs \= ""
    PARSE VAR qs kname "=" kval "&" qs
    ix = ix+1
    formName.0 = ix
    formName.ix = kname
    formValue.0 = ix
    formValue.ix = urlDecode(kval)
  END
  RETURN
/*------ End of code fragment -------------------*/
The call to urlDecode is necessary to deal with the URL-encoding process described above.
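The fragment above calls urlDecode but does not show it. One possible version is sketched below; it is an illustration, not the routine the author used. It assumes the usual URL-encoding rules: a "+" stands for a blank, and "%hh" stands for the character with hexadecimal code hh. A malformed escape is simply left in the output unchanged.

```rexx
/* Sketch of a urlDecode routine (hypothetical; not from the article) */
SAY urlDecode("Joe+Programmer%21")     /* displays: Joe Programmer! */
EXIT

urlDecode: PROCEDURE
  PARSE ARG str
  hexdigits = "0123456789abcdefABCDEF"
  str = TRANSLATE(str, " ", "+")       /* "+" stands for a blank */
  out = ""
  p = POS("%", str)
  DO WHILE p > 0
    out = out || LEFT(str, p - 1)      /* copy the text before the "%" */
    hex = SUBSTR(str, p + 1, 2)        /* the two characters after it */
    IF p + 2 <= LENGTH(str) & VERIFY(hex, hexdigits) = 0 THEN
      DO
        out = out || X2C(hex)          /* valid escape: emit the character */
        str = SUBSTR(str, p + 3)
      END
    ELSE
      DO
        out = out || "%"               /* malformed escape: keep the "%" */
        str = SUBSTR(str, p + 1)
      END
    p = POS("%", str)
  END
  RETURN out || str
```

The "+" translation is done before the "%hh" pass so that an encoded plus sign (%2B) survives as a real "+" in the result.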
Other header parameters sent by the browser can be accessed via their environment variables. Some of these are:
- SERVER_NAME
- (hostname, DNS alias or IP address)
- PATH_INFO
- (extra path information supplied by client, can be useful as extra parameters not in form)
- REMOTE_HOST
- (hostname, if available, of client machine)
- REMOTE_ADDR
- (IP address of client; not always reliable)
In addition to these special variables, all other headers received are assigned to environment variables of the form HTTP_name. For example, HTTP_USER_AGENT contains the browser identification information.
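As a quick way to see what your webserver actually supplies, a small CGI program can list these variables. The sketch below makes a few assumptions: it returns plain text rather than HTML (content type text/plain), and it only looks at the variables named in this article. A variable that the server did not set comes back as an empty string.

```rexx
/* Sketch: display some of the request variables described above */
env = "OS2ENVIRONMENT"
names = "SERVER_NAME PATH_INFO REMOTE_HOST REMOTE_ADDR HTTP_USER_AGENT"

SAY "Content-type: text/plain"
SAY ""                                 /* blank line ends the headers */
DO i = 1 TO WORDS(names)
  name = WORD(names, i)
  SAY name "=" VALUE(name, , env)      /* empty if the variable is not set */
END
```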
GET versus POST
It might seem confusing that there are two different request methods which accomplish the same thing (I know I've always thought so). Originally, they were tied to concepts of "active" (POST) versus "passive" (GET) webserver process. In practical terms the differences can be summarized as follows:
- GET
- the parameter string is concatenated to the request structure itself. This means that the parameter values are visible in the browser's URL address area. A knowledgeable user can invoke a GET request by typing in all of this information to the URL address (without having to fill in the form or even have it displayed). For these reasons, the GET method is not considered as secure as the POST. Also, many webservers limit the length of a parameter string associated with a GET request and so this method is not suitable for forms that contain a lot of input data. On the positive side, GET can be redirected using a "location" header (described below) and is easier for a browser to reload if requested.
- POST
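- the parameter string is sent as a separate stream following the request headers, so it does not appear in the browser's URL address area and is somewhat more secure. It can be arbitrarily large. It can't be invoked simply by typing a URL; the request has to come from the form (or from a program that builds an equivalent request). A POST request can't generally be redirected because the input string is consumed when processed.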
Returning Data
After decoding the CGI parameters, the program logic consists of processing them and determining what information to return. All of the returned data should be formatted as an HTML response to the request (as described in the first article of this series). The webserver will generally create the standard HTTP headers for you so the general structure of your response data would be:
- Content header
- blank line
- HTML output
where Content header is:
Content-type: text/html
The blank line is ESSENTIAL since, without it, all of the remaining data will be interpreted as additional headers. The HTML output is simply output lines which contain valid HTML content. All of this information is written to the STDOUT output stream (eg, in C use printf(), C++ use cout, REXX use SAY).
A simple REXX example follows: this generates a simple HTML response saying that the form was received and processed:
SAY "Content-type: text/html"
SAY ""
SAY "<HTML><BODY>"
SAY "Your form data was received and successfully processed"
SAY "</BODY></HTML>"
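A response usually depends on the data that was received. The following sketch echoes the form fields back as an HTML list. To keep it self-contained, sample values stand in for the formName and formValue stems that a routine like getForms would build. The htmlEscape helper is hypothetical (it is not part of REXX or of this article's code); it replaces the characters &, < and > so that user input cannot be mistaken for HTML tags.

```rexx
/* Sketch: echo the decoded form fields back as an HTML list  */
/* Sample data stands in for the stems built by getForms      */
formName.0 = 2;         formValue.0 = 2
formName.1 = "fname";   formValue.1 = "Joe"
formName.2 = "lname";   formValue.2 = "<Programmer>"

SAY "Content-type: text/html"
SAY ""                                 /* blank line ends the headers */
SAY "<HTML><BODY><UL>"
DO i = 1 TO formName.0
  SAY "<LI>" || htmlEscape(formName.i) || " = " || htmlEscape(formValue.i) || "</LI>"
END
SAY "</UL></BODY></HTML>"
EXIT

/* hypothetical helper: make a string safe to place inside HTML text */
htmlEscape: PROCEDURE
  PARSE ARG str
  out = ""
  DO i = 1 TO LENGTH(str)
    ch = SUBSTR(str, i, 1)
    SELECT
      WHEN ch == "&" THEN out = out || "&amp;"
      WHEN ch == "<" THEN out = out || "&lt;"
      WHEN ch == ">" THEN out = out || "&gt;"
      OTHERWISE out = out || ch
    END
  END
  RETURN out
```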
In some cases, the output from the program might simply be an existing document or output from another webserver process. This can be accomplished by returning only a "location" header which will result in the webserver transferring the named document or URL. The blank line is still required. (This technique typically works only in conjunction with a GET method request.)
For example (in REXX), to return a document on the same webserver which is in the mydata alias:
SAY "location: /mydata/document"
SAY ""
To return a document on another webserver:
SAY "location: http://www.scoug.com"
SAY ""
Testing
If you want to practice writing CGI programs but don't have a webserver on which to install them, you can create a simple testing environment by simulating what the webserver would do. For example, you could create a .cmd file which defines all of the environment variables used by your program and then executes the program. You should see command line output corresponding to the HTML statements created by your program. (You might even redirect the output into a .html file and then view it in your browser.) A simple command file would look like:
REM the ^ keeps CMD from treating & as a command separator
SET REQUEST_METHOD=GET
SET QUERY_STRING=Action=Add^&fname=Joe^&lname=Programmer
myprog.exe
Remember to do the URL-encoding on your variables!
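The same simulation can be written as a REXX command file, which avoids having to escape special characters on the SET command line. This is only a sketch: the file name testcgi.cmd and the output file result.html are made up for the example, and myprog.exe is the placeholder program name used above. It assumes that variables set through "OS2ENVIRONMENT" are inherited by the program it starts.

```rexx
/* testcgi.cmd: hypothetical driver that simulates a GET request */
env = "OS2ENVIRONMENT"
CALL VALUE "REQUEST_METHOD", "GET", env
/* the value is already URL-encoded ("+" stands for a blank) */
CALL VALUE "QUERY_STRING", "Action=Add&fname=Joe+Q.&lname=Programmer", env

/* run the CGI program and capture its output for viewing in a browser */
"myprog.exe > result.html"
IF rc \= 0 THEN
  SAY "myprog.exe ended with return code" rc
```

After it runs, result.html should contain the Content-type line, the blank line and the HTML that the webserver would have sent on to the browser.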
If you're more ambitious, you can download either the IBM Internet Connection Server (free eval copy) or the Apache Web server (free) which also runs on OS/2.
Summary
The HTML specification includes a simple yet fairly usable set of UI controls defined by the <FORM> and related tags. These, together with the Common Gateway Interface webserver programming model, provide the basis for a wide variety of forms-based internet transaction processing systems.
In the next article, we will look at how user interaction can be greatly enhanced on the client side by using JavaScript.
The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA
Copyright 1997 the Southern California OS/2 User Group. ALL RIGHTS
RESERVED.
SCOUG is a trademark of the Southern California OS/2 User Group.
OS/2, Workplace Shell, and IBM are registered trademarks of International
Business Machines Corporation.
All other trademarks remain the property of their respective owners.