@database ProgrammingInREXX.guide @author Charles Daney @(c) 1992 by Quercus Systems @node Main "Programming in REXX" The definitive work on REXX programming, with a wealth of shortcuts, tips, and tactics for REXX program construction and development. This is the one and only REXX programmer's handbook covering all language details with concise examples and practical guidelines on how to write efficient, robust programs with any implementation of REXX. @{i}Programming in REXX@{ui} compares and contrasts REXX with other languages, describing its basic rules of structure and syntax. The book goes on to explore the most important REXX language idioms, instructions, and features; methods for facilitating program portability; tactics for debugging programs; and scores of other time-saving programming "secrets". @{" 0: PREFACE " link Zero} @{" 1: INTRODUCTION " link Intr} @{" WHAT REXX IS " link Intr1} @{" THE APPLICATION PROGRAMMING INTERFACE " link Intr2} @{" WHAT'S IN THIS BOOK " link Intr3} @{" THE REXX STANDARDIZATION PROCESS " link Intr4} @{" GETTING STARTED " link Intr5} @{" 2: LANGUAGE OVERVIEW " link Over} @{" GETTING STARTED " link Over1} @{" PUTTING THE ELEMENTS TOGETHER " link Over2} @{" PROGRAM STRUCTURE " link Over3} @{" THE REXX DATA MODEL " link Over4} @{" SCOPE OF VARIABLES " link Over5} @{" STRING MANIPULATION AND PARSING " link Over6} @{" OTHER FEATURES OF REXX " link Over7} @{" 3: PROGRAM STRUCTURE AND SYNTAX " link Stru} @{" PROGRAM FORMAT " link Stru1} @{" CLAUSES AND STATEMENTS " link Stru2} @{" MORE ABOUT CLAUSES " link Stru3} @{" TOKENIZATION OF STATEMENTS " link Stru4} @{" REXX EXPRESSIONS " link Stru5} @{" CLAUSE TYPE RECOGNITION RULES " link Stru6} @{" CHARACTER STRING OPERATORS " link Stru7} @{" ARITHMETIC OPERATORS " link Stru8} @{" LOGICAL OPERATORS " link Stru9} @{" OPERATOR PRECEDENCE " link Stru10} @{" NUMBERS AND ARITHMETIC IN REXX " link Stru11} @{" REXX VARIABLES " link Stru12} @{" 4: CONTROL STRUCTURES " link Cont} @{" SELECTION STRUCTURES " link Cont1} @{" LOOPING STRUCTURES " link Cont2} @{" THE SIGNAL INSTRUCTION " link Cont3} @{" 5: SUBROUTINES AND FUNCTIONS " link Subr} @{" BUILT-IN, INTERNAL, AND EXTERNAL PROCEDURES " link Subr1} @{" PASSING ARGUMENTS AND RETURNING VALUES " link Subr2} @{" SCOPE OF VARIABLES " link Subr3} @{" EXECUTE STATE PRESERVED AROUND PROCEDURE CALLS " link Subr4} @{" 6: COMMANDS TO EXTERNAL ENVIRONMENTS " link Exte} @{" THE ADDRESS INSTRUCTION " link Exte1} @{" COMMAND RETURN CODES " link Exte2} @{" 7: CHARACTER STRING HANDLING " link Char} @{" STRING HANDLING BUILT-IN FUNCTIONS " link Char1} @{" STRING-ORIENTED FUNCTIONS " link Char2} @{" WORD-ORIENTED FUNCTIONS " link Char3} @{" OTHER STRING MANIPULATION FUNCTIONS " link Char4} @{" 8: THE PARSE INSTRUCTION " link Pars} @{" SOURCES OF INPUT TO PARSE " link Pars1} @{" PARSE TEMPLATES: SIMPLEST CASE " link Pars2} @{" PATTERN MATCHING IN TEMPLATES " link Pars3} @{" POSITIONAL PATTERNNS IN TEMPLATES " link Pars4} @{" VARIABLE PATTERNS " link Pars5} @{" PARSING PROCEDURE ARGUMENTS " link Pars6} @{" PARSE IN RELATION TO OTHER FORMS OF STRING MANIPULATION " link Pars7} @{" PRACTICAL EXAMPLES OF PARSING " link Pars8} @{" 9: INPUT AND OUTPUT " link Inpu} @{" CHARACTER-ORIENTED VS. LINE-ORIENTED I/O " link Inpu1} @{" OPENING A FILE " link Inpu2} @{" FILE READ/WRITE POINTERS " link Inpu3} @{" CLOSING A FILE " link Inpu4} @{" LINE-ORIENTED FILE I/O FUNCTIONS " link Inpu5} @{" CHARACTER-ORIENTED FILE I/O FUNCTIONS " link Inpu6} @{" COMMUNICATION WITH THE USER " link Inpu7} @{" EXAMPLE: BINARY SEARCH OF SORTED FILES " link Inpu8} @{" EXAMPLE: MAKING A FILE INDEX " link Inpu9} @{" EXAMPLE: WRITING 'FILTERS' IN REXX " link Inpu10} @{" 10: THE EXTERNAL DATA QUEUE " link Queu} @{" USAGE OF THE QUEUE " link Queu1} @{" RELATION OF THE EXTERNAL DATA QUEUE AND THE STANDARD " link Queu2} @{" INPUT STREAM " link Queu2} @{" RELATION OF THE EXTERNAL DATA QUEUE AND THE OPERATING " link Queu3} @{" SYSTEM " link Queu3} @{" 11: EXCEPTION HANDLING " link Exce} @{" ENABLING AND DISABLING CONDITION HANDLING " link Exce1} @{" USING TYPE 1 CONDITION HANDLERS " link Exce2} @{" USING TYPE 2 CONDITION HANDLERS " link Exce3} @{" THE CONDITION() FUNCTION " link Exce4} @{" 12: THE INTERPRET INSTRUCTION " link Inte} @{" RULES FOR INTERPRET " link Inte1} @{" EXAMPLES OF INTERPRET USAGE " link Inte2} @{" 13: REXX ARITHMETIC " link Arit} @{" PRECISION OF ARITHMETIC " link Arit1} @{" ARITHMETIC OPERATIONS " link Arit2} @{" EXPONENTIAL REPRESENTATION0 " link Arit3} @{" WHOLE NUMBERS " link Arit4} @{" ARGUMENTS TO BUILT-IN FUNCTIONS " link Arit5} @{" BUILT-IN FUNCTIONS FOR NUMERIC FORMATTING AND " link Arit6} @{" ARITHMETIC " link Arit6} @{" ADDITIONAL MATHEMATICAL FUNCTIONS " link Arit7} @{" 14: TRACING AND DEBUGGING " link Trac} @{" THE TRACE INSTRUCTION " link Trac1} @{" PASSIVE TRACING " link Trac2} @{" INTERACTIVE TRACING " link Trac3} @{" A: REXX INSTRUCTIONS " link AppA} @{" ADDRESS " link AppA1} @{" ARG " link AppA2} @{" CALL " link AppA3} @{" DO " link AppA4} @{" DROP " link AppA5} @{" EXIT " link AppA6} @{" IF " link AppA7} @{" INTERPRET " link AppA8} @{" ITERATE " link AppA9} @{" LEAVE " link AppA10} @{" NOP " link AppA11} @{" NUMERIC " link AppA12} @{" PARSE " link AppA13} @{" PROCEDURE " link AppA14} @{" PULL " link AppA15} @{" PUSH " link AppA16} @{" QUEUE " link AppA17} @{" RETURN " link AppA18} @{" SAY " link AppA19} @{" SELECT " link AppA20} @{" SIGNAL " link AppA21} @{" TRACE " link AppA22} @{" B: REXX BUILT-IN FUNCTIONS " link AppB} @{" ABBREV() " link AppB1} @{" ABS() " link AppB2} @{" ADDRESS() " link AppB3} @{" ARG() " link AppB4} @{" BITAND() " link AppB5} @{" BITOR() " link AppB6} @{" BITXOR() " link AppB7} @{" B2X() " link AppB8} @{" CENTER() " link AppB9} @{" CHARIN() " link AppB10} @{" CHAROUT() " link AppB11} @{" CHARS() " link AppB12} @{" COMPARE() " link AppB13} @{" CONDITION() " link AppB14} @{" COPIES() " link AppB15} @{" C2D() " link AppB16} @{" C2X() " link AppB17} @{" DATATYPE() " link AppB18} @{" DATE() " link AppB19} @{" DELSTR() " link AppB20} @{" DELWORD() " link AppB21} @{" DELSTR() " link AppB22} @{" DIGITS() " link AppB23} @{" D2C() " link AppB24} @{" D2X() " link AppB25} @{" ERRORTEXT() " link AppB26} @{" FORM() " link AppB27} @{" FORMAT() " link AppB28} @{" FUZZ() " link AppB29} @{" INSERT() " link AppB30} @{" LASTPOS() " link AppB31} @{" LEFT() " link AppB32} @{" LENGTH() " link AppB33} @{" LINEIN() " link AppB34} @{" LINEOUT() " link AppB35} @{" LINES() " link AppB36} @{" MAX() " link AppB37} @{" MIN() " link AppB38} @{" OVERLAY() " link AppB39} @{" POS() " link AppB40} @{" QUEUED() " link AppB41} @{" REVERSE() " link AppB42} @{" RIGHT() " link AppB43} @{" SIGN() " link AppB44} @{" SOURCELINE() " link AppB45} @{" SPACE() " link AppB46} @{" STREAM() " link AppB47} @{" STRIP() " link AppB48} @{" SUBSTR() " link AppB49} @{" SUBWORD() " link AppB50} @{" SYMBOL() " link AppB51} @{" TIME() " link AppB52} @{" TRACE() " link AppB53} @{" TRANSLATE() " link AppB54} @{" TRUNC() " link AppB55} @{" VALUE() " link AppB56} @{" VERIFY() " link AppB57} @{" WORD() " link AppB58} @{" WORDINDEX() " link AppB59} @{" WORDLENGTH() " link AppB60} @{" WORDPOS() " link AppB61} @{" WORDS() " link AppB62} @{" XRANGE() " link AppB63} @{" X2B() " link AppB64} @{" X2C() " link AppB65} @{" X2D() " link AppB66} @{" C: FURTHER READING " link AppC} @{" D: OPERATORS AND SPECIAL CHARACTERS " link AppD} @endnode @node Zero "PREFACE" REXX has been around since 1979. Mike Cowlishaw's authoritative and lucid book on the language appeared in 1985, though it was based largely on his language definition which had existed in written form for several years. Various REXX books have come out since 1985, but, oddly, it seems to me, none appears to have been designed to teach the language in a discursive way and with a platform-independent viewpoint. Mike's book is very good, but it is a language definition, and often doesn't go far enough in the area of explanation and motivation. All the other REXX books of which I am aware miss the mark in one way or another. There are tutorials which are superficial and somewhat lightweight. Other books are full of tables and syntax diagrams, but offer little in-depth explanation of how REXX actually works and why. Still other books are limited to particular REXX implementations. (O'Hara and Gomberg's @{i}Modern Programming Using REXX@{ui} is an exception from these observations. But it is a special case in that its aim is more to teach programming than to teach REXX.) This situation is unfortunate, as I have heard many wistful requests over the years for the recommendation of a good book from which to learn REXX. The best answer available was, "Read Cowlishaw." Now, there should be no mistake about this: REXX is a very approachable language, and most people get into it easily with the help of available documentation and some good examples. It was designed to be easy to learn and to use. But mastery of the language is another matter. It has shimmering blue depths that, I have found, very few users ever seem to plumb. In truth, REXX has some very unusual features and concepts, which are found in few other languages, especially the better-known ones. These aspects of the language are often inadequately understood even by quite proficient users of REXX, including implementors of the language. (And I have been no exception.) The very existence of some of these subtleties, of course, goes unnoticed by the more casual users of REXX. I hope the present book will help illuminate the language in general, as well as some of its more arcane details. The time certainly seems right for this, because REXX now appears to be catching on, even though it is rather late in the game for a new language to stand much of a chance to break into the ranks of popular programming languages. @endnode @node Intr "1: INTRODUCTION" REXX is, preeminently, a language for personal programming. It is quick, easy to maneuver, and fun to drive - like a sports car. It excels at the kinds of things an individual computer user needs to do for his or her own purposes: menu systems, customized front-ends to other programs, personal utilities, prototyping of new ideas, and so forth. All popular programming languages have their own distinctive place and personality. C, for instance, is a pickup truck with standard transmission and 4-wheel drive. C++ is a pickup truck with automatic transmission, leather upholstery, and a cellular phone. Ada is a World War ][ battleship, and so on. But for writing code quickly, easily, and enjoyably, REXX is hard to beat. @{" WHAT REXX IS " link Intr1} @{" THE APPLICATION PROGRAMMING INTERFACE " link Intr2} @{" WHAT'S IN THIS BOOK " link Intr3} @{" THE REXX STANDARDIZATION PROCESS " link Intr4} @{" GETTING STARTED " link Intr5} @endnode @node Intr1 "WHAT REXX IS" REXX is a modern, structured, high-level programming language that was consciously designed for ease of both reading and writing. It was designed and first implemented between 1979 and 1982 by Mike Cowlishaw of IBM. Though it was primarily developed by one individual, it was widely disseminated within IBM during that time, and improved by the feedback of hundreds of users. REXX was first made commercially available as the system procedure language for IBM's VM/CMS operating system in 1983. When IBM'S Systems Application Architecture was announced in 1987, REXX was included as the standard system procedure language. By that fact, IBM indicated that REXX would eventually be implemented on all of their strategic computing systems. An implementation for the MVS system appeared in 1988. In 1990, an implementation of REXX from IBM appeared on personal workstations in version 1.2 of OS/2. Long before then, REXX had been implemented by others on various computers and operating systems. The first such implementation, known as Personal REXX, was developed for MS-DOS by the author and Mansfield Software Group in 1985. We followed this with a version of REXX for OS/2 in 1989. ARexx for the Commodore Amiga made its debut in 1987. At the present time, IBM has widened the availability of REXX in its systems to include OS/400. Other vendors have developed versions of REXX for UNIX and Tandem computers. Thanks largely to the relative clarity and completeness of Mike Cowlishaw's original definition of REXX, there is a high degree of compatibility among existing versions of the language. Ease of use in end-user personal programming was the predominant objective in the design of REXX. Several key characteristics contribute to this ease of use. They include: * character-string orientation * dynamic data typing (no declarations) * reliable, machine-independent arithmetic * automatic storage management * protection from "crashing" * content-addressable data structures * straightforward access to system commands and facilities * few artificial limitations In this introductory section, we will touch on each of these points. It should be mentioned, also, that REXX's ease of use does not limit its appeal to end users only. The same characteristics make it useful to professional programmers as a utility programming language for "quick and dirty" jobs, because REXX programs can be developed and debugged much faster than programs in most conventional languages, even if the user is an experienced programmer. In overall appearance, REXX is a fairly conventional language, not too much unlike Pascal, C, or other languages which trace their ancestry to Algol. This is in contrast to languages like SNOBOL, LISP, or Smalltalk, which explore very different approaches to programming. Thus, REXX has much in common with other Algol-like procedural languages - variables, expressions, control structures, subroutines, and I/O facilities. The following is a complete REXX program that prompts for a filename, then asks the user to make a selection from a menu, and executes a command corresponding to the selection. The fact that the program should require no further explanation to be understood illustrates the naturalness and readability of the language. /* execute file utilities */ SAY 'Enter filename:' PULL file_name SAY 'Choose a file operation by number:' SAY '1 - Edit' SAY '2 - Print' SAY '3 - Delete' PULL response SELECT WHEN response = 1 THEN 'edit' file_name WHEN response = 2 THEN 'print' file_name WHEN response = 3 THEN 'erase' file_name OTHERWISE SAY response 'is an incorrect choice.' END EXIT One important point about this program arises from the fact that REXX originated as a system procedure language. Specifically, the capability of executing system or application commands is an integral part of the language, rather than a function which is available (if at all) only through library routines. In other words, like a UNIX shell language or the MS-DOS and OS/2 batch language, REXX automatically passes commands to the surrounding "environment" for execution. We'll have more to say about this when we explain how REXX can be used as a universal macro language. Perhaps the most noteworthy departure of REXX from other Algol-like languages is its "natural" data typing. All data is treated as character strings. Numbers, including both integers and reals, are just special cases of strings. Numbers need to be recognized as such for computational purposes only, but no explicit conversion, no "formatting", is required for communications with humans. This alone is a major aid to usability, as any lamer who was ever baffled by a "format" statement can testify. Another consequence of this approach is that data declarations are never required (or even possible). Data declarations in other languages are really provided for the convenience of the computer, not the user. They are an accommodation of the fact that computers use a variety of internal data representations for different purposes and must be told which representation to use for a given data item. REXX isolates the user from all concern with these internal representations. A further side effect of treating all data as character strings is that there need to be no inconvenient limits on the magnitudes of numeric data items. Although seldom required, hundreds of digits can be handled in REXX as easily (for the user) as five or six. Errors both gross and subtle that result from the inability to represent a number in a particular word size simply aren't possible. This also makes REXX programs much more portable between widely different computer architectures. In particular, REXX does not cause a program failure when a computation exceeds a user-definable maximum precision. It does not even generate wildly incorrect results, as other languages usually do, when an "overflow" occurs. Instead, it discards the least significant digits of a result in order to stay within the specified degree of precision. In conventional languages, data declarations not only specify internal representations to use, but also define storage allocation. Since there are no declarations in REXX, it is not necessary to worry about allocation issues (at least as long as enough storage is available). This is another great simplification. All data items, even elements of arrays, are allocated storage automatically when, and only when, they are required. They are also deallocated automatically as soon as possible. Another pleasant benefit of REXX's dynamic memory management is that REXX is almost crash-proof, even on a CPU without memory protection. One of the most unpleasant experiences in programming for end users (or professionals) is the tendency for undebugged programs to crash themselves, or even the operating system, because they have overwritten their own code, or code belonging to other applications or the system. This is essentially impossible in REXX. (The exception is such things as functions provided explicitly to give access to external memory.) Another unusal feature of REXX is the way arrays are handled. In REXX, data variables have names which are either simple or compound. A simple name is just a sequence of alphanumeric characters that contains no periods. A compound name is composed of two or more simple names connected by periods, for example, age.fred. The portion of a compound name before the first period is called the stem; it is taken literally. The remaining portions of the name are themselves variables which can be substituted. In effect, these are subscripts. In order to work with arrays of any number of dimensions, one uses a stem followed by the appropriate number of subscripts. For instance, temperature.x.y.z is an element of a three-dimensional array called temperature. If the variables x, y, z have values [of] 1, 2, 3, respectively, this element is temperature.1.2.3. There are several important points here. The first is that no storage is allocated except for array elements that have actually been assigned values. The subscripts may be as large as necessary, but as long as there are only three elements having values, only three are stored, so the array can be very sparse. But more importantly than that, the array subscripts need not be numeric - they can have any data value at all. This permits associative indexing, win which the subscripts are general nonnumeric data [(ie. strings)]. For instance, one can have an age array whose elements include, in particular, age.fred, age.sally, etc. A computation can deal with a data reference like age.person, where person is a variable that ranges over values fred, sally, etc. It should be apparent by now that the uniform representation of data as character strings is very important in REXX. This is connected with another design objective of the language, which is to place a great emphasis on symbolic manipulation. Since most system commands and application programs interact with users - or with REXX - much more with arbitrary strings of symbols than just with numbers, this is very appropriate for a system command language. The most basic operation possible with character strings is concatenation, so REXX makes it as easy as possible to express. There are several flavours of this. The following example illustrates two of them: 'The date is:' month'/'day'/'year'.' Here, strings enclosed in quotes are literals, while month, day, and year are variable names. In this expression, all of these parts are simply concatenated together, just as written. The extra blank before month is even retained, because it is actually the operator for "concatenate with a blank in between." No explicit operator is required to express direct concatenation. (An explicit operator, "||", is provided for cases where juxtaposition alone would be ambiguous.) A large number of other character-string manipulation primitives are provided in REXX by means of built-in functions. Indeed, this is one of the most agreeable features of REXX. Included are such operations as substring, replacement, translation, verification, insertion, searching, and the like. There are even operations to reverse the characters of a string or to centre a string in a given field. Since it is frequently useful to treat a string as a sequence of "words" delimited by blanks, there are a number of functions to count and extract such words. @endnode @node Intr2 "THE APPLICATION PROGRAMMING INTERFACE" There's a lot more to REXX than what we've been able to describe so far. There will be time enough for that later. For now, let's turn our attention to how REXX facilitates end-user computing by providing a universal macro language. REXX cannot be fully understood or comprehended in terms of its language features alone. There is an aspect to REXX that is ordinarily not even considered to be a part of other programming languages. Namely, all reasonable implementations of REXX come equipped with an application programming interface. This is a set of defined interfaces that allow applications written in other languages to communicate with REXX programs in various ways. Although the details of the interface necessarily differ from one implementation to another (at least in different operating systems), certain core functionality is always present. Of course, it's not unusual for professional programmers to be able to build a single application using several different languages. However, application users do not ordinarily have the priviledge of adding their own code to the application with conventional languages like C. If users are permitted to add code at all, it is in the form of special-purpose "macro" or "script" languages provided by - and specific to - the application. What REXX offers, rather uniquely with its well-defined application programming interface, is the ability to allow end users to write application extensions in a single language - for any application that supports the interfaces. The special-purpose languages embedded in applications today are variously called macro languages, script languages, batch languages, shell languages, and so forth. Macro languages (for applications like spreadsheets and word processors) and script languages (for communications programs) are especially well known. They are designed with little but their own application in mind and usually are not very suitable for general-purpose programming. Yet, because such languages are used by far more people than just professional programmers, they actually represent the most widely used computer languages today. Though there are some fortunate exceptions, many of these languages are just as hard to use as traditional programming languages. Even so, because they were conceived and implemented with one particular application in mind, they don't have the power and flexibility of traditional languages. Furthermore, since these languages are "captives" of their respective applications, they cannot be used for other applications. They have no concept of a general-purpose, standardized interface that can be used effectively by any application or by the operating system itself. In contrast to such macro languages, and to traditional programming languages as well, REXX includes a definition of how to interface to other system components at a functional level. Probably the most unfortunate thing about such macro languages is how many of them there are - each application has its own. Yet there is no useful purpose served by having a different language for each application. While these problems are coming to be recognized and better understood, it is less well known that the problems were addressed and solved long ago with REXX. What had to be done was to define both a sufficiently rich and powerful language and a set of interfaces for communicating between that language and other applications. For this to work, the interfaces are probably more important than specific details of the language. The important thing is that REXX can communicate with any application that implements the required interface, so it can act as the single "macro" language used by all such applications. Therefore, a user need only learn a single language in order to write procedures that control a number of applications. This is precisely what happened in VM/CMS with REXX, and even more dramatically with ARexx on the Amiga. In order for this approach to achieve its purpose, many software vendors must support the same interfaces in their own applications. As a result, in VM/CMS and on the Amiga there are now many applications and development tools - editors, word processors, spreadsheets, database systems, and communication packages - that use REXX as their macro language. Something even better than being able to use a single language to control multiple applications sequentially is to be able to do this simultaneously. A language such as REXX (and the associated interfaces) that supports this can then be a "glue" that permits combining powerful, general tools together in useful and interesting ways. It is an integrated agent that makes it easy to build larger systems out of simpler building blocks. It facilitates and encourages a modular, building-block approach to application development. So, the application programming interface of REXX is a feature of importance equal to the details of the language itself. Unfortunately, this is the last we can speak of it here, because the purpose of this book is to provide a comprehensive exposition of programming in REXX. The details of how to use the programming interface to communicate between REXX and an application form an interesting subject in themselves. But they inevitably vary from one operating system where REXX is implemented to the next, so they are best described separately for each specific environment. Hopefully, though, an awareness of the existence of this interface and what is means for the ability to use REXX as a universal macro language will help motivate learning of the language itself, because once you've mastered REXX, you will be able to use the same language in many different applications. @endnode @node Intr3 "WHAT'S IN THIS BOOK" Given that caution about what is not in the book, it is appropriate to say a little about what is. To begin with, the reader may well wonder how this book relates to Mike Cowlishaw's @{i}The REXX Language@{ui} (which I will subsequently refer to simply as TRL). As the inventor of REXX, Mike's words are authoritative. His book is a very well-written, clear, and concise definition of the language. Unlike most language definitions, it is eminently readable. It has formed the basis of IBM's documentation on REXX from the beginning, up through IBM's latest "Systems Application Architecture" (SAA) language specification. My debt to Mike's book is obvious. However, it is a language definition. As such, it organizes information in a way that is not optimal for actually learning the language. For instance, all keyword instructions are described in one long chapter, and all built-in functions in another. This sort of organization tends to obscure natural functional groupings. Certain instructions and functions are ordinarily used in close association, and it is sometimes difficult to perceive such affinities in TRL. In this book, on the other hand, I have tried to group language features together by function and purpose. Character string handling, for example, occupies two consecutive chapters (string handling per se and the PARSE instruction). All information on I/O is collected into another chapter. And so forth. I hope that this effort to put related information together as much as possible will be a substantial advantage in learning REXX. In a few cases, expereicne with TRL has revealed areas of the definition that are simply unclear and incomplete. This is most especially so in the treatment of I/O. In writing my own chapter on this, I struggled time and again with questions that were just not answered by Mike's book. In this case, I have tried to point out what the open questions are. It is not up to me to provide the final answers, though often my opinion will not be well concealed. The answers will come from users of the language and will ultimately be rendered into language standards by the ANSI X3J18 committee (about which I will have a few more words later). Historically, the reason for the problems with I/O in REXX is clear. This is the only area of the language which was not part of Mike's original VM/CMS implementation of REXX. Consequently, it was not subjected to the same degree of iterative refinement that benefitted other parts of the language so markedly. It is true that Mike developed a function package for VM/CMS which implemented his specification, but it never became part of the released product. There are just a few other areas where I believe I have been able to improve on the treatment in TRL. One of these is exception handling. I believe that the exposition of this was a bit too concise and did not really explain all of the issues. Another is the PARSE instruction. It is generally recognized that Mike's presentation was just a little too informal for a language definition and has allowed a great deal of misunderstanding over the years. My own exposition is also informal, but I hope it is a little more complete and that it presents the mechanics of PARSE a little more clearly. I have in some cases decided to use new terminology or terminology that differs in minor respects from that used in TRL. The primary example of this is my choice in @{"Chapter 3" link Stru} to introduce the term @{i}statement@{ui} for what was previously called an @{i}instruction@{ui}, and to restrict the latter term to what was previously called a @{i}keyword instruction@{ui}. I have also tried to make a greater issue of the distinction (in @{"Chapter 2" link Over}) between the name of a variable and the symbol used to refer to it. All that being said, I must confess that in some respects I have deliberately attempted to be less thorough and precise about some things than TRL is. The purpose of this book, after all, is to offer instruction and advice on how to write REXX programs. It does not need to bear the burden of being a complete prescriptive document on exactly what the language is. I highly recommend that the reader turn to Mike's book for that. In fact, ideally, TRL should be read along with my book and, while it is quite possible to learn REXX without TRL, no education in REXX is complete without it. One of the facets of REXX that I don't treat rigorously is arithmetic and the precise definitions of the arithmetic operators. I feel that, for the most part, users will find that REXX does "the right thing" as far as arithmetic is concerned, and that it is no more necessary to know the exact rules REXX employs in order to use it than it is to know the exact rules of a calculator. I feel that excessive preoccupation with these rules may be a stumbling block to actually learning the language. I have not attempted to give all the details for each built-in function of the language. That is, I have not tried to defined exactly what all legal arguments are for each of the functions, nor how they will behave in all cases. That kind of thing is just plain boring. It's in TRL. And the best way to learn it is by experimenting with each new function as you use it. I won't have much to say about debugging beyond @{"Chapter 14" link Trac}. REXX is rather unusual among programming languages in that it includes a precise, formal definition of its debugging commands. It is unquestionably a good thing for a language to have built-in debugging capabilities, since that is a very necessary language feature which is often slighted. Having debugging commands included in the very definition of the language tends to ensure that all implementations will contain a consistent minimum level of debugging functionality. However, it is also true that most programming environments today have debugging tools of far greater sophistication than the minimal features that REXX prescribes. At the time of this writing, such tools have not yet become common for REXX. But it is certain that they will. @endnode @node Intr4 "THE REXX STANDARDIZATION PROCESS" A moment ago I alluded to the ANSI X3J18 subcommittee which at the time of this writing had just recently begun to draft a standard for REXX. Many present users of REXX will wonder why this is necessary, given the existence of TRL, or at least whether substantial work is needed to turn TRL into a standard. I think the answer, unfortunately, is that quite a bit of work is going to be needed. And this is true despite the high quality of the definition in TRL, and despite the fact that REXX has suffered much less from divergent implementations than have most other languages which have been in active use for a decade. One reason for this is, as I have stated, that there are certain areas like I/O that just aren't adequately defined in TRL. I/O, of course, is something that is notoriously hard to standardize across diverse operating systems. However, the purpose of X3J18 is to produce a standard which will to the greatest extent possible enable true portability of REXX programs from one environment to the next. I/O is such an important part of any program that portability is a hollow and meaningless ideal if it does not extend to I/O. Exception handling is another area in need of greater precision in order to ensure a standard for a truly portable language. Portability has, rather suddenly, become a matter of much more urgency, because just in the last two or three years implementations of REXX have appeared on so many new platforms. Since REXX has been designed as IBM's SAA "Procedures Language", IBM has introduced versions for MVS, OS/2, and OS/400. Other vendors have added versions for OS/2, AmigaDOS, UNIX, and other systems. The geuine need now exists to make many REXX programs run in several of these environments. Portability of REXX programs is of the greatest importance. Even though REXX is essentially a language for personal programming, you should not assume that means you won't have to port programs that you write in REXX to different environments. In fact, if anything, it may mean a higher likelihood of the need to port a program once or more in its lifetime. You may well find that computer technology is moving so rapidly, and computing environments are diversifying so much, that the tools you write in REXX today for VM/CMS are needed tomorrow on MVS, and the day after that on MS- DOS or OS/2 or AmigaDOS or UNIX. It is the charter of X3J18 to make that work as easily as possible. REXX already has a good head start in the portability area. As we will see, arithmetic in REXX is intentionally defined so as not to be machine dependent. This is very unlike most other languages, which rely on hardware-specific data formats and arithmetic operations. It would be unfortunate to use this inherent advantage in portability by failing to promote it in other areas, like I/O. Another reason the REXX standardization is becoming a more urgent issue than in the past is that, with increasing usage, various shortcomings in the language have become apparent. These involve a variety of things, such as the absence of a true subscript notation, difficulties in dealing with the scope of variables, and inability to iterate over all variables sharing a common stem. On top of that, the value of many of the constructs of object-oriented programming is beginning to reveal itself to programmers working in both traditional and the newer graphical user interface (GUI) computing environments. It is inevitable that some of these constructs will become available in REXX within the next two or three years. So REXX may well experience much more rapid evolution in the next few years than it has the last ten. These evolutionary changes need to be made in a principled way, and with the participation of the entire (rapidly expanding) community of people and organizations now interested in REXX. Though individual language implementators will usually spearhead the introduction of particular language innovations, it is important for these changes to be subject to impartial professional scrutiny and peer review very early on, before they become de facto standards. By ANSI rules, formal participation in subcomittees like X3J18 is open to all interested parties. Even though you or your organization must bear your own expenses of participation, you should be aware that the option is available to you - even if you are representing only yourself. If you do become interested in bringing issues to the attention of the X3J18 committee, an option that will be more practical to most is to raise your concerns directly with members of the committee. X3J18 members are accessible through a large number of electronic networks: the Internet, BITNET, the UUCP network, and commercial networks like CompuServe, BIX, and MCI mail. I, or any other committee member, can provide you with mailing addresses of other X3J18 participants. Please get in touch with us if you have concerns about the language. @endnode @node Intr5 "GETTING STARTED" It is assumed that you have access to one or more REXX implementations through your work or on your personal computer. The very best way to learn REXX is by starting immediately to use it and write programs. Since REXX is a language for personal programming, it is very likely that you already have one or more pet projects for which REXX would be a very suitable language. I recommend that you plan to begin implementing these ideas in parallel with the reading of this books. You may be surprised at how easy it is to write REXX without already knowing it! Another good approach is to take some of the examples from this book or elsewhere, enter them on your computer, and start to run them and modify them to see how they work. Use the SAY and TRACE instructions liberally to step through the examples one instruction at a time, and observe how values change and results are produced. Modify the examples and improve them. @{"Chapter 2" link Over} provides an overview of REXX that, hopefully, is sufficient to let you fruitfully begin to work with the examples in the rest of the book. Unless you already know a little REXX, @{"Chapter 2" link Over} is required reading. The material in Chapters @{"3" link Stru}, @{"4" link Cont}, and @{"5" link Subr} contains the remainder of the absolute "must know" details of the language. Beyond that point, you can read the rest of the book in just about any order. There are interdependencies among chapters, as well as forward and backward references (though hopefully more of the latter). But each of the later chapters has been written to offer, as much as possible, a free-standing treatment of its topic. A good candidate for the very first example to work with is the REXXTRY program at the beginning of @{"Chapter 12" link Inte}. It occurs late in the book because it takes some prerequisites to explain exactly how it works and the language features that it uses. But it can be run without full understanding. What it allows you to do when you run it is to enter REXX statements, one or more at a time, for immediate execution. This lets you see right away what each statement does and observe the effects of simple variations. You can see at a glance how substitution of variable values works and how REXX expressions are evaluated. It is very good for working with built-in functions to get a hands-on feeling for how each works. And so, all that remains here is to say: I hope you find REXX and this exposition of it to be useful. @endnode @node Over "2: LANGUAGE OVERVIEW" Learning a new language can be intimidating, whether it is a natural language or a computer language. For computer languages, the actual difficulty of the learning process depends a great deal on the nature of the language itself. Because ease of use was a primary consideration in the design of REXX, learning it should prove to be as easy as learning almost any other computer language, and probably easier. Many readers will already know one or more computer languages. Since REXX uses concepts which are widely used in other popular, contemporary languages, it should be especially easy to learn. That is, language constructs in REXX such as literals, variables, arithmetic expressions, conditional statements, subroutines, and so forth are very similar to their counterparts in other languages like C, BASIC, or Pascal. There are, however, some constructs in REXX which are both powerful and not present in most common languages. These include such things as compound variables and the PARSE instruction. Experienced programmers can use much of their present programming knowledge and will want to focus on the distinctive features of REXX. On the other hand, for beginning programmers there are many concepts that have to be learned regardless of which language one starts with. Since REXX is intended to be used as a system command language (among other things), it will in fact be the first serious programming language that many people learn. The book @{i}Modern Programming Using REXX@{ui} by Robert O'Hara and David Gomberg is highly recommended. It uses REXX to provide an introduction to programming. While it is beyond the scope of the present text to teach programming, beginners will find that much of the REXX language is simple, intuitive, and a natural step from using operating system batch files and application macros. @{" GETTING STARTED " link Over1} @{" PUTTING THE ELEMENTS TOGETHER " link Over2} @{" PROGRAM STRUCTURE " link Over3} @{" THE REXX DATA MODEL " link Over4} @{" SCOPE OF VARIABLES " link Over5} @{" STRING MANIPULATION AND PARSING " link Over6} @{" OTHER FEATURES OF REXX " link Over7} @endnode @node Over1 "GETTING STARTED" A complete REXX program may consist of as little as a single line, for instance: SAY "hello world" This program displays the words hello world on the screen. Here, SAY is a REXX reserve word that begins an output instruction. "hello world" is a literal string. A single blank separates the verb from the literal. SAY is not always a reserved word in REXX. It is reserved and has a special meaning only when it is the first word of a clause. SAY could be used elsewhere in the program, even in the same clause, without any conflict (though this may not be good REXX style). There are two fundamental program units in REXX: the statement and the clause. There are three types of statements: REXX instructions which begin with keywords such as SAY, assignments of some value to a variable, and commands to external environments. A clause is a slightly smaller unit of executable code. Most statements, including assignments and commands, are single clauses all by themselves. Most keyword instructions are also single clauses, but a few (such as IF and DO) are more complex and consist of multiple clauses. Also, for technical reasons, a label is considered to be a clause which is neither a statement by itself nor part of a statement. We will go much furhter into these details in this chapter and the next, but for now you can think of a REXX program as a sequence of statements. In REXX terminology, the example above consists of a single statement which is also a single clause. Although clauses and lines are not the same thing, in general, REXX programs tend to be written as if they were. That is, the usual practice is to put at most one clause on a line. Multiple clauses can be written on a single line by ending each clause with a semicolon, but this usually decreases the readability of the program. It is possible to write REXX programs consisting of thousands of clauses with each one on a separate line and without using a single semicolon. So, the end of a line normally means the end of a clause as well. However, most REXX implementations place a limit on the length of a single line. Most editors or word processors used to write programs do, too. And for readability (always a very important concern with REXX), it is desirable to keep lines shorter than the width of the screen or editing window. The length of a clause in most REXX implementations is also limited, but it is usually longer than 80 characters, which is the width of most screens. Therefore, some mechanism is necessary for continuing a single clause onto additional lines. This is done by ending each line to be continued with a comma: SAY "To be or not to be:", "that is the question." This example will display the two quoted phrases on the same line of the screen, because it is really just one clause as far as REXX is concerned. It is equivalent to: SAY "To be or not to be: that is the question." Blanks play a very special role in REXX - several roles, actually. One or more blanks separate individual tokens in a REXX clause, just as in the preceding example where a blank separated the verb and the literal. Except within literals, it is irrelevant whether one or more than one blank is used to delimit tokens. Thereofre, it is common to use blanks to help format a program for greater readability. Above, for instance, they were used to line up the quoted phrases. One of the design goals of REXX was to make a language in which it is especially easy to work with character strings, and to do so in as natural a way as possible. One of the most frequent operations on character strings is concatenation, in which two strings are joined together. This is expressed in REXX simply by writing the two strings together with one or more blanks in between. So, a third way to write the example we have been using is: SAY "To be or not to be:" "that is the question." Here, there are three tokens in the clause: the verb and two literal strings. When this statement is executed, the two literals are concatentated, and a single blank is inserted between them. Only one blank is inserted in the concatenation regardless of how many blanks separate the literals. A REXX clause can contain other types of tokens. The verb SAY is a special case of a symbol token. Such a token begins with an alphabetic character and extends to the next delimiter. Delimiters are either blanks or special characters like ":", ";", "*", "+", and "\\". Such delimiter characters may be tokens all by themselves (as are each of the ones just mentioned), or they may be the start of longer operator tokens such as "\\=" (not equal). Operator tokens include the standard arithmetic operators ("+", "-", "*", "/", "**"), string operators such as "||" (concatenation) and ">" (comparison), and logical operators such as "&" (logical and) and "\\" (negation). Numbers are the other major sort of token. A number is delimited like a symbol, but it consists only of numberals, ".", "+", "-", and "e". Numbers can be integers, decimals, or in exponential notation, and may be signed. The following are all valid numbers in REXX: 666 2.718281828 -32768 6.63E-27 @endnode @node Over2 "PUTTING THE ELEMENTS TOGETHER" We now have enough terminology to examine a more interesting REXX program. The task is to convert temperatures expressed as Fahrenheit to Centigrade. The program shoudl ask the user to enter a number representing the Fahrenheit temperature and display the corresponding Centigrade value. It should terminate when the user enters nothing but blanks. Here is the program: [1] /* Convert Fahrenheit to Centigrade */ [2] DO FOREVER [3] SAY 'Enter temperature in Fahrenheit:' [4] PARSE PULL fahrenheit [5] IF fahrenheit = '' THEN [6] EXIT [7] centigrade = 5 * (fahrenheit - 32) / 9 [8] SAY 'Temperature is' centigrade 'degrees C.' [9] END Let's look at what's new here, line by line. The first line is a comment. As in C and PL/I, comments in REXX begin with /* and end with */. Comments may extend over as many lines as necessary without the need to do anything special to indicate continuation. Beginning every program with a comment is recommended as good programming style. It is also required in some implementations of REXX [such as ARexx] in order to distinguish the file from others that may use a different language. The second line (DO FOREVER) continas the REXX reserved word DO as its first token. This beings a loop which extends to the matching END instruction on the last line. Any other REXX statements can occur between the DO and the END. This includes other DO statements, so that loops can be nested up to an implementation-defined limit, each beginning with DO and terminated with END. A DO statement may have a number of optional modifiers that specify a control variable to be incremented on each iteration, or define conditions on when the loop should be ended. In this case, the keyword FOREVER means that the DO statement itself has no specified terminating condition. Other instructions within the body of the loop can cause it to terminate. In this example, that is the function of the EXIT instruction on the sixth line. The third line is a SAY instruction. In this case, the literal string has been delimited with single quotation marks (') instead of double ones ("). This is done as a convenience in case quotation marks need to be part of the literal, as in: '"Jump!" he said.' That is, all literal strings are terminated with the sort of quotation marks with which they began. An exception to this is made if the quotation character is doubled. Within a literal that begins with the same sort of quotation mark, the doubled one is treated as if it were a single one that is part of the literal, so that: """Jump!"" he said." is exactly the same literal. The next line, PARSE PULL fahrenheit, reads user input. It is the most common input instruction in REXX, corresponding to the output SAY instruction. PARSE PULL causes a read to the keyboard, which is terminated when a carriage return (ENTER) is pressed. Everything typed up to (but not including) the carriage return is assigned to a REXX variable, fahrenheit in this case. There is actually a shorter input instruction, PULL. This does input just like PARSE PULL, but has the additional side effect of converting the input to uppercase. (PULL is shorthand for PARSE UPPER PULL). Sometimes this is useful; more often it is annoying. Here is doesn't matter, since the input should be a number. A REXX variable like fahrenheit is any symbol beginning with an alphabetic character, except for a reserved word occurring at the start of a clause. (In certain kinds of statements, such as DO, there may be reserved words after the first token.) Variable names may be very long - up to 250 characters in most implementations of REXX. REXX is generally not sensitive to alphabetic case. So symbols can be written in either upper- or lowercase, or any mixture, and no distinction is made. The convention in this book will be to use uppercase for REXX reserved words. The most obvious instance in which case matters is in character string literals. Internally, REXX converts the whole program (except for literals and comments) to uppercase. The next two lines of the example: IF fahrenheit = '' THEN EXIT begin with an IF instruction that performs a test and executes other instructions according to the results. Technically, REXX considers this statement to consist of several clauses. The first beings with IF and concludes with an expression that must evaluate to 1 or 0. 1 represents true, and 0 represents false. In the present case, the expression uses the "=" operator to compare the value of the variable fahrenheit to the literal string '' (null or empty string). One of the convenient characteristics of the "=" (equality) operator when comparing strings is that leading and trailing blanks are ignored. This behaviour of ignoring blanks in circumstances where they are not meaningful is common throughout REXX; it is one of the ways in which REXX tries to be helpful. In the present instance it is useful because the user is expected to type either a number or nothing. Several blanks are taken as equivalent to nothing; it is considered irrelevant whether any blanks are entered before or after the number. Blanks are in fact stored internally. The "=" operator just happens to ignore them. Blanks in the middle of a string are not ignored. If it is important to recognize leading or trailing blanks, another operator ("==", exact equality) can be used. THEN is a reserved word in an IF clause. In fact, it is considered to be a separate clause all by itself. Its purpose it to mark the end of the conditional expression. The statement immediately following THEN is executed if (and only if) the expression after IF evalutates to 1. Here, that is the case, provided that the expressions on either side of "=" are the same except for leading and trailing blanks. (Alphabetic case is significant to the "=" operator.) The expression here is evaluated by taking the value of the fahrenheit variable and comparing it to a null string. If this value contains any nonblank characters, the value of the whole expression is 0, and the statement after THEN would not be executed. In the present example, the clause after THEN is another REXX reserved word, EXIT. This instruction terminates not only the DO loop but the entire REXX program, and allows control to return to the operating system. The statement after THEN could be any legal REXX statement, including another IF statement. In case several statements need to be executed when the condition is true (ie. 1), they can be enclosed between a DO...END pair. The end of the IF statement is recognized because the line following EXIT begins with a token other than the reserved ELSE, which would be used if there were a statement to be executed in case the conditional expression has the value 0. Since there is no such statement in this example, the next statement will always be executed. The next statement here: centigrade = 5 * (fahrenheit - 32) / 9 is an assignment, recognized by the presence of the assignment operator as the second token in the clause. Here, the value named centigrade is given the value of the expression on the right-hand side. In this case the expression is purely an algebraic one. Although the value of the variable FAHRENHEIT was stored as a character string, and could be used as such if appropriate, here it is automatically treated as a number. This illustrates one of the key characteristics of REXX: as little distinction as possible is maintained between different data types until they are actually used. REXX neither requires nor even possesses data type declarations. All data is stored internally as a character string (at least in principle). Only when values are actually used are any conversions made - if necessary and if possible. Of course, only certain character strings represent numbers, and conversion may not be possible. In our example, the user may have entered a nonnumeric value such as OK, which would cause an error when the expression is evaluated. Although the example has no error checking, REXX has various ways to do checking, and a serious program certainly should have appropriate checks. Since the algebraic expression involves division, the result will probably not be integral. REXX takes care of the decimal part automatically, even though no declarations were used to distinguish between integer, fixed point, or floating point numbers. The default precision that REXX supports is implementation dependent, but frequently it is nine digits (apart from the exponent, if any). If more or less precision is required for a particular purpose, it can be requested. The maximum precision that REXX supports depends on the implementation, but can be very large, even thousands of digits. The last interesting line of the example is: SAY 'Temperature is' centigrade 'degrees C.' The part of the clause after SAY is actually an expression consisting of the concatentation of two literal strings with a variable name. The value of centigrade is a computed number, but, because it occurs in a character string expression, it is automatically converted to the printable character representation of the number. Concatenation of strings is implicit in the expression; no explicit operator is required. The string concatenation automatically includes a blank between each operand as a convenience. The result displayed by this instruction might be, for instance: Temperature is 37.7777778 degrees C. This would be the result of converting 100° F. The result has been expressed with exactly nine digits of accuracy and has even been rounded up appropriately. No complicated formatting directions are required to produce this output, so such common forms output are simple and natural to write. However, there are ways to display the answer with less precision if desired. Concatenation works the way it does in order that the form of the expression in the program resembles the final output as closely as possible (ie. with blanks inserted). In case concatenation without [an] intervening blank is required, an explicit operator ("||") can be used. Thus: SAY 'Temperature is '||centigrade||' degrees C.' would produce the same results. Notice that the blanks have simply been moved inside the literal strings. In many cases (and this is one of them) the explicit operator can be dispensed with, so that: SAY 'Temperature is 'centigrade' degrees C.' also produces the same results. REXX parsing rules still allow this expression to be resolved into three tokens (literal, symbol, literal) with an implied concatenation operation between tokens. But because there is no blank between the tokens, none is added by the concatenation operator. Although the detailed rules of how REXX handles such expressions are somewhat complicated to fully enumerate, the net result is intuitively just what it should be, with a minimum of required symbols. @endnode @node Over3 "PROGRAM STRUCTURE" Viewed from a top-down perspective, a REXX program consists of a series of statements. There are several different kinds of statements. The example we just examined consisted predominantly of instruction statements beginning with REXX keywords like DO, SAY, PULL, and IF. The only other statement type present was one assignment, distinguished by the presence of the = sign as the second token. There are a couple of other program elements not illustrated yet. Labels, which consist of a symbol followed by a colon, may be mixed in among statements. Labels are used to define the start of an internal subroutine or function. Finally, there is a third type of statement, the command, which is unique to REXX among popular languages. This is literally the "everything else" category. REXX assumes that anything which is not an instruction, assignment, or label is a command. A command is not interpreted by REXX itself. Instead, it is passed to an external environment like an operating system or application programming that is equipped to process commands. In a typical operating system like AmigaDOS, COPY, DELETE, and SORT are examples of commands. REXX's ability to handle commands is very important, since it makes it possible to write batch procedures and application macros or scripts in REXX. Although other languages may provide such capability through library functions, it is seldom an intrinsic part of the language as in REXX. Hence, other languages are unable to handle commands as naturally as REXX does. Since a command is just another type of statement, commands in effect provide a way to extend the language by defining new directives. For instance, when REXX is used as a language for writing scripts for a communication program, the program itself may provide commands like SEND and WAIT. In use, these can be regarded almost as part of REXX itself. The benefit to an application program in employing REXX as a script or macro language is that the application needs to supply only its own specific commands, while all of the standard language facilities like variables, arithmetic, looping, and subroutines are provided by REXX. Although REXX doesn't ultimately execute operating system or application commands, it can process command statements by substituting variable values and evaluating arithmetic or character string expressions. An example of this might be: 'copy' source_file destination_file options This statement is actually a REXX expression that begins with a literal string and is followed by three symbols. The symbols are variable names for which the current values are substituted. Finally, all strings are concatenated with single intervening blanks to produce a result. This statement is a command because it is not any of the other three types, so it is then passed to the default execution environment for execution. Since commands are application-specific, they play no further part in the structure of a REXX program as such. The other three clause types (instructions, assignments, and labels) do, so we will concentrate our attention on them. There are currently about 25 (depending on how they are counted) different instruction types. Some of these instructions, specifically CALL, DO, EXIT, IF, ITERATE, LEAVE, RETURN, SELECT, and SIGNAL, define the flow of control within a REXX program. That is, they provide for testing, iteration, and subroutines. The remaining instruction types perform diverse functions like I/O (SAY, PULL), string parsing (PARSE), variable handling (DROP), and debugging (TRACE). Assignment statements can conveniently be regarded as another type of variable handling instruction, even though an assignment doesn't begin with a REXX keyword. Testing is the simplest control flow construct. As illustrated in the earlier example, REXX uses an IF...THEN...ELSE... format for this fundamental operation. The ELSE part of this construct is optional. There is no specific keyword (such as ENDIF in other languages) required to end an IF statement. Context and REXX syntax rules are sufficient to handle IFs unambiguously. A complete IF statement might be: IF time = 0 THEN SAY "Cannot compute speed." ELSE SAY "The speed is" distance/time "km/hr." RETURN Here, RETURN (to return from a subroutine or main program) will always be executed, because only one statement can ordinarily follow THEN or ELSE. The exception to this, if multiple statements are required to follow THEN or ELSE, is a series of statements bracketed by DO...END. For instance: IF time = 0 THEN SAY "Cannot compute speed." ELSE DO SAY distance "travelled in" time "hours." SAY "The speed is" distance/time "km/hr." END RETURN As always, the intendation is used only for clarity. As far as REXX is concerned, all statements could begin at the left margin. It is, however, necessary to place the ELSE on a new line unless the clause following THEN is terminated with a semicolon. IF statements can also be nested, as is commonly done when several alternatives must be handled: IF hour < 12 THEN SAY "Good morning!" ELSE IF hour < 18 THEN SAY "Good afternoon!" ELSE SAY "Good evening!" Here, the second IF statement occurs following the first ELSE. DO...END pairs can also be nested within each other and within IFs as appropriate. When a program must provide for testing many alternatives, a better way to do it than with nested IF statements is with the SELECT instruction. As with IF, the SELECT instruction really begins a compound statement, which might be something like: SELECT WHEN country = 'Austria' THEN composer = 'Mozart' WHEN country = 'Russia' THEN composer = 'Tchaikovsky' WHEN country = 'Finland' THEN composer = 'Sibelius' OTHERWISE composer = 'Beethoven' END Here, after SELECT there is a series of WHEN...THEN... pairs, concluding with an OTHERWISE and finally an END. As with IF, some true/false condition follows each WHEN, and a statement to be executed when the corresponding condition is true follows THEN. Only the statement following the first true condition is executed. If none of the WHEN conditions is true, the statement following OTHERWISE is executed. If multiple statements need to be executed for each condition, they are grouped within DO...END pairs after THEN. SELECT statements can be nested within themselves and IFs in any combination up to some implementation-defined level of complexity. The next major type of control flow is iteration, ie. looping. All REXX loops begin with a DO instruction. By itself, DO is used together with END to group statements. In that case, the enclosed statements are executed only once. However, there are more complex forms of DO that provide for a wide range of loop control. Loops may repeat the statements within their range a specific number of times (or forever). Loops may have a control variable that is incremented each time through. Loops may, finally, have logical conditions that cause them to terminate. For instance, to sum n terms of a geometric series of powers of a variable x: sum = 0 DO i = 1 TO n sum = sum + x ** i END (** is the expression for exponentation.) Here i is the control variable. It is intialized explicitly to 1 and incremented (implicitly) by 1 each time through the loop, terminating at the point where it would exceed the value of the variable n. Expressions could be used instead of constants or variables for the initial and limit values. Increments other than 1 can be specified explicitly. If initial values, increments, and limits for a control variable are specified by a variable or expression instead of a constant, the quantities are evaluated only once, before the loop is first executed. Sometimes it is desirable to decide whether to continue a loop based on quantities that are evaluated each time through. In that case, a subclause consisting of either the keywords WHILE or UNTIL (or both) followed by an expression may be used. Thus, if we wanted to sum a geometric series up to the point where each summand is less than some limit, we might revise the preceding example to read: sum = 0 DO i = 1 BY 1 UNTIL x ** i < 1e-10 sum = sum + x ** i END (x must have a value less than 1 for this to work. And, of course, this example is needlessly inefficient because it evaluates x ** i twice.) Conceptually, the difference between WHILE and UNTIL is that the former is tested at the top of the loop and the latter at the bottom. Note that no limit value for the control variable was supplied with a TO expression, but an increment was explicitly specified with a BY expression. A DO loop does not need to have a control variable. Instead, it may simply specify how many times the loop is to be executed, using a constant, variable, or full expression. As we saw in the temperature conversion example above, this repetition count can even be FOREVER. In this case, some means is still required for getting out of the loop. Any DO loop will be terminated by a RETURN instruction (which returns from a subroutine) or an EXIT instruction (which ends the REXX program). Another, less drastic way to get out of a DO loop is provided by the LEAVE instruction. It is particular useful in a loop that is processing user input, for instance: DO FOREVER SAY 'Enter next command:' PULL command IF command = 'QUIT' then LEAVE /* other command processing */ END Here, if the user types quit, the response is converted to uppercase by PULL, the LEAVE instruction is executed, and control passes to the next statement after the END instruction. A similar requirement in a loop is to be able to go back to the start of the loop from somewhere in the middle, ie. before the END instruction. This need is also frequently encountered when processing user input: [1] DO FOREVER [2] SAY 'Enter a number between 1 and 10:' [3] PULL number [4] IF number < 1 | number > 10 THEN DO [5] SAY 'Number entered was out of range.' [6] ITERATE [7] END [8] /* process the number entered */ [9] END In this example, "|" is the symbol for logical or: the compound condition in the IF instruction is true if either number < 1 or number > 10 is true. If the condition is true, the two instructions bracketed by a DO...END pair are executed. The second of these is ITERATE, which goes back to the start of the repetitive DO instruction. Note that the DO instruction on the fourth line of the example has no control variable, repetition count, or UNTIL/WHILE condition. So it is not a repetitive DO; it does not introduce a loop; and it is ignored by LEAVE and ITERATE instructions. These instructions always refer to the innermost repetitive DO loop in which they occur. The third and last type of control structure in REXX is the procedure. A distinction is sometimes made between two types of procedures: subroutines and functions. A function must return a value, while a subroutine usually does not. However, in REXX the same procedure may sometimes return a value and sometimes not, so this distinction isn't always relevant. In most languages, procedures are identified unambiguously by the language syntax, by using special keywords or symbols. This is not the case in REXX. A procedure must begin with a label (ie. a symbol followed by a colon), but not all labels introduce procedures. Whether or not a given label actually introduces a procedure depends on how it is used. There are two ways a label can be used so as to be the beginning of a procedure, corresponding to the distinction between subroutines and functions. A subroutine procedure is invoked with the CALL instruction: CALL get_user_input /* other processing */ ... get_user_input: SAY 'Enter data:' PARSE PULL data RETURN The name of the procedure here is get_user_input, which is the label before the first instruction of the procedure. A function procedure is invokved by using the name of the procedure in a function reference within a REXX expression. A function reference consists of a symbol followed immediately (with no blank spaces) by a left parenthesis. The preceding example could be modified slightly to use a function reference instead of a subroutine call: answer = get_user_input() /* other processing */ ... get_user_input: SAY 'Enter data:' PARSE PULL data RETURN data There are several things to notice about this example. Although the procedure has no parameters, it was necessary to use parentheses around an empty argument list in order to identify it as a function reference. Also, the RETURN instruction that ends the procedure contains an expression (here just a variable name) that is returned as the value of the function. In the example, the returned value is then assigned to another variable. There is one subtle point to note about REXX procedures. Unlike the END of a DO...END pair, the RETURN instruction in a procedure does not have a syntactic function. That is, the RETURN does not necessarily constitute the last line of the procedure. This is particularly true if the RETURN occurs inside some conditional (IF or SELECT) construct. In fact, in REXX it's not really possible to identify syntactically where the end of a procedure is to be found. The end of a procedure, as well as the beginning, is defined purely by the flow of control during execution rather than by syntax. This circumstance can be a source of confusion and errors in REXX programs, so programmers need to take extra care, using comments and blank lines, to make it very clear to a reader where a procedure begins and ends. It is not necessary that all procedures used in a REXX program be defined (by a label) within the program. In the first place, REXX comes with a large number of predefined built-in functions. These functions deal mostly with input/output and with character string manipulation. The character string functions are noteworthy because they augment the already strong support REXX provides for working with character strings. Many character string operations, like substring, character replacement, and blank-delimited token processing, are implemented as built-in functions. Many less common operations are, also. This support adds up to a great deal of power and flexibility for handling character strings with REXX. In addition to built-in functions, REXX also allows for external procedures, ie. procedures defined in other files. This gets into an area of the language that is implementation dependent. Most implementations of REXX allow invoking an external procedure as either a subroutine or function by using the filename in the CALL instruction or function reference. Procedures that are internal to another file, ie. defined by a label within the file, usually cannot be invoked in this way. However, other implementation-specific mechanisms such as function packages are usually available for allowing REXX programs to access external procedures, which may be either system-side or part of a specific application, and which may be written in other languages. @endnode @node Over4 "THE REXX DATA MODEL" Now that we know at the highest level what the structure of a REXX program is, it's time to look more closely at how REXX manages data. There are two primary fats to remember about the REXX data model. The first is that all data is stored (conceptually at least) as character strings. That is, REXX in general does not recognize data types. All data in REXX, without exception, can be handled as a character string. It can be concatenated with other strings. String operations like substring can be performed on it. All data can be input and output without the need to perform conversions. Certain operations in REXX, like arithmetic, do require the data to be understandable as a number, and will give an error if it isn't. But conversions in such cases are implicit and automatic. Even when data has to be treated as numeric, the user is relieved of the requirement present in other languages to be concerned with the internal representation of the number. That is, there is no distinction made between integer, binary, decimal, or floating point representations. Indeed, there is little need to be concerned with the precision of a number, ie. the size or number of significant digits in it. By default, REXX allows for nine significant digits. If necessary, this default limit can be raised, subject only to limits of the specific implementation and available space. The second primary fact about data in REXX is that the language makes no provision at all for declaring data. In other languages, data must usually be declared for at least three reasons: to specify the type of the data, to specify the amount of storage required for the data, and to specify the name used to access the data. All of these reasons are in reality designed for the convenience of the language processor rather than for the convenience of the user. REXX handles each of these details automatically. As just explained, it handles any necessary conversions implicitly. It automatically manages storage allocation. And it can always recognize variable names from context. Since REXX eliminates these needs for declaration of data, the language does not have any data declarations. All data items in REXX are referred to with a symbol name. REXX has no other way, such as pointers, to access data. This makes REXX very safe to use, since it is impossible to reference memory that has not been allocated. REXX symbols are tokens that contain only the upper- and lowercase alphabetic characters, numerals, and certain special characters ("!", ".", "?", and "_"). Not all valid REXX symbols can be used as the name of a variable, but the precise rules are not important at this point. Variable names can usually be quite long, though this is implementation dependent. Commonly the limit is 250 characters or more. REXX always converts symbols to uppercase before interpreting them. A REXX variable acquires a value when it it the target of an assignment statement or in a few other specific cases such as PARSE. Such a variable is said to be initialized. It is legal in REXX to use uninitialized variables. This is because any uninitialized variable is assumed to have a value that is the same as its name. Although there are a few times when this convention is convenient, it is usually not any more advisable in REXX than it is in any other language. While a REXX program will never crash simply because it refers to an uninitialized variable (as can happen in many languages), it certainly may malfunction and give incorrect results. Though not the default, it is possible to force REXX to raise an error condition when an uninitialized variable is used inadvertently. There are two kinds of variables in REXX: simple and compound. So far, all examples we have presented use simple variables. These behave much like variables in any other language. The other kind of variable, [the] compound variable, is one of the most significant and characteristic features of the language. Compound variables are similar to arrays in other languages, but with significant differences (as well as advantages and disadvantages). A compound variable is referred to with a symbol that contains one or more periods in it, such as: array.i two_dimensional_array.i.j database_record.type.field.name Each part of such a symbol is a simple symbol. We may speak of simple and compound symbols as (respectively) those that do not or do contain a period. There is a fundamental distinction in REXX between the symbol that refers to a variable and the actual name of the variable, although it is relevant only for compound variables. While it is true for simple variables that the symbol which refers to the variable and the variable's name are the same, this is not true for compound variables. Let us agree to call each portion of a compound symbol delimited by periods a node. The first node, up to the first period, is called the stem. The rule for mapping a symbol to a variable name is as follows: for each node in the symbol except the stem, substitute the value of the variable named by the corresponding simple symbol. As a special case, for each node corresponding to a simple symbol which names an uninitialized variable, substitute the name in uppercase. (This is, after all, the "value" of an uninitialized variable.) The stem does not undergo substitution (but it is uppercased). The result is the name of a variable. (The periods are retained, too, so the name contains at least as many periods as the original symbol.) This name, sometimes called the derived name, is then used just like an ordinary (simple) name in whatever way is appropriate for the context. To take the simplest example, suppose: x = 1 and y is undefined. Then: foo.x = 'Renoir' foo.y = 'Monet' assigns values to two variables, having derived names FOO.1 and FOO.Y, respectively. The statement: SAY foo.x foo.y displays Renoir Monet. Many different symbols can refer to the same variable if they produce the same derived name. For instance, if: z = 'Y' then: say foo.1 foo.z produces the result Renoir Monet as before. Keep in mind that there are symbols which cannot be the names of (simple) variables. Such symbols include numbers, or any symbol that begins with a number. When symbols like this occur in a node of a compound symbol, they are used literally (after being uppercased). The symbol foo.1 is an example of this. For additional examples of the general process, suppose the following assignments have been made: i = 100 j = -30 type = '01abc' field = 'salary' name = 'H.P. Lovecraft' Then the symbols: array.i two_dimensional_array.i.j database_record.type.field.name correspond to the following derived names: ARRAY.100 TWO_DIMENSIONAL_ARRAY.100.-30 DATABASE_RECORD.01abc.salary.H.P. Lovecraft Note that the value of the simple variables that are substituted may contain lowercase letters which are not uppercased. In fact, those values may contain any characters at all, even blanks, special characters, and extra periods. So variable names may contain arbitrary characters. Because variable names may contain arbitrary characters, there are many names which cannot appear explicitly in a program. This is the case, for instance, with names that contain blanks or operator symbols. Such names can be referred to only when derived from an appropriate compound symbol. The periods occurring in the original compound symbol remain in the compound name, and additional periods may occur if they form part of the value of one of the simple variables being substituted. Unlike a compound symbol, however, a compound name should be thought of as having only two parts: the initial part up to and including the first period (the stem), and everything else (which may contain additional periods). When REXX goes to look up the value of a compound variable, it first searches for the stem. Then under the stem it searches for the suffix consisting of the remainder of the name, just as if this suffix named a simple variable in a private name space defined by the stem. This suffix is called the tail. If the resulting name is not found, the original compound symbol still refers to a value which is the derived compound name, according to the normal rules by which REXX handles undefined variables. Though the details of this process for working with compound variables are somewhat involved, REXX compound variables turn out to be a very powerful and useful facility of the language. Compound variables can be used very much as arrays are in other languages. Even so, the REXX approach has several advantages. It is not necessary (or possible) to determine the size of an array in advance; storage is allocated as needed, and there can even be large gaps in the array without wasting space. Also, though compound variables can be used as if they were arrays of a specific number of dimension, they can also be used without any specific fixed dimensionality if that is convenient. REXX compound variables have the significant advantage over arrays in most other languages in that the "subscripts" need not be numeric; they can be any valid character string (up to some implementation-defined maximum length). This permits very useful associative retrieval of data. For instance, database records pertaining to individuals can be retrieved directly by the name of the individual: individual_birthday.name individual_email_address.name individual_job_title.name These symbols might be used to work with a personnel file. To access any piece of data, it is necessary to have only the actual name as the value of the variable NAME. (All current REXX implementations keep data in memory only; they do not refer directly to external files. Therefore, this example assumes the data has been loaded in from some sort of file. But in principle REXX could transparently use disk files for its data.) @endnode @node Over5 "SCOPE OF VARIABLES" A second important aspect of the REXX data model involves the scope of variable names. To begin with, each separate REXX program maintains its variables independently of every other REXX program. In other words, REXX variables named in one program file are completely unrelated to those named in another file. In fact, one REXX program's variables are completely inaccessible from any other REXX program's variables. This is usually an advantage in working with a system of multiple REXX programs, since naming conflicts cannot occur. On the other hand, it can also be an inconvenience when sharing data is necessary. With a single REXX program, the scoping rules have to do with exactly when the same variable name refers to the same data. The only time the scope of a name is an issue is when internal procedures are invoked. In the examples of procedures already given, there has been just a single scope for all names. That is, names used in one procedure will refer to the same data when used in other procedures that the first procedure either calls or is called from. Such names are global, and this is the default in REXX. It is often convenient, however, to use the same name for different data in different procedures. For instance, it is common to use variables like i and j as loop control variables. In fact, it is both a nuisance and a frequent source of errors to have to provide unique names for loop variables in all procedures. Further, to avoid unintended side effects, it is usually good practice to isolate separate procedures from each other by giving each a unique "name space" and eliminating the possibility of variable naming conflicts. Therefore, REXX provides a way for any procedure to hide its own variables from any procedure which invokes it. This is done with the PROCEDURE statement. If used, it must immediately follow the label which names the procedure: /* compute the area of a circle */ area: PROCEDURE ARG radius pi = 3.14159 RETURN pi * radius ** 2 In this example, there is one new instruction, ARG, whose purpose is to assign the procedure's argument to the variable RADIUS. Because of the PROCEDURE instruction, all variables in this procedure (RADIUS and PI) are local to the procedure and distinct from any variables with the same name in the calling procedure. In particular, they are undefined until they receive a value from an assignment or an instruction like ARG. The variables used in the calling procedure are hidden from the called procedure and from any procedure which the called procedure might call in turn. Likewise, the variables of the called procedure are hidden from the caller. However, the variables of the called procedure are not necessarily hidden from any procedures it calls, unless the latter also begins with a PROCEDURE statement. Although purely local variables are probably preferable as the rule, global variables are often very useful as an exception. It is possible to use an option on the PROCEDURE statement to explicitly name variables that are to be shared. For instance, it's inconvenient and inefficient to assign a value to PI in every procedure that uses it, since PI is really a constant. Therefore, one would normally assign its value just once in the main procedure and use the EXPOSE option to make it available in procedures that need it with the instruction: PROCEDURE EXPOSE pi If there is a chain of several calling procedures, each must EXPOSE any variables that are to be shared. REXX scoping rules are dynamic in nature, rather than static. This means that it is not possible to determine by a syntactic analysis of the program when a given name actually refers to the same data. Instead, this always depends on the exact sequence of procedures which are called. In this case, assuming PI is first assigned in the main procedure, the same data will be available to the AREA procedure only if all other procedures in the calling hierarchy either do not use a PROCEDURE instruction or else explicitly expose PI. A related issue is the way in which arguments are passed to procedures. Every procedure (or function) call may supply zero or more arguments. For a function call, the arguments are in the form of a list of values, separated by commas, and all enclosed in parentheses. For instance, to use the AREA function defined above, one might have: PULL radius SAY 'The area of the circle is' area(radius) In REXX, arguments are always passed by value. This means that arguments are evaluated when the procedure is called and only the resulting value is available to the procedure. The procedure can change the value of a variable which happens to be used as an argument only if no PROCEDURE statement occurs in the procedure or if the variable is explicitly exposed. Even if the value of a variable is changed in this way, the value passed as an argument is not changed, since it was computed when the procedure was called. Notice that in the current example, the symbol RADIUS refers to different variables in the called and calling procedures because of the PROCEDURE statement, and the fact that the same symbol is used is merely a (possibly confusing) coincidence. REXX procedures which are not functions, ie. do not return values, are invoked with the CALL instruction, and their arguments are also specified as a list of values, but the list is not enclosed in parentheses. (This is a source of frequent confusion in REXX.) If we changed the area example slightly so that it was simply a procedure invoked only for its side effect, it might look like this: PULL radius CALL area radius /* other code */ ... area: PROCEDURE EXPOSE pi ARG radius SAY 'The area of the circle is' pi * radius ** 2 RETURN There are several ways of accessing the arguments passed to a procedure. So far, we have illustrated only the ARG instruction. Although it has the appearance of a declaration, it is not. Instead, ARG is an executable instruction which causes the assignment of the first argument to the named variable, just as if an "=" assignment operator were used. If the variable in question happened to be exposed, its value would be changed even in the calling procedure. If the procedure has more than one argument, then more than one variable name can be used in the ARG instruction, each name separated from the others by commas. For instance, we might modify our example to display the area of a rectangle instead of a circle: PULL height width CALL area height, width ... area: PROCEDURE ARG ht, wd SAY 'The area of the rectangle is' ht * wd RETURN ARG assigns arguments to variables in the same order as they occur in the argument list. ARG, like PULL, is a special case of the general REXX PARSE instruction, because it is just shorthand for PARSE UPPER ARG. Hence, it can do interesting character string parsing as well. But in the most common case, as illustrated here, there is a one-to-one correspondence between arguments passed and variables. In this case, you might just remember that there should be as many commas in the list following ARG as there are in the argument list that is passed. The other main way of accessing arguments is with the ARG() built-in function. The argument passed to ARG() is the number of the argument passed to the current procedure, and its value is the value of that argument. Here is a completely equivalent form of the last example: PULL height width CALL area height, width ... area: PROCEDURE say 'The area of the rectangle is' arg(1) * arg(2) RETURN @endnode @node Over6 "STRING MANIPULATION AND PARSING" One of the most important strengths of the REXX language is its character string handling ability. As noted earlier, REXX has no explicit data types and all data can be manipulated as character strings. This is not a limitation for most applications where REXX is naturally used (application macros, command procedures, prototyping, etc.), and is actually quite convenient. Further, because REXX specializes in handling character strings, it does it very well and offers many built-in facilities for this purpose. The most frequent string operation, concatenation, can be expressed with a simple operator ("||") or in many cases none at all (direct abuttal of tokens). Equality and comparison operators for strings are the same as for numeric values, and the distinction is usually immaterial. REXX even tries to work with strings in a way that is most natural in ordinary applications, so leading and trailing blanks are ignored in the standard equality and comparison operators. Alternative "exact" equality and comparison operators are also available when leading and trailing blanks should not be ignored. String handling is greatly facilitated by the fact that storage allocation and management in REXX is completely automatic. It is never necessary to specify the (maximum) length of a string or to allocate space for it. Providing temporary storage for intermediate results is also handled transparently, and there is no need for "garbage collection". REXX has two other significant features designed for manipulating character strings. The first is a collection of string-oriented, built-in functions and the second is the PARSE instruction. A number of REXX's string handling functions provide services commonly available in other programming languages. Some examples are: SUBSTR() substring of argument string LENGTH() length of argument string POS() position of one argument string in another COPIES() arbitrary number of copies of argument string REXX string functions extend far beyond such standard capabilities, however. One interesting group of functions is based on the frequently occurring situation of regarding a string as a sequence of words delimited by blanks. Strings of this sort include natural language text (after punctuation is removed) as well as short lists ("bread eggs butter onions tomatoes"). In this category are functions like WORD(string,n), which returns the @{i}n@{ui}th word in the string, and WORDS(string), which returns the total number of words in the string. There are quite a few other string functions for miscellaneous purposes, some of which have surprisingly powerful capabilities. Among these are COMPARE(), which determines whether or not two strings are identical and otherwise returns the first position in which they differ; INSERT(), which inserts one string at an arbitrary position in another; STRIP(), which removes any specific character from the beginning or end of a string; VERIFY(), which tests a string for the occurrence or nonoccurrence of a specified set of characters; and TRANSLATE(), which replaces any desired characters with specific others. To show a bit of the flavour of string handling in REXX, here is a little program that takes a time in the form HH:MM (hours and minutes) and displays the value in English: PULL hours ":" minutes numbers = "one two three four five six seven eight", "nine ten eleven twelve" teens = "eleven twelve thirteen fourteen fifteen", "sixteen seventeen eighteen nineteen" tens = "ten twenty thirty forty fifty" hr = WORD(numbers,hours) SELECT WHEN minutes = 0 THEN min = "o'clock" WHEN RIGHT(minutes,1) = '0' THEN min = WORD(tens,minutes % 10) WHEN LEFT(minutes,1) = '0' THEN min = "oh-"||WORD(numbers,minutes) WHEN LEFT(minutes,1) = '1' THEN min = WORD(teens,minutes - 10) OTHERWISE min = WORD(tens,minutes % 10)"-"||, WORD(numbers,minutes // 10) END SAY 'Time is' hr min'.' For instance, when the input is 10:33 this program displays: Time is ten thirty-three There are a few features of REXX used here which haven't been explained yet, such as the use of a literal in the PULL instruction and the "%" (integer division) and "//" (remainder) arithmetic operators. However, apart from illustrating string handling in REXX, the main point to be made here is how transparently REXX deals appropriately with data as either numbers or strings. Arithmetic can be performed directly on character strings when appropriate. In particular, notice how the variable MINUTES can be used as easily with string functions (RIGHT(), LEFT()) as with numeric operators. Of course, a real program would have error-checking to ensure that only valid numbers are involved. This example also illustrates how one often uses lists of words separated by blanks instead of arrays. The WORD() built-in function is used to access specific elements of the list. The use of the PULL instruction here also bears further discussion. PULL is really just a shorthand form of the PARSE instruction. The example could have been written equivalently with the line: PARSE UPPER PULL hours ":" minutes instead. The full interpretation of this instruction is: "read a line of input from the user, assign everything before ":" to the variable HOURS and everything after ":" to MINUTES. The PARSE instruction (or its equivalents implied by PULL and ARG) is used frequently in REXX programs. It is able to take strings from a number of possible sources and break them apart into constituent parts using a fairly natural notation. The part of the instruction that tells how to parse the string is called the parse template. The simplest form of a template is just a list of variable names. The input string is divided into blank-delimited words which are assigned, in order, to the variables. If there are more words than variables, the entire remaining part of the string is assigned to the last variable. If there are more variables than words, the excess variables are assigned the null string. This construct is useful in reading several numbers from a user, or tabular data from a file. For instance: DO i = 1 BY 1 WHILE LINES(file) \\= 0 PARSE VALUE LINEIN(file) WITH avg.i.1 avg.i.2, avg.i.3 avg.i.4 END uses the LINEIN() function to read a line at a time from a file into compound variables with the stem AVG. Each line of the file contains four numbers separated by blanks, but otherwise in a free format. The file is easy to maintain with a text editor because there is no need for a restriction to specific column numbers. (The LINES() function is nonzero until the end of the file is reached, which makes it convenient for terminating input loops.) It is often helpful to be able to automate the processing of computer files produced by various applications. When such files are in a report format suitable for reading by people, they are more of a problem to process by another program. For instance, a report may have on a single line: Name: Sam Spade Birthdate: 10/4/57 SSN: 000-00-0000 In many languages, this would require a lot of work to interpret, because (for instance) the name might be a variable number of words. A single PARSE instruction: PARSE VAR line 'Name:' name 'Birthdate:' birthday, 'SSN:' ssn handles the whole thing and assigns each data item to an appropriate variable. @endnode @node Over7 "OTHER FEATURES OF REXX" REXX has many additional surprises. You already know enough of the language to begin writing interesting and useful REXX programs. But there are a number of useful features of the language we haven't had time to describe yet. We have the rest of the book for that. We'll just give a few indications here of some of the highlights. One of the most important is the extensive library of built-in functions. The fact that there are a large number for character-string manipulation has already been mentioned. And most REXX implementations add many more of their own which are specialized for specific environments. Some of the standard functions are: DATE() returns the current date in a variety of formats TIME() returns the current time in a variety of formats and permits elapsed timing VALUE() allows access to variables whose names are determined at runtime SYMBOL() indicates whether a given variable has been initialized or not DATATYPE() returns the character or numeric type of a variable There are also a number of built-in functions for file I/O. Though REXX's I/O model is relatively simple, it does encompass files which are organized as either a sequence of characters or a sequence of records. Subject to the capabilities of the underlying file system, the I/O functions permit random access to any location in a file. More specialized I/O functions are usually provided with each particular REXX implementation. REXX has a simple but convenient model of exceptional event handling. This allows programmers to make their code more robust by providing handlers for a variety of exceptional circumstances, such as uninitialized variables, I/O errors, and user-generated interrupts. The handlers can either attempt to recover from the condition, or at least permit graceful termination of the program, with [an] appropriate error message and cleanup of any resources that may have been in use. Lastly, REXX has simple debugging capabilities as part of the language definition. Through the TRACE instruction, it is possible to trace program execution at varying levels of detail. You can request a trace on each statement executed, the evaluated results of an expression, or even the intermediate results during expression evaluation. The trace can be nonstop or interactive. During interactive tracing, you can execute any REXX statement, so you can display and change variables, call subroutines, issue system commands, and so forth. It is also possible to reexecute most program statements during interactive tracing, so that debugging can often continue after errors without a need to rerun the program. @endnode @node Stru "3: PROGRAM STRUCTURE AND SYNTAX" In this chapter we begin a more formal introduction to the REXX language. The purpose is to lay out the basic rules by which all REXX programs can be written and understood. There will be a number of definitions of terms that have already been introduced informally. There will also be lists of such things as possible token types that can occur in a REXX statement and valid operators. This will be the most formal chapter in the book. Later chapters will return to a more expository style that focuses on specific kinds of language features like control structures, built-in functions, and string handling facilities. Even so, this treatment will not amount to a precise formal definition of the language, which can be found in Cowlishaw's @{i}The REXX Language@{ui} or IBM's SAA documentation. Instead, our objective is to present the features of the language organized according to how they are employed to do useful work. Bear in mind, too, that although REXX is a fairly well-standardized language, there are many details which have been left to the discretion of each implementation. You should consult your implementation's user guide for information on such specifics. @{" PROGRAM FORMAT " link Stru1} @{" CLAUSES AND STATEMENTS " link Stru2} @{" MORE ABOUT CLAUSES " link Stru3} @{" TOKENIZATION OF STATEMENTS " link Stru4} @{" REXX EXPRESSIONS " link Stru5} @{" CLAUSE TYPE RECOGNITION RULES " link Stru6} @{" CHARACTER STRING OPERATORS " link Stru7} @{" ARITHMETIC OPERATORS " link Stru8} @{" LOGICAL OPERATORS " link Stru9} @{" OPERATOR PRECEDENCE " link Stru10} @{" NUMBERS AND ARITHMETIC IN REXX " link Stru11} @{" REXX VARIABLES " link Stru12} @endnode @node Stru1 "PROGRAM FORMAT" In general, a REXX program is contained entirely in a single file. There are no language provisions (as there are in C, for example) for dividing a program into multiple files while still allowing a procedure in one file to access procedures and variables in another file. This is not to say that a large program cannot be broken down into a number of separate files. It is just that such files can invoke each other only as external procedures which have no straightforward means of using the data and internal procedures of each other. Some implementations of REXX may provide a way to allow more interaction between external programs. In addition, an implementation may allow for multiple independent programs in a single file, as a sort of "library". For our purposes, however, we will always assume that a REXX program is coextensive with one file. The most fundamental unit of a REXX program is the token. There are basically four types of tokens: @{i}Symbols@{ui} Symbols consist of a group of legal symbol characters (alphanumeric characters and a few special ones like "_" and "."), delimited by blanks, operator characters, or other special characters. Numbers are a special case of symbols. Other than numbers, symbols are usually variable names, but may simply be used as literals if they are not valid as either a name or a number. @{i}Literal strings@{ui} Literal strings begin with a quote character: either ' or ". They continue until a matching quote. There are three subtypes: character, hexadecimal, and binary strings. @{i}Operators@{ui} Operators are groups of consecutive operator characters (such as "+", "*", "-"). Two operator characters, even if separated by blanks (or even a comment), are part of the same operator token. @{i}Special characters@{ui} Special characters are punctuation like ",", ":", ";", "(", and ")" and are tokens by themselves. They act as delimiters and also have additional syntactic functions. In order to separate a program into its constituent statements, any REXX language processor (compiler or interpreter) performs a process called tokenization. This process involves examining a source program character by character in order to identify symbol, literal, and operator tokens. Users of REXX should know a bit about how tokenizing works in order to understand various features of the language (such as how statements are recognized!), so we will describe the tokenizing process in more detail a little further on. @endnode @node Stru2 "CLAUSES AND STATEMENTS" The next syntactic level of a REXX program is the clause. This is the most important unit from the standpoint of understanding program execution. Every REXX program consists of a sequence of clauses which are processed as units, one at a time. In general, a clause is indivisible and will always be executed fully if it is executed at all (unless errors occur). There are four types of clauses. One is a label (a symbol followed by ":"). Labels are not executable; they merely identify a location within a program. The other types of clauses are executable and are better understood in terms of a related program unit that we call a statement. Statements are the basic units of work in a REXX program. A REXX program consists of a sequence of statements, some of which are preceded by labels. Statements themselves consist of one or more clauses. Some statements (such as DO...END) may even contain other statements nested within. There are three types of statements: @{i}Assignments@{ui} Any statement whose first token is a symbol and whose second token is the assignment operator "=" is an assignment statement. If the first token is a symbol that is not a valid variable name (ie. if it begins with a number or a period), the statement is still formally an assignment, though it is in error. (If the first token is a literal or an operator, the statement is technically not even an assignment.) The first token may be a valid REXX keyword, such as SAY. In this case, a variable named SAY is assigned a value. This is legal REXX; it does not affect the keyword SAY, and should cause no problems (but it may be confusing to read). Here are some examples of assignment statements: name = "Peter Jairus Frigate" address = "10 Downing Street" 1 = 2 /* will cause an error! */ And these are not assignments: 'stuff' = 'nonsense' /* evaluates to 0 */ * = 'asterisk' /* syntax error */ @{i}Instructions@{ui} Any statement that is not an assignment but whose first token is a REXX keyword (there are about 25 of these) is an instruction. Instructions are directives which are part of the REXX language, and which are used mostly to determine the flow of control (IF, DO, CALL), perform I/O (SAY, PULL), manipulate character strings (PARSE), or set options (NUMERIC). @{i}Commands@{ui} Any statement that isn't an assignment or instruction is, automatically, a command. Such a statement is first evaluated as a REXX expression, and the evaluated result is passed to some other program (an application or the operating system) as a command to be handled by that program. In REXX terminology, a program that handles commands is called a command environment or simply an environment. (This is unfortunate, as various operating systems often use the term to mean something quite different.) Given this typology of statements we can list two other types of clauses: assignments and commands. Every assignment or command is considered to consist of a single clause. Most instructions are also just single clauses. The sole exceptions are instructions that begin with the keywords IF, DO, and SELECT. These instructions always consist of several clauses and may even contain other statements. For instance, in the statement: IF today = 'Tuesday' THEN SAY 'This must be Belgium.' ELSE SAY 'This must be France.' IF today = 'Tuesday' is a clause, THEN and ELSE are each clauses, and both SAY instructions are clauses. The SAY instructions are also statements that are embedded within the IF statement. The fourth type of clause, THEN, consists of individual instructions except for IF, DO, and SELECT, as well as certain specific constituents of the latter three. (For instance, the THEN in an IF statement is always a clause by itself.) The reader should note that we have adopted a terminology which is somewhat different from Cowlishaw's in @{i}The REXX Language@{ui} and other references based on it. The latter use the term @{i}keyword instruction@{ui} to mean the same thing as what we have called simply an @{i}instruction@{ui}. They use the term @{i}instruction@{ui} to mean what we have called a @{i}statement@{ui}, ie. a more general concept encompassing assignments, (keyword) instructions, and commands. We have found that users easily confuse Cowlishaw's terms @{i}instruction@{ui} and @{i}keyword instruction@{ui}, so we have decided to use @{i}statement@{ui} for the more general concept. To illustrate further: a statement like SAY "Hello" is also, simultaneously, a clause and an instruction, and we may refer to it interchangeably as statement, clause, or instruction. An assignment or an operating system command is a single clause, and also a statement, but not an instruction. A keyword like THEN, ELSE, or OTHERWISE is a clause but is neither an instruction nor a statement. A label, also, is a clause that is neither an instruction nor a statement. Any statement may be preceded by one or more labels. A label is simply a symbol followed by a colon (":"). Labels are used as the names of internal procedures, ie. the target of a function reference or a CALL instruction. Labels are also used to name the target of a SIGNAL instruction. Labels are not required to be unique within a file, but only the first occurrence of a label is used as the target of a function reference, CALL, or SIGNAL. Since, in our terminology, a label is not a statement, we can exclude the possibility that a label may occur within a DO, IF, or SELECT statement, if we provide that only statements may be embedded within DO, IF, or SELECT. Although one can construct contrived examples of programs that have labels inside a DO loop, for instance, and where the example might even be expected to execute properly, it doesn't seem like there is any useful purpose in allowing this possibility. (Most current implementations do not rigorously enforce this limitation, however, therefore treating labels as if they were also a type of statement.) @endnode @node Stru3 "MORE ABOUT CLAUSES" The concept of clauses is very important in REXX, since the clause is the basic execution unit. In writing REXX programs you need to be aware of clauses, becuase of the rules that specify when semicolons or continuation characters are required. For instance, since: SAY "Enter today's date"; SAY "Use mm/dd/yy form." contains two clauses on one line, the semicolon has to be used to separate them. But: SAY, "Enter today's date. Use mm/dd/yy form." is a single clause. The comma must be used after SAY to continue to the next line. The code is syntactically valid without the continuation, but will not work as intended. REXX tends to require more use of line continuation characters than do other languages, such as C, where a positive indication (the semicolon) of the end of a statement is required. So the occasional need for continuation characters is the price we must pay for being able to leave out semicolons most of the time. Another reason for paying attention to clauses is that implementations typically have limits on the length of a single clause. Simply continuing a clause to extra lines will not overcome such limits. Yet another reason is that statement type recognition depends on certain details which are revealed at the clause level of analysis. For instance, keywords like SAY are recognized only at the beginning of a clause, so it is important to know where clauses begin. This isn't always obvious. In an IF statement like: IF n > 100 THEN SAY "Value of 'n' is out of range." there is an explicit REXX rule which says THEN is a clause by itself, which makes it possible to recognize the SAY instruction. (This rule also means that an IF statement can be continued to another line before or after THEN without a continuation character being required.) Finally, several features of REXX are tied to the definition of a clause. For instance, during interactive tracing one clause at a time is executed before a pause. Certain events, such as external conditions like HALT, are recognized only at clause boundaries. And the built-in functions DATE() and TIME() and synchronized so that they give consistent results within one clause. But, as we have seen, the concept of a clause includes a hodgepodge of different things. We can, perhaps, clarify the concept somewhat by defining how clauses are delimited. Basically, a clause is just a sequence of tokens that is terminated by one of five things, whichever comes first: a semicolon. the end of a line (provided the last token on the line isn't a comma). the keyword THEN, if the first token of the clause is IF or WHEN. (In this case, THEN is a separate clause and not part of the clause it terminates.) the keywords THEN, ELSE, and OTHERWISE are clauses by themselves when they occur in the appropriate context (after IF or SELECT). a colon (if it is the second token of a clause). This definition seems somewhat legalistic and complex, since it involves several alternatives and special associated exceptions or special conditions. However, as is true generally in REXX, while the formal rules of the language are sometimes a little convoluted, this is because of an attempt to codify rules that are in practice intuitively simple and clear. The intention of the rules is just to make things work out "the way they ought to". In this case, the intent is to allow clauses to be, for the most part, identified with separate lines of a program, and yet allow for several short clauses to appear on one line and for long clauses to be continued across several lines. Normally, the end of a line defines the end of a clause, with no special punctuation required. The next line of the program is automatically interpreted as the start of a new clause. The maximum length of a line in REXX is implementation dependent, but is usually at least 250 characters. However, it is usually inconvenient to edit or print programs whose lines are longer than the width of a screen or editing window, so it is more normal to use lines no longer than 80 characters. Many REXX expressions involve character-string literals or other lengthy elements, so it is common to need to continue a single clause to two or more lines. This is done by ending the line with a comma. This must be in addition to any comma that is required for syntactic purposes (as in function references or CALL instructions). String literals (enclosed in quotation marks) must be complete on one line and cannot be continued to additional lines. Concatenation of strings solves the problem of dealing with very long literals. Unlike literals, comments can span multiple lines. In fact, in the middle of a comment, a continuation character is not even required at the end of a line. Therefore, ending a line with an open comment is another way to force continuation. Line continuation by means of commas is handled during the tokenizing process, before any syntactic analysis is done. When a comma is detected at the end of a line, the REXX language processor replaces the comma with a blank and continues reading the next line of the file. Because blanks are sometimes significant characters in REXX, it is important to observe the effect that continuation has, since it can change the meaning of an expression. In this example: SAY "ruby", "emerald", "amethyst" the line that is displayed is ruby emerald amethyst, because the strings are concatenated with a blank in between each, since blanks are significant when they occur between two literals. If it were necessary to break a line beteen two strings and not concatenate them with a blank, then the first line should have a concatenation operator ("||") before the comma, since blanks are not significant on either side of an operator. Thus: SAY "ruby"||, "emerald"||, "amethyst" displays rubyemeraldamethyst. Also, because the comma that indicates continuation is removed, you should be careful with CALL instructions or expressions involving function references, where the comma has a syntactic function. So, if area is a function of two arguments, the statement: SAY "The area is" area(height,, width) requires two commas in order that the area function isn't passed a single value consisting of the concatenation of height and width. Sometimes it is convenient to place several clauses on the same line. In this case it is necessary to separate each clause from the preceding one with a semicolon. This might be done with several short assignments: a = 0; b = 0; c = 0 In general, however, placing several clauses on the same line makes a program harder to read and modify, so it's best not to get into this habit. There is one case where you may need to be careful to use a semicolon. The IF instruction has the form: IF condition THEN statement; [ELSE statement] REXX views this as several clauses. The first clause is IF condition. In an IF instruction, THEN is a reserved word. It automatically marks the end of the first clause without requiring a semicolon, but it cannot be used as the name of a variable. In contrast, ELSE is not a reserved word. Therefore, ELSE can be used as the name of a variable (though it's not a good idea), but a semicolon is required just before it. For readability, it is generally advisable to write an IF statement on several lines, in a consistent manner. For instance: IF condition THEN statement ELSE statement is a format that uses indentation to reveal clearly the separate parts of the instruction. In this format the semicolon is unnecessary, since the clause before ELSE is terminated by the end of the line. @endnode @node Stru4 "TOKENIZATION OF STATEMENTS" Tokenization is the process of building meaningful program elements out of the smallest identifiable elements of a file, ie. characters. Tokenization is the first step that a language processor performs in interpreting a REXX program. It is important to understand a little about tokenization in order to correctly read and write REXX statements. For the purposes of tokenization, characters are classed into one of several types: @{i}Symbol characters@{ui} These are the characters which can occur in REXX symbols. The class includes upper- and lowercase alphabetic characters (A-Z, a-z), numerals (0-9), and a few other "special" characters ("!", "?", ".", "_"). This constitutes the minimal set of allowable symbol characters. Different implementations of REXX may include others, such as currency symbols ("$") and characters of non-English alphabets. For portability, it would be best to avoid using characters outisde of the minimal set in symbols. @{i}Operator characters@{ui} This class includes all the characters which may occur in REXX operators, specifically "+", "-", "*", "/", "%", "|", "&", "=", "<", ">", and "\\". Some implementations also recognize alternatives for the negation symbol ("\\"), such as "~", "^", or "¬". (The negation symbol has proven especially troublesome in ASCII-EBCDIC conversion. You should probably stick with "\\" unless it presents a conversion problem.) @{i}Special characters@{ui} There are a few other characters which are used as punctuation by REXX: ",", ":", ";", "(", ")", " " (space), and both single (') and double (") quotes. @{i}Invalid characters@{ui} All other characters are not valid in REXX programs except in comments or quoted strings. If used outside comments or quoted strings, such characters will be flagged as errors. Even in comments and quoted strings, certain control characters (such as newline) may not be used transparently. During tokenization the following operations are performed: 1. The occurrence of any symbol character marks the beginning of a REXX symbol or number. All subsequent characters up to the first nonsymbol character are part of a single symbol or number. (A number is considered to be a symbol.) All aphabetic characters in a symbol are converted to uppercase. 2. The occurrence of any operator character marks the beginning of a REXX operator. Spaces adjacent to any operator character are ignored and removed from the program. All subsequent characters up to the first nonoperator character (other than a space) are part of a single operator. (Whether a particular multicharacter operator is actually valid isn't determined until later.) 3. Special characters are always treated as tokens by themselves. Adjacent blanks are always removed, except for a blank preceding a left parenthesis or following a right parenthesis. Such blanks are meaningful, since they distinguish a function reference from a symbol or literal followed by a parenthesized expression. 4. Comments are recognized as beginning with the sequence /* (as in PL/I and C). Inside a comment, any characters are valid. No character sequence except for /* or */ has any special meaning inside a comment. Comments may be nested, so the sequence /* can occur in a comment only if it introduces a nested comment. Comments can be continued on as many lines as desired without requiring a continuation character. A comment is terminated by the sequence */. Once a complete comment has been recognized, it is removed from any further processing, except for source code displays. A comment marks the end of any symbol it happens to follow, but it can occur in the middle of a multicharacter operator such as "**" (not necessarily a good idea!). 5. The occurrence of either quote character marks the beginning of a REXX string literal. Either kind of quote character may be used, so that if one is required inside a literal, the other kind can be used to delimit the literal. Alternatively, a quote character can be used inside a literal that it delimits by doubling the character. Except for this rule, any sequence of characters is valid inside a literal, including /* and */. The literal is terminated by the first undoubled quote character of the same kind as [was] used to begin the literal. If a B or an X (either case) follows the final quote character and is followed by a nonsymbol character, it is also part of the literal (binary or hex string). 6. Some special cases are handled on an ad hoc basis. For instance, if E or e immediately follows a number and is immediately followed by "+", "-", or another number, then the whole is taken as a single symbol which is a number in exponential notation, such as 6.023E+23. A few examples should illuminate the significance of these tokenization rules. In the statement: SAY "The price is $"price there are three tokens: SAY, "The price is $", and price. The fact that the last two tokens are adjacent to each other means that concatenation is implied (the "abuttal" operator). SAY "The price is $"/* display the price */price also has three tokens and is completely equivalent to the previous example. In the statement: IF amount > = 100 THEN SAY "Value is out of range." there is only one operator (>=) rather than two, because blanks adjacent to operator characters are removed. The statement: SAY +3 contains three tokens: SAY, +, and 3. +3 is not a single numeric token, but rather an operator (unary +) followed by a number. Parenthesized expressions behave like symbols or literals, in that adjacent blanks are treated as a blank concatenation operator rather than being ignored. Thus the statement: SAY "The speed is" (d/t) "km/sec." might display The speed is 3.4 km/sec. If it is necessary to concatenate a symbol or literal to a parenthesized expression with no intervening blank, the concatenation operator ("||") must be used to avoid producing a function reference: SAY "The price is $"||(units * unit_price)"." (Note that a concatenation operator isn't required after the right parenthesis.) Based on these rules there are seven types of tokens recognized by REXX: @{i}Symbols@{ui} A symbol is a string of consecutive symbol characters. A symbol may play various roles in REXX. It may be a keyword like SAY or CALL if it occurs at the beginning of a statement. It may be another reserved word like THEN or WHILE if it occurs in an appropriate context. Some, but not all symbols can be the names of REXX variables. (A variable name cannot begin with a numeral or a period.) A symbol which is neither a valid name nor a valid number is converted to uppercase and treated as a literal. Implementations usually have only a very loose limit on the length of a symbol, typically the same as the limit on the length of a clause (perhaps 250 characters or more). @{i}Numbers@{ui} A number is a special case of a symbol or literal string which obeys certain specific restrictions. A number can be composed entirely of numerals, or of numerals with one period (decimal point). It can also have an exponential suffix, which is E (or e) followed by a whole number or by "+" or "-" and a whole number. The following are valid numbers: 9999999999999999999999999 99999999999999999999e9999 6.023E+23 1.4142135 ' 666 ' '+ 2' '313233'x Note that although the last three examples are written as string literals, they are also legal numbers. (Leading and trailing blanks and a leading plus or minus sign are allowed in a number, and '313233'x is the same as '123' in ASCII. The following are symbols that are not valid numbers: 1.05.03 100K 333e.5 As symbols, there is a maximum length of a number that can be expressed literally in a program. However, depending on the implementation, REXX may be able to compute numbers with a much larger number of digits. There are other computer character string results and character string literals which can also be used as numbers (ie. are valid in arithmetic expressions). For instance, the string literal ' + 3' is a valid number, but not a numeric token. @{i}Character string literals@{ui} A character string literal is a sequence of arbitrary characters enclosed between either single (') or double (") quotation marks. Either quotation mark may occur in the string provided it is doubled. String literals, like all other tokens, must be entirely contained on one line. The limit on the length of a string literal is usually the same as the limit on the length of symbol tokens. As with numbers, computed character string results may, in general, be much larger. @{i}Hexadecimal literals@{ui} A hexadecimal literal is like a character string literal in that it is delimited by either single or double quotation marks. However, only the numbers 0 through 9 and letters A through F (upper- or lowercase), plus blanks, are allowed in the string. In addition, the ending quotation mark must be immediately followed by x or X, which in turn must be followed by a nonsymbol character. Blanks may be used in the string to improve readability. When blanks are used, characters must be grouped in pairs (except possibly for the first group). The following are valid hexadecimal literals: '68656c6c6f'x /* equivalent to 'hello' (ASCII) */ '88 85 93 93 96'x /* 'hello' (EBCDIC) */ '1ff ff ff'x /* the number 2**25 - 1 in binary */ Hexadecimal literals are ordinarily used to easily specify the exact machine representation of a given piece of data. This is a very non- portable feature! If the data in question involves character strings, the representation will depend on whether the encoding is ASCII or EBCDIC. If the data is a binary number, the portability problems are even more severe, because the proper representation depends on the byte ordering used on a particular machine, and on how the data will be used as well. For instance, if the string is binary data that will be written to a file and read by another (non-REXX) program, the correct representation depends on the order assumed by the other program. Suppose you need to write the number 256, in binary, to a file. Nominally, the hexadecimal representation of 256 is '0100'x. But on crap machines (eg. the Intel 80x86 series) this is actually stored in memory as '0001'x, and that is how another program might expect to read it. Hexadecimal literals are immediately converted internally to the equivalent character-string representation. They are supported merely as an alternative form of notation. In other words, as far as REXX itself is concerned, data is always a string of bytes, and a hexadecimal literal is just another way of writing a given byte string. Hexadecimal literals can be used in arithmetic provided [that] they correspond to the character- string representation of a number. That is, '313233'x is exactly the same string as '123' (ASCII representation), whereas the binary representation of 123, ie. '7b'x, cannot be used in arithmetic expression, since it is the same as the string '{' to REXX. In fact: SAY '7b'x would display {. @{i}Bit string literals@{ui} A bit string literal is also like a character string literal in that it is delimited by either single or double quotation marks. In this case, only 0s and 1s can occur within the quotation marks, and the closing quotation mark must be followed by b or B (to be followed in turn by a nonsymbol character). Like hexadecimal literals, bit string literals are supported as a notational convenience, and are internally stored as the equivalent character string. For readability, the digits 0 and 1 may be written in groups of four and separated by blanks. The first group need not have a full four digits. If the total number of digits is not a multiple of eight, the string is assumed to start with 0s. These are valid bit string literals: '1'b '110001 00110010 00110011'b /* '123' in ASCII */ '1111 0000 1111 0000'b /* 'f0f0'x */ Bit string literals are used, like hexadecimal literals, when an exact binary representation of data is required, and they suffer from the same portability hazards. They are primarily of use in specifying bit masks for the REXX built-in functions BITOR(), BITAND(), and BITXOR(). @{i}Operators@{ui} A few characters which have syntactic functions are considered to be tokens all by themselves (when used outside of a string literal or comment). These are: : identifies a label, when it follows a symbol. ; explicitly ends a clause. ( begins a parenthesized expression or function argument list. ) terminates a parenthesized expression or function argument list. , separates arguments in the argument list of a function reference or CALL instruction. @endnode @node Stru5 "REXX EXPRESSIONS" REXX expressions can occur in each of the three types of statements. In an assignment, everything to the right of the equal sign is one expression. In a command, the whole clause is an expression. In an instruction there may be zero or more expressions, depending on the type of instruction. For instance, a SAY instruction may have one expression (or none at all). A DO instruction may contain as many as five different expressions, separated by reserved words like TO, BY, FOR, WHILE, or UNTIL. (These words are reserved only in a DO instruction.) A REXX expression consists of a sequence of operators and terms that follows syntactic rules like those in most other modern procedural languages. A term is the simplest unit of an expression. It can be either a symbol, a string literal (character, hexadecimal or binary), a function reference, or an expression enclosed in parentheses. Informally speaking, an expression begins with an optional unary operator ("+", "-", or "\\") followed by one or more terms separated from each other by operators, and with optional parentheses to indicate grouping. More formally, a term is one of the following: a literal string a symbol a parenthesized expression (consisting of other terms and operators, enclosed in parentheses) a function call (a literal string or a symbol, followed immediately by a left parenthesis, zero or more expressions separated by commas, and a final right parenthesis) REXX is slightly unusual in that sometimes two terms may be written adjacent to each other with no explicit operator in between. However, in this situation it is considered that there is an implicit operator which is either simple concatenation or blank concatenation, depending on whether there is not, or is, a blank between the terms. (Blanks in excess of one in a row are removed during tokenizing.) As illustrated in various preceding examples, the explicit concatenation operator can be omitted when there is no ambiguity. (For instance, two adjacent symbols with no intervening blank would be tokenized as a single symbol, so the blank is required in this case.) An expression, formally, is one of the following: term unary_operator expression term binary_operator expression In evaluating a complete expression, REXX first evaluates all consituent terms in the expression as they are encountered and then combines them with the operators in an order that is determined by precedence rules (possibly) modified by parenthesization. Note that terms themselves, as well as expressions, have to be evaluated. A symbol is evaluated by substituting its value (which may involve substitution in the symbol itself if it is a compound symbol). Function references are evaluated by calling the function. An important rule of REXX evaluation is that terms are evaluated from left to right insofar as is possible. We have to add that last qualification, because terms can be nested (eg. one function reference may occur in the argument list of another). It's important to be clear about the order of evaluation whenever function references are involved, because functions can have side effects which alter the values of variables. For instance, consider the program fragment: /* test order of evaluation */ SAY p1 (echosub(p2, sub1(), p3) p4) SAY 'should say "P1 P2 - THIRD FOURTH"' EXIT echosub: RETURN ARG(1) ARG(2) ARG(3) sub1: p1 = first p2 = second p3 = third p4 = fourth return '-' The purpose of this example is to illustrate the order of evaluation of the expression: p1 (echosub(p2, sub1(), p3) p4) When REXX processes this it recognizes subexpressions in the following order: p1 (echosub(p2, sub1(), p3) p4) echosub(p2, sub1(), p3) p2 sub1() p3 p4 Note that the routine sub1 has the side effect of changing all four variables. The values used for P1 and P2 are determined before sub1 is called, so they enter the expression with their original value (which is uninitialized). But P3 and P4 are determined after the call to sub1, so their altered values are used. @endnode @node Stru6 "CLAUSE TYPE RECOGNITION RULES" Given all the foregoing definitions, it is possible to explain precisely how REXX classified clauses into each of the possible types. Once clause boundaries have been determined in the tokenizing process, then each of the following rules is applied in order. Whichever rule is first found to be true determines the type of the clause. @{i}Label rule@{ui} If the first token of the clause is a symbol and the second token of the clause is a colon, the clause is a label. @{i}Assignment rule@{ui} If the first token of the clause is a symbol and the second token of the clause begins with an = sign, the clause is an assignment. @{i}Keyword instruction rule@{ui} If the first token of the clause is a symbol which is one of the reserved REXX keywords (ADDRESS, ARG, CALL, etc.), the clause is a keyword instruction. @{i}Command rule@{ui} If none of the preceding rules applies, the clause is a command. To bring these rules into focus, and to demonstrate the importance of understanding them, let's consider a few REXX statements that appear very similar, yet which are actually very different. These examples bear some close study, since they illustrate points which are frequently missed by new users of REXX. Consider first these statements: EXIT(1) GOTO(1) GOTO (1) In the first of these, there are four tokens: EXIT, (, 1, and ). Although EXIT and ( are adjacent, they are separate tokens. The second step in processing a REXX program after tokenization is statement classification. At this stage, EXIT is recognized as a keyword. That leaves the rest of the statement, (1), as an expression which is evaluated at the time the statement is actually executed. In the second example, tokenization produces four tokens, just as before. However, the statement is classified as a command, because the statement isn't an assignment, and GOTO is not a language keyword. Later, when the statement is executed, it is significant to note that the tokens GOTO and ( are adjacent, because that means the expression is to be treated as a function call. After the function is called and the value is known, that result is used as a command to be passed to the default command environment. In the third example, there are still four tokens. Internally, however, the blank between GOTO and ( is remembered, and when it is time to evaluate the expression, it is treated as the concatenation of GOTO and 1 instead of a function call. Here are four more examples which illustrate statement classification: SAY a = b SAY = b 'SAY' = b say: a = b The first statement is a SAY instruction, not an assignment, because the equal sign is the third token rather than the second one. The second statement is in fact an assignment rather than a SAY instruction, even though the first token is a keyword, because the rule for recognizing assignments is applied first. In this statement, SAY is just an ordinary symbol that names a variable being assigned a value. This will have no effect on any other SAY statements in the program because keywords are recognized, where they occur as the first token, before symbols are evaluated. SAY used elsewhere in a statement would be handled properly as a symbol, however. The third statement is merely an expression which is processed as a command after evaluation. It is not an assignment because the first token is a string literal rather than a symbol. (The expression evaluates to 0 or 1 depending on whether the variable B has the value SAY.) The last example is an assignment preceded by a label. It is not a SAY instruction, because the label rule is applied first. Getting down to really fine points: 1 = 'xyz' is still an assignment statement, although it will cause an error when executed, and even though it makes sense as an expression which could be evaluated and executed as a command. This is because 1 is a symbol, so the assignment rule still applies. And both: x == 1 and: 2 == 1 are also considered to be assignment statements, because all that is required of the second token is that it begin with =. They are again invalid assignment statements, to be sure, and will cause errors if executed. Although they would make perfect syntactic sense as expressions that could be evaluated and executed as commands, the rules of REXX prevent this. The rationale for this is interesting. If they were evaluated as expressions the value would be 0 or 1, which is very unlikely (though possible in some contexts) to be a valid system or application command. But it is considered much more likely for programmer errors such as: x =, /* compute the next approximation */ = x - f(x) / f_prime(x) to occur, and it is better to generate a syntax error than to (perhaps silently) try to execute a nonsensical command. If you really want an expression to be handled as a command, you could use (for instance): (1 = x + 1) @endnode @node Stru7 "CHARACTER STRING OPERATORS" The first kind of operation performable on character strings is concatenation. All other string operations, except comparison (substring, reversal, etc.) are done with built-in functions. Concatenation comes in two forms. Blank concatenation means to concatenate two strings with a blank in between. This mimics the way words are concatenated into sentences in a natural language. It is expressed by writing two terms together separated by one or more blanks. Simple concatenation is the same except that no blank is inserted. It can always be expressed with the || operator. In many cases, when there is no possible ambiguity, it can also be expressed by writing the terms together with no intervening blank. The only time this isn't possible is when a symbol is being concatenated to another symbol or a parenthesized expression. Here are some examples: "Hello" "world." /* same as "Hello world." */ "Never"||"more!" /* same as "Nevermore!" */ "Price: $"amount Concatenation can be performed on strings that are valid numbers. The result may itself be a number, but usually isn't. For instance: '1234'||'5678' /* is a number */ 123'.'456 /* is a number */ '1234' '5678' /* is not a number */ The other kind of character-string operation that corresponds to an explicit operator is comparison. That is, REXX supports the notion of one string being less than, equal to, or greater than another string. There are two forms of comparison: ordinary and strict, supporting two possible senses of the meanings of greater than or less than. Both forms of comparison are binary operators that combine two strings to yield a value which is also a string. The resulting string, however, is either 0 or 1. String comparison is another example in REXX of how the language attempts to "do the right thing", even though the precise rules for what this is can be very complex. The ordinary comparison operators are: = operands are equal > first operand [is] greater < first operand [is] less >= first operand [is] greater or equal <= first operand [is] less or equal The negated forms of these are \\=, \\>, \\<, \\>=, and \\<=. Some of these are redundant, of course. For example, \\> is the same as <=. In the case of ordinary comparison, REXX recognizes a special case when both operands are numeric - that is, when both are valid REXX numbers. In that case the comparison is done in the numeric sense. So the following expressions all have the value 1: 1 < 2 2 < 10 2 < 1e2 Note that in the latter two cases, if comparisons were done as if the operands were character strings, the value of each expression would be 0, since 2 is higher in the character collating sequence than 1. On the other hand, if either operand is not a valid number, then an ordinary comparison is done by first ignoring leading and trailing blanks and then comparing the strings, character by charcter, using the standard collating sequence of the hardware (usually ASCII or EBCDIC). Removal of leading and trailing blanks is an important part of this operation, since in practice one often works with strings in such a way that the blanks are there but [are] irrelevant for the purpose at hand. Notice that both ordinary and strict comparison are case sensitive. This is a source of nonportability, since lowercase letters are higher than uppercase letters in the ASCII collating sequence, but (more logically) lower in the EBCDIC sequence. If portability is important, operands could first be converted to lowercase, yielding a case insensitive comparison. Ordinary comparison can be a source of errors, if you are really interested in the exact character strings. You may want to sort a number of strings that might incidentally include values that could (inadvently) be interpreted as numbers. In such a case, the strict comparison operators should be used. There is a strict comparison operator corresponding to each ordinary one. It is expressed by doubling part of the ordinary operator. Thus ==, <<, >>, <<=, >>= (and negated forms) are the strict counterparts of =, <, >, <=, and >=. In strict comparison, leading and trailing blanks are not ignored and operands are considered only as strings, never as numbers. The comparison is still done character by character using the standard collating sequence. Strict comparison is not only safer if you are not interested in treating strings as numbers, but it is also faster since no preprocessing of the operands is required. @endnode @node Stru8 "ARITHMETIC OPERATORS" REXX supports the standard binary aithmetic operators of addition ("+"), subtraction ("-"), multiplication ("*"), division ("/") and exponentiation ("**"). It also supports unary + and -. These are defined to be the same as 0 + or - another quantity. Wince numbers are not inherently either fixed-point or floating-point in REXX, division of one number by another usually produces a fractional result. The number of digits of accuracy that are retained after a division is governed by the NUMERIC DIGITS instruction. The default number of digits is nine, which is usually enough for most practical computations. The REXX language doesn't place any limit on the number of digits that can be requested via NUMERIC DIGITS. If more digits are required to express the result than allowed by NUMERIC DIGITS, the result is rounded rather than truncated. Sometimes an integer result of division is desired, with the fractional part truncated. There is a separate operator for this: %. (If you are a C programmer, this will annoy you, because % is used in C for the integer division [remainder] operator.) It is meaningful even if both division and dividend are nonintegral. When negative numbers are involved in multiplcation, division, or integer division, the sign of the result is determined by the sign law, which is that the result is positive (or 0) if both operands are the same sign, and negative if the operands are nonzero and of opposite signs. One last operator ("//") produces the remainder after integer division. The remainder, R, of the division of A by B is defined by: R = A - (A % B) * B For instance, if A = -4 and B = 3, then A % B is -1 and R = -1. Unlike the other aithmetic operators, which require only that their operands be numeric (except for division by zero), exponentiation requires that its second operand be an integer (since REXX doesn't support complex numbers.) Like the other operators, exponentiation is said to associate left to right. That is, the expression A ** B ** C is interpreted as (A ** B) ** C rather than A ** (B ** C). This is somewhat unfortunate, since (A ** B) ** C can be expressed equivalently but more efficiently as A ** (B * C), and the normal way of writing exponents as superscripts implies right to left associatively. @endnode @node Stru9 "LOGICAL OPERATORS" REXX supports one unary logical operator (negation), and three binary logical operators (and, or, exclusive or). The logical operators are used primarily in IF instructions, but could be used in any expression. The operands of a logical operator can be only the strings '0' and '1'; any other operand will cause an error. Furthermore, this must be interpreted in the sense of strict comparison. That is, for instance, the strings ' 0', '00', and '0E10' are not valid operands of a logical operator. (But '31'x would be okay, in ASCII, since it's just another way to write '1'.) Assuming that the variables A and B satisfy this restriction, then, the definitions of the logical operators are: @{i}Negation@{ui} \\A is 1 if A is 0, or 0 if A is 1 @{i}And@{ui} A & B is 1 if both A and B are 1, otherwise it is 0 @{i}Or@{ui} A | B is 1 if either A or B (or both) equal 1, otherwise [it is] 0 @{i}Exclusive or@{ui} A && B is 1 if exactly 1 of A or B is 1 (but not both), otherwise 0 Unlike some languages (such as C), REXX does not have any short circuit rules for the evaluation of logical expressions. Such a rule generally says that only as much of an expression needs to be computed as is necessary to determine the result. Thus, if you have: (expression1) | (expression2) you could avoid evaluating expression2 if you found that expression1 was 1. But REXX does not work that way, because it is felt that such rules are less intuitively natural. The REXX approach has the advantage that all terms in a logical expression will be evaluated, which may be important if any of these terms are function calls with side effects. On the other hand, it may be less efficient, since it can mean performing unnecessary computation, particularly when used in IF statements. Such statements can always be rewritten to avoid the inefficiency, but it requires effort on the user's part. Another point about logical operators is that they have nothing to do with bitwise operations on the operands. Built-in functions BITOR(), BITAND(), and BITXOR() are provided for this purpose. @endnode @node Stru10 "OPERATOR PRECEDENCE" As in most other programming languages, operators in REXX have associated with them the notion of precedence, which determines the order in which expressions involving several operators are evaluated in the absence of parentheses. The precedence of operators, in decreasing order, is: @{i}Unary operators@{ui} +, -, \\ @{i}Arithmetic operators@{ui} exponentiation ** multiplication and division *, /, %, // addition and subtraction +, - @{i}Concatenation@{ui} ||, , @{i}Comparison@{ui} =, ==, <, >, \\=, <=, >=, \\==, <<=, >>=, etc. @{i}Logical operators@{ui} and & or, exclusive or |, && The way to apply this table to an expression like: operand1 op1 operand2 op2 operand3 is to perform op1 first on its operands if op1 occurs higher in the table than op2, or else to perform op2 first. If both operators occur on the same line of the table, the leftmost in the expression is performed first. Parentheses, of course, can be used to modify the order of evaluation. That is, regardless of precendence: operand1 op1 (operand2 op2 operand3) means that op2 is to be performed before op1. @endnode @node Stru11 "NUMBERS AND ARITHMETIC IN REXX" We have seen that REXX tries as much as possible to treat all data as character strings, yet makes provision for dealing with a special class of character strings which are valid numbers. When the strings involved are valid numbers, arithmetic operators may be used, and the comparison operators behave differently from the way they would for nonnumeric strings. The REXX language places no inherent limit on the size of numbers which may be represented and used in computation, though specific implementations may do so. Ideally, any REXX implementation would allow numbers that are as long (measured in characters required for their representation) as the longest allowed character string. But even this is sometimes not the case, since some implementations use special internal representations (eg. floating-point numbers), which place a definite limit on the number of significant digits that can be maintained. Notice that the cruicial limit here is on the number of significant digits rather than the magnitude of the number, since exponential notation permits representation of very large or very small numbers, though perhaps with relatively few significant digits. As a practical matter, however, efficiency requires working with no more digits than are actually required for the problem at hand. Therefore, REXX provides a means, with the NUMERIC DIGITS instruction, to specify how many significant digits of precision will be used. The default is nine digits, which is adequate for most purposes. Numbers that have more significant digits are rounded off before use, and results of operations (like multiplication) that increase the number of digits are also rounded. Rounding is unavoidable in some situations such as division, when the result cannot be expressed in a finite form. Thus the value of 2/3 is 0.666666667 with the default number of NUMERIC DIGITS. The NUMERIC DIGITS setting is used for a variety of purposes in REXX. It controls not only the number of digits that will be retained in a result, but also the way numbers are formatted as strings and how large an integer can be to still be considered a whole number. See @{"Chapter 13" link Arit} for a full explanation of this and of REXX arithmetic in general. The most commonly observed effect of NUMERIC DIGITS other than the rounding of numbers is the way large numbers are formatted. REXX will automatically convert results to exponential notation if the number of decimal places required to the left of the decimal point exceeds NUMERIC DIGITS, or if the number of places to the right of the decimal point exceeds twice NUMERIC DIGITS. Thus, for instance, the value of the expression 1000 * 1000 * 1000 is 1.00000000E+9 with the default value of NUMERIC DIGITS, since otherwise 10 digits of precision would be required. This rule affects the way results are represented; it places no limits on the size of numeric constants that can be used in a program. It is legal to use numbers with more precision than the current NUMERIC DIGITS value. Rounding and possible conversion to exponential notation will occur only if and when an arithmetic operation or comparison is performed. Otherwise, numeric strings are not changed in any way. For instance: NUMERIC DIGITS 5 SAY 123456789 SAY 0 + 123456789 displays 123456789 as the output of the first SAY instruction, but 1.2346E+8 as the output of the second. In otherwords, adding 0 to a numeric string constant may sometimes be a useful way to force it into the proper form for the current value of NUMERIC DIGITS. When dealing with fractional numbers, it is often distracting to see results displayed with the full NUMERIC DIGITS of precision (for instance, 0.666666667 for 2/3). REXX has a built-in function called FORMAT() that can be used when more control over the appearance of output is desired. It takes three parameters: the number to format, the number of digits before the decimal point, and the number of digits after the decimal point. Thus: SAY FORMAT(2/3, 1, 4) would display 0.6667. @endnode @node Stru12 "REXX VARIABLES" The section in the last chapter on the REXX data model presented most of the important details about the nature and use of variables in REXX. To recapitulate, there are two types of REXX variables: simple and compound. Simple variables are scalar quantities, just like variables in most other languages. Simple variables in REXX are referenced by simple symbols, that is, symbols which do not begin with a digit (0-9) and do not contain any periods. In practice, the name of the variable and the symbol used to refer to it are one and the same. Alphabetic case of the symbol is ignored, as the symbol is converted entirely to uppercase during tokenization. Compound variables, in contrast, have names that consist of a stem and a tail. Conceptually, the stem is the name of a group of related quantities, and the tail identifies individual members of the collection. So a compound variable is really much like a one-dimensional array in other languages. Compound variable names are referred to by compound symbols. More precisely, a compound symbol is one of the form: s.t1.t2.t3...tn where the first part, s. (including the period), is the stem, and each subsequence ti is like a symbol symbol in that it does not contain periods, but it may begin with a digit. ti may also be null. This compound symbol refers to a variable whose name (the derived name) is: S.T1.T2.T3...Tn where each Ti is the value of the variable corresponding to the simple symbol t1, if that symbol doesn't begin with a digit, or the symbol itself in uppercase, if it does begin with a digit. The stem part S is just s converted to uppercase; no variable value lookup is performed. There is one important special case in this notation. It is that a symbol may consist of just a stem alone, that is, a simple symbol which ends in a period. LIST. would be an example. How REXX interprets this depends on the context. When a stem is used in a context that assigns it a value, the meaning is that all possible variables that have the same stem lose their existing value (if any) and take on the new value. This could happen in an assignment statement, a PARSE instruction, or in a call to the VALUE() built-in function. Similarly, in the DROP instruction, the meaning is that the values of all variables with the specified stem are dropped. And in the PROCEDURE EXPOSE function, all variables having the specified stem are exposed. On the other hand, in contexts where only the value of a variable is needed, a stem can be used just like a variable, and it is considered to have whatever value has been assigned to it, if any. You could even use a stem as the control variable in a DO loop, in which case the stem would be both assigned and evaluated. We will say a little more about this use of stems later in connection with an example. There is no limit on the length of individual components of the compound symbol; the only limit is on the length of the whole symbol. An important special case is when ti is a whole number; then it is much like a numeric array subscript in other languages. The following are all legal compound symbols: array.i restaraunt..address a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z It is possible to think of a symbol like A.i.j.k as equivalent to an array element, which would be expressed as A[i][j][k] in the C language, for instance. There are, however, significant differences between such "arrays" in REXX and arrays in other languages. Some of these differences represent advantages of REXX, but others are disadvantages. The differences are both semantic and syntactic. The main semantic difference is that, despite appearances, a REXX array does not have a specific "dimension" like an array in other languages. The main syntactic difference is that a programmer cannot use expressions or even compound variables in the "subscripts". On the positive side, because of REXX's dynamic memory management, it is never necessary to declare in advance how large an array will be. It simply grows as needed, and (usually) does not consume storage for unused elements of the array. And because a REXX array does not have a true dimensionality, it is not necessary to decide even this in advance. We say that a REXX array does not have a specific dimension because the periods in a compound symbol have syntactic meaning only in the symbol. Once the derived name has been formed by substituting all values of simple symbols, there are really only two parts to it: the stem and the tail. For instance, if we have: i = 3 j = 4 then the symbol A.i.j actually consists of just the stem, which is A. and the tail, which is 3.4. At this point, the fact that the tail still contains a period is irrelevant. So, if we also have: x = 34 y = 10 z = x/y t = "3.4" then the symbols A.i.j, A.t, and A.z all refer to exactly the same piece of data, namely the value of the variable whose derived name is A.3.4. Useful programs can actually be written that take advantage of this ambiguity of the "dimension" of a REXX array. The tail of a REXX variable can consist of completely arbitrary data, including blanks and unprintable ASCII characters. In contrast to the usual situation in REXX, blanks are significant in a variable tail. Thus, if: x = "" y = " " z = " " then A.x, A.y, and A.z refer to three completely different data items, even though x, y, and z are "equal" when compared with the normal comparison operators. This is another respect in which REXX "arrays" differ from those in other languages. Also, in connection with this example, note that the variable with the derived name A, is dinstinct from A. as a stem. Thus, in: x = "" A. = "Newton" A.x = "Leibniz" the assignment to A.x affects only the variable A.; it does not affect other variables with A. as the stem. Any reference in a program to A. (alone) is assumed to be a reference to the stem. In considering the syntactic difference of REXX arrays from those in other languages, we see that the REXX array notation is not as powerful, or at least as convenient. In particular, it is not possible to have expressions in a REXX "subscript". For instance, A.i+j is the sum of A.i and j, instead of an array with two subscripts. Even parentheses cannot be used to circumvent this problem, since A.(i+j) is actually a function call to a function named A. (In fact, Cowlishaw has stated that a distaste for complex parenthesized expressions was one factor in not using the customary notation for subscripts.) This notation is usually the most inconvenient when you simply want to use another compound symbol as the subscript. So if i.j is a value you wish to use as a "subscript", you cannot just refer to A.i.j. You must assign i.j to a simple variable first: x = i.j SAY A.x Despite these syntactical inconveniences, the great power of REXX's notation lies in the fact that "subscripts" can be nonnumeric. This allows you to build data structures which easily associate data values with data names. Suppose, for instance, that you want to work with a database of books. In REXX you can do this by having a number of arrays, each of which is subscripted by the name of the book. The names of these arrays might be author, date_of_publication, call_number, and so forth. Then if the name of a particular book is stored in the variable book_name, you can retrieve all of the other information directly by referring to author.book_name, publisher.book_name, etc. Because of this direct association from a name to a value, such data structures are sometimes called associative arrays. As far as the language user is concerned, there is no search process at all involved in looking up the author of a given book. In reality, of course, REXX does need to do a search to find each piece of data. The advantage is that this search process is all built in and transparent to the user. An example may help clarify the usefulness of associative arrays. Suppose that we want to examine text files for the occurrence of specific key words. We would like to make a copy of only those lines in the file that contain one of the key words. The list of key words will be read from a separate text file in which they will be stored one or more to a line. The following program does the job. It is written as a filter, ie. a program which reads from standard input and writes to standard output. (Standard input and standard output are concepts originated in UNIX that permit the output of one program to be used as the input of another.) /****************************************************/ /* WORDFIND: a filter that copies only those lines */ /* that contain a word from a list in another file. */ /****************************************************/ [1] ARG wordlist infile outfile [2] CALL dosdel outfile [3] dict. = 0 /* read wordlist */ [4] DO WHILE LINES(wordlist) > 0 [5] line = TRANSLATE(LINEIN(wordlist)) [6] DO WHILE line \\= '' [7] PARSE VAR line word line [8] dict.word = 1 [9] END [10] END [11] CALL lineout wordlist /* search through infile */ [12] DO WHILE LINES(infile) > 0 [13] line = LINEIN(infile) [14] DO i = 1 TO WORDS(line) [15] word = TRANSLATE(word(line, i)) [16] IF dict.word THEN DO [17] CALL lineout outfile, line [18] LEAVE [19] END [20] END [21] END [22] CALL lineout infile [23] CALL lineout outfile [24] EXIT The program begins with an ARG instruction that picks up three filenames from the program's single argument - the names of a word list file, an input file, and an output file. The second line calls a system function (which is not a standard part of REXX) to delete an existing file (if any) which has the same name as the output file. We will explain the assignment to dict. a little later. The first loop in the program reads entirely through the word list file in order to create a table of keywords. How this table works is really the point of the example, so we will again defer the full explanation until we've finished an overview of the program. In this loop, the LINES() built-in function is used to allow the loop to terminate when there are no more lines to be read, and each individual line of the file is read with LINEIN(). We are assuming that each line of the file may contain more than [one] word, so we have to extract each individual word. One REXX idiom to do this sort of thing, which may be of some interest, is the use of the instruction: PARSE VAR line word line to isolate each word of the variable line in succession. In each iteration of the loop, line is updated with the portion following the first word. The TRANSLATE() built-in function is used to convert all words to uppercase, so that the program will not be case sensitive. The second loop in the program has the same general structure. It reads one line at a time from the input file. Again, individual words are extracted from each line. Just for variety, we have used a different, more obvious (but usually less efficient) technique utilizing the WORD() built- in function to break a string of characters into words. The table created in the first loop is used in the second to identify words which were found in the word list. In order to understand how the table works, the most important statement is: dict. = 0 This is a special kind of assignment, an assignment to a stem. dict. is a very special case; it is not, strictly speaking, either a simple or compound variable, because it ends will a period. Assignment to a stem is defined to mean that all possible values having that stem take on the new value, losing any previous value. Of course, in practice REXX merely records the fact that a stem assignment has been done, rather than attempting the impossible feat of making an infinite number of assignments. But the effect is the same. Specifically, after an assignment to a stem, all possible values of variables having the same stem are now considered to be defined. And if they are used without any other assignment, the value of such variables is the value assigned to the stem. The issue of whether a variable has been defined or, in other words, whether it has a value, is important in REXX. The language allows use of undefined variables, and gives them a value which is the same as the variable's name. In the case of a simple variable, this value is just the name (in uppercase). For compound variables, the value reflects the formation of the derived name from the corresponding compound symbol as described above. For instance, in the following: SAY dict.word word = 'bah' SAY dict.word dict. = 0 SAY dict.word dict.word = 'humbug' SAY dict.word The first SAY displays DICT.WORD, since nothing has been assigned yet. The second SAY displays DICT.bah (note lowercase), since the compound variable still does not have value, though word has been assigned. The third SAY displays 0, reflecting the stem assignment. The last SAY displays humbug, because the specific compound variable DICT.bah has finally received a value apart from its assignment with all others having the same stem. What we have done by making the stem assignment, then, is to make it possible to test very easily whether a given keyword has been encountered. The first nested loops in the example under discussion simply go through and set dict.word to 1 for each value of word which is found in the list of keywords read from a file. For all possible values of word that were not listed as keywords, dict.word is 0. The second [set of] nested loops in the example use this information to determine quickly whether any particular word in the text file was contained in the list of keywords. Note that we were careful to convert words to uppercase, since a word might occur in the text with any combination of upper- and lowercase letters, and the variable DICT.something is not the same as the variable DICT.Something. (As symbols they would be equivalent, because of the rule that symbols are always treated as uppercase.) Another thing that should be noted about stem variables is that they can be used in REXX expressions anywhere an ordinary (simple or compound) variable can be used. It is only when a value is assigned to a stem variable that something special happens. The value of a stem variable is, of course, whatever has been assigned to it, just as for any other variable, or else the uppercase) symbol itself if nothing has been assigned. In this latter circumstance, compound variables based on the stem do not have the same value as the stem itself. From this discussion, it is apparent that it is a matter of great importance in REXX whether or not a variable has been assigned, even though default values are supplied if the variable is undefined because no assignment has been made. You should, however, as a matter of good practice, never use a variable that hasn't been defined. REXX even allows you to enforce this convention by providing an instruction: SIGNAL ON NOVALUE which will cause a program trap in case an undefined variable is used. (Specifically, it causes the program to start executing statements beginning after the label novalue:.) Trapping of undefined variable usage is not the default, unfortunately, so you should include the above statement in all REXX programs of any significant size. Much more discussion on signals and how they work will be presented later. One partial exception to the rule about [the] use of undefined variables is that when they are referenced in the process of forming a derived name from a compound symbol they do not raise the NOVALUE condition. For instance, in the example discussed above, a reference to dict.word would be allowed even if word hadn't been assigned, provided [that] the stem itself had been assigned. (The variable in question would then be named DICT.WORD.) Such a practice is still to be avoided, because of the hazard that word might inadvertently have a value though it was not expected to. Given the importance of knowing when a variable is considered to be defined, we will mention the ways other than an assignment statement in which a variable can be assigned a value. One such way, of which we have seen a few examples, is the PARSE instruction (and its special cases ARG and PULL), which can assign a number of variables at once. Another way, the importance of which will become apparent later, is with the VALUE() built-in function. A DO instruction may initialize a control variable. Lastly, it is even possible for external programs to access REXX variables, depending on the application program interface supplied by each specific implementation. In fact, some implementations allow such external programs to create variables with names that are impossible in pure REXX programs - eg. simple variables with names containing lowercase characters. An assignment, or one of the other means of associating a value with a variable, also allocates storage for the variable. The reverse of this is also possible: the REXX DROP instruction releases a variable's storage and returns it to an uninitialized state. Both simple and compound variables can be dropped. A stem can also be dropped, which causes all variables having that stem to have their storage released and become uninitialized. This is obviously useful in an environment where storage is scarce and there are large data items or numerous compound variables which are no longer needed. It usually isn't necessary to use DROP when you are done with a particular variable or collection of compound variables, but it's handy to have the capability if little memory is available. Also, when a PROCEDURE statement is used in a subroutine or function, all variables which haven't been exposed are automatically dropped when the procedure returns. More than one variable or stem can be dropped with the same DROP instruction. A typical example might be: j = 10 DROP list. bitmap model.j The variables or stems to be dropped are listed in the DROP instruction, separated by spaces. In determining variables to be dropped, derived names are formed in the usual way, so in the above example the model.10 variable is dropped. There is one thing to note about dropping specific compound variables. In the following: stem. = 'something' DROP stem.1 SAY stem.1 the SAY instruction displays STEM.1 rather than something, because the value assigned to a stem should not be thought of as setting a default value for variables having that stem. Instead, conceptually, the stem assignment sets all possible variables having the same stem. The DROP statement above undefines just one of those variables, STEM.1, so like any other undefinied variable, its uninitialized variable is its name. Also, use of STEM.1 in an expression will raise the NOVALUE condition if it has been enabled. @endnode @node Cont "4: CONTROL STRUCTURES" A control structure is a programming language construct that determines the sequence in which instructions will be executed. There are several primary sorts of control structures. One is a selection structure which permits choices to be made in the flow of control based on the state of the program and its data. This is represented in REXX by the IF and SELECT instructions. A second type of structure is for looping. REXX has one instruction for this: DO, which has many variations. We will consider these control structures in this chapter, and also the SIGNAL instruction, which is something like an unconditional "goto". In the @{"next chapter" link Subr} we will look at a third type of control structure: subroutines. @{" SELECTION STRUCTURES " link Cont1} @{" LOOPING STRUCTURES " link Cont2} @{" THE SIGNAL INSTRUCTION " link Cont3} @endnode @node Cont1 "SELECTION STRUCTURES" The IF instruction is REXX's basic conditional construct. Its general form is: IF expression THEN statement1 [ELSE statement2] Here, expression must be a valid REXX expression that evaluates to 0 or 1 (a boolean expression). 1 represents true and 0 represents false. Therefore, statement1 is executed if expression has the value 1, and statement2 is executed if expression has the value 0. This simple concept conceals a few subtleties. We will see these as we analyze this statement into the various clauses it contains. Recall that a clause is the basic unit of execution in REXX. Most REXX statements consist of just one clause, but IF is one of the exceptions. Let's consider a simple example: IF x >= 0 THEN z = x; ELSE z = -x If we wrote this with exactly one clause per line, it would be: IF x >= 0 THEN z = x ELSE z = -x so there are really five clauses here. Distinguishing the clauses becomes a matter of some importance when you are using REXX trace facilities to display the flow of execution. If you traced the above statement, you would see that there are indeed five separate clauses (though not all of them could be executed in any specific case). If you used the "interactive" trace option, REXX would actually stop after executing certain clauses, to allow you to display or change variables, and possibly even reexecute a clause. Clause boundaries are also important in that they represent opportunities for certain exceptional conditions to be raised. But mostly, it's important to recognize the individual clauses in order to understand REXX's rules for the syntax of IF. The first important fact is that THEN is a reserved word in an IF statement. That is, THEN cannot be used as a symbol in the expression following IF, because THEN is specifically reserved to mark the end of the first clause. You could explicitly end the clause with ";": IF x >=0; THEN ... or with the end of a line: IF x >= 0 THEN ... but this is unnecessary, since THEN in this context explicitly ends the clause. THEN is also considered to be a clause by itself, so it may be followed by ";" or the end of the line. A statement is required after THEN and may not simply be omitted. If you want to do nothing in the case the expression is true, then you must use the NOP (no-operation) instruction: IF x >= 0 THEN NOP; ELSE Z = -X However, the statement following THEN may be quite complex. It could be another IF instruction, a SELECT instruction, or a DO instruction. If you want to execute several statements in case the condition is true, this is done by enclosing them between DO and END: IF x >= 0 THEN DO SAY 'Nonnegative value of x.' z = x END ELSE ... The DO...END pair is considered to be a single statement, though it is composed of any number of other statements. This construct is also known as a simple DO group, as opposed to a repetitive DO group, which represents a loop that may possibly be executed many times. A repetitive DO group is also considered to be a single statement, and may be used following a THEN or ELSE. In contast to THEN, ELSE is not a reserved word within an IF instruction. Therefore it is required that the statement following THEN be terminated by ":" or the end of a line before using ELSE. If you like to write your IF statements to be complete on one line, this is one of the few times in REXX where you will be required to use ";". It is probably better style, however, to always put ELSE on a line by itself. Still, ELSE is like THEN in being considered to be a clause by itself. The ELSE part of the IF instruction is optional. If you need to do something only when the condition part is true, then the ELSE and the statement following it may be omitted. But if ELSE is used it must be followed by a statement, which could be another IF statement, a DO group, etc. You might also have to use the NOP instruction after ELSE. This is because if you use nested IFs, then any ELSE part is associated with the nearest incomplete IF. So in: IF x >= 0 THEN IF x > 100 THEN z = x ELSE NOP ELSE z = -x we had to insert an extra ELSE NOP so that z =-x is executed only in case x < 0, rather than in case x <= 100. There is an aspect of the way REXX handles conditional statements that has significant implications for performance. It involves the evluation of logical expressions involving the and ("&") and or ("|") operators. Any expression which contains one of these operators is always evaluated fully. This is unlike the situation in some languages like C that have short circuit rules which guarantee that a logical expression is evaluated only far enough that its value can be unambiguously determined. For instance, the value of: x & y could be determined just by looking at x, provided the value of x is 0. But REXX will always determine the value of y as well. Logical expressions are used primarily in IF statements. You might naturally want to say: IF something(x) & something_else(y) THEN CALL do_this But if the function evaluations might take some time (and merely calling a function introduces some overhead), it would be better to rewrite this equivalently as: IF something(x) THEN IF something_else(y) THEN CALL do_this Again, you may save time by ordering the clauses so that the function with the least overhead is called first, unless the other one is much more likely to determine the outcome. Handling a logical expression that involves an or instead or an and is in principle the same, though slightly more awkward, since you must replace: IF something(x) | something(y) THEN CALL do_this with: IF something(x) THEN CALL do_this ELSE IF something_else(y) THEN CALL do_this Many programming situations require you to test for any of a number of possibilities and take appropriate action. One way to do this is with a sequence of nested IF instructions, eg: IF choice = 1 THEN CALL show_help ELSE IF choice = 2 THEN CALL edit_file ELSE IF choice = 3 THEN CALL finish_transaction ELSE SAY "Invalid choice" There is nothing especially wrong with this approach, but the situation is so common that REXX provides a separate SELECT instruction to deal with it. The preceding example could be written : SELECT WHEN choice = 1 THEN CALL show_help WHEN choice = 2 THEN CALL edit_file WHEN choice = 3 THEN CALL finish_transaction OTHERWISE SAY "Invalid choice" END A SELECT instruction works exactly like the corresporiding nested IF, in that each condition is tested in turn. The instruction following THEN of the first case that is true (has the value 1) will be executed, and then control will transfer to the first instruction following END. There is usually no particular limit on how many WHEN cases can occur in a SELECT instruction, but some REXX implementations do limit the depth of depth of nesting of control structures. For this reason, the equivalent form using nested IF instructions is best avoided when there are more than a handful of separate cases. The general form of SELECT is: SELECT when-list [OTHERWISE [statement-list]] END where when-list is a sequence of one or more compound clauses of the form: WHEN expression THEN statement This is just like an IF instruction (except there's no ELSE part). That is, the expression must have a value of 0 or 1. THEN is a reserved word and cannot be used as part of [an] expression. Instead, it marks the end of the expression. For instance, in: WHEN pi * r ** 2 < 100 THEN CALL adjust_circle REXX recognized the end of the expression by the presence of THEN. A semicolon could have been used to end the clause explicitly, too: WHEN pi * r ** 2 < 100; THEN CALL adjust_circle Following THEN, [the] statement can be any simple or compound statement, including a simple or repetitive DO group or another SELECT instruction. A nonnull clause is required after THEN. The NOP instruction should be used as the statement following THEN if nothing is to be done for one particular case. OTHERWISE is not a reserved word, so if it is used, the preceding clause must be explicitly terminated with a semicolon or the end of a line, just as for ELSE in an IF instruction. The OTHERWISE part of a SELECT instruction is optional. However, if none of the WHEN cases are found to be true, an error will be generated if there is no OTHERWISE. The list of statements following OTHERWISE can be empty, though - a NOP instruction isn't required. Therefore, if it is possible that none of the WHEN conditions in a SELECT is executed and you don't want to regard that as an error, you must include the OTHERWISE, for instance,: SELECT WHEN author = "Twain, Mark" THEN author = "Clemens, Samuel" WHEN author = "Carroll, Lewis" THEN authot = "Dodgson, Charles" ... OTHERWISE END In addition, a list of two or more statements between OTHERWISE and END does not need to be enclosed in a DO...END pair. @endnode @node Cont2 "LOOPING STRUCTURES" REXX has only one looping structure, but it has various forms. The general syntax of DO is: DO [repetitor] [conditional] [statement-list] END [symbol] The statement-list, which is sometimes called the body of the loop, is simply an arbitrary sequence of statements, separated from each other by semiolons (or line-ends). It can start on the same line as DO itself, provided that that the DO [repetitor] [conditional] part is followed by a ";" to mark the end of the clause. Both the repetitor and the conditional may be omitted, in which case the structure is called a simple DO group. The DO...END pair is merely used to group a sequence of statements to form a single statement for use with an IF or SELECT instruction. The different forms that a repetitive DO group may take correspond to use of different types of repetitors and conditionals. A repetitor can be one of three things. First, it can simply be FOREVER. DO FOREVER instructions are endless loops, except, of course, there should be some way in the body of the loop to terminate it, for instance: DO FOREVER ... /* processing */ IF string = '' THEN RETURN ... /* more processing */ END In this example, some processing is done, and the internal logic calls for a return to a calling procedure if there's nothing left in string. The RETURN instruction terminates the loop in addition to returning to the caller. Notice (in this case) that the test occurred in the middle of the loop. Had it been possible to do the test at either the beginning or end of the loop, we could have used a conditional like WHILE string \\= '' or UNTIL string = '' as part of the DO clause. One other simple form of repetitor is an arbitrary expression that evaluates to a nonnegative whole number. This number tells explicitly how many times the loop should be executed: DO 3 SAY 'Never!' END The third kind of repetitor is the most complex. It has the form: assignment [TO expt] [BY expb] [FOR expf] Here assignment is just an ordinary REXX assignment of the form: smybol = expi expt, expb, expf, and expi are expressions. For instance, consider: DO year = 1800 TO 1990 BY 10 FOR count CALL census_report year END The target variable of the assignment (symbol) is the loop control variable, year. The initial value of the control variable (expi) is 1800. The terminating value of the control variable (expt) is 1990. The increment of the control variable (expb) is 10. And the number of times the loop can be expected (expf) is count. The control variable assignment here is handled just like any other assignment to the variable named by symbol. When several loops having control variables are nested, it is possible to identify a particular one on a LEAVE or ITERATE instruction by naming the appropriate control variable. The control variable is just like any other REXX variable, and it can be either simple or compound - it could even be a stem. The loop is executed with the control variable having the initial value given by expi. The loop will be terminated as soon as the control variable is greater than or equal to the quantity determined by expt if TO is present. This is true provided that there is no increment defined by expb, or the increment is nonnegative if present. However, if the increment is specified and it is negative, the terminating condition is that the control variable becomes less than the expt quantity. The default for the increment, if none is specified, is 1. Thus: DO i=1 ... END is just like a DO FOREVER loop, except that i is initially set to 1 and [is] incremented by 1 every time through the loop. It is possible, based on the original value, the limit, and the increment, that the loop may not be executed at all. In addition, FOR may be specified with a count (that must be a nonnegative value); if so, this acts as another way to terminate the loop before the limit is reached. If no limit is specified by TO, the loop will not terminate unless there is a FOR, WHILE, or UNTIL expression, or some explicit exit tests within the body of the loop. The limit (expt), increment (expb), and maximum loop count (expf) may be specified in any order. Expressions in the repetitor are evaluated from left to right. (This is important to note in case any of the expressions involve functions that have side effects. For instance, a side effect might be a change to a variable used in anothor expression, or an I/O operation.) Further, they are evaluated only once, before the loop is executed, so the increment and limit cannot be changed by any action within the loop. All expressions must evaluate to valid numbers, and any necessary comparisons of the control variable to the limit are done in accord with the prevailing value of NUMERIC DIGITS. TO, BY, and FOR are reserved words in the expressions expt, expb, expf, and expi, as are WHILE and UNTIL. That is, they cannot be used as the names of variables. In addition to any of the three kinds of repetitors, one conditional of the form WHILE expw or UNTIL expu may be used. This provides yet another means of terminating a loop. The expressions associated with with WHILE and UNTIL must evaluate to 0 or 1. But unlike the expressions associated with the limit, increment, and loop count, they are computed every time through the loop. This means that loops which use WHILE and UNTIL may be somewhat slower. For instance, in the two loops: DO i=1 TO 1000 ... END and: DO i=1 WHILE i<=1000 ... END the first will probably be faster than the second. The reason is that the termination condition can be tested more efficiently in the first case. In effect, use of WHILE or UNTIL requires that an expression (i<=1000) be evaluated every time through the loop, while a simpler numeric comparison can be performed in the first case. The difference would be even more dramatic if the termination test involved a computation of some sort. Suppose the loop limit is computed by a function call like LENGTH(x). Assuming that x does not change within the loop then: DO i=1 TO LENGTH(x) ... END would be much more efficient than: DO i=1 WHILE i<=LENGTH(x) ... END since LENGTH(x) needs to be computed just once in the first case (the first time the loop is executed). However, if x may change within the loop, you would have to use the second form with WHILE in order to guarantee that i never exceeds the length of x inside the loop. The difference between WHILE and UNTIL is that the WHILE condition is tested before the body of the loop is executed, but the UNTIL condition is tested afterwards. Consequently, use of WHILE may completely prevent the body of the loop from being executed, but UNTIL allows the body to be executed at least once. When conditionals are combined with repetitors (including possibly both a limit and a loop-count condition), it may be important to know in precisely which order each of the tests will be applied. REXX guarantees that the control variable will be incremented and tests will be performed in the following order: 1. The control variable is assigned its initial value or incremented. 2. The control variable is compared against the limit, and the loop ends if the limit is exceeded. 3. The loop count is checked, and the loop ends if the value is exceeded. 4. The WHILE expression is evaluated and the loop ends if the value is 0. (An error occurs if it is not 0 or 1.) 5. Assuming none of these tests causes the loop to terminate, the body of the loop is executed. 6. The UNTIL expression is evaluated. If it is 1, the loop ends, but otherwise things begin again with step 1. Hopefully you will avoid writing too many loops where details like this are critical, since it's easy to forget all these rule. The main time you might have to be aware of these rules is if you want to access the value of the control variable after the loop has terminated (quite a legal thing to do in REXX) or if you use functions that have side effects in the WHILE or UNTIL expressions. Except that the value ofthe control variable has to be numeric, there is no other restriction. It could be fractional, as could the increment and limit. All arithmetic on the control variable is done in the standard REXX way, which means that the values of NUMERIC DIGITS and FUZZ may be significant. When nonintegral numbers are involved, note that the limit condition is defined comparatively, rather in terms of equality. Therefore, a loop like: DO x = 1.1 BY 1.1 TO 5 ... END will terminate, even though x never equals the limit value, since the loop ends as soon as x equals or exceeds the limit value. One feature which is often helpful in a looping structure, but which is often missing in a language, is the ability either to terminate the loop or to begin another iteration ofthe loop from an arbitrary location within the body. Let's first consider leaving the loop. You might, for instance, be reading lines from a file and want to quit if some particular data item is read: DO WHILE LINES(file) \\= 0 x = LINEIN(file) IF x = '*** EOF ***' THEN LEAVE ... END Here, as we will explain in the @{"chapter on file I/O" link Inpu}, the LINES() function returns 0 once the true end of the file has been reached, allowing the loop to terminate. But, perhaps for testing purposes, we also want to quit if a special marker('*** EOF ***') is found. The LINEIN() function reads a line from the file. The next statement uses a LEAVE instruction to exit from the loop if the marker is found. The rule is that LEAVE will terminate the innermost active repetitive loop in which it occurs. It is an error to use a LEAVE instruction that is not contained in the body of a repetitive DO loop in the current procedure. LEAVE would not terminate a simple DO group, and could not even legally be used there, unless the simple DOg roup were contained within an active repetitive loop. If the simple loop is nested within a repetitive one, then both are terminated by LEAVE. For instance, if we wanted to add a message to the previous example before ceasing to read the file, we might have: DO WHILE LINES(file) \\= 0 x = LINEIN(file) IF x = '*** EOF ***' THEN DO SAY 'Prematurely ending file scan.' LEAVE END ... END Note that if you have one or more nested procedures (@{"next chapter" link Subr}), only repetitive loops in the current procedure are considered to be active, so LEAVE has no effect on a loop whioh was active when the current procedure was called. ln other words, LEAVE can't terminate a loop in the calling procedure. The execution of a LEAVE instruction is like a jump to the first statement after the END which belongs to the active loop. This jump skips any incrementing of the control variable, which therefore will have the value current when LEAVE was executed. Any WHILE or UNTIL expressions are likewise skipped. In case there are several active repetitive loops, only the one immediately containing LEAVE is normally terminated. However, it is possible to terminate an outer loop from within an inner one if the outer loop has a control variable (and if the control variable is different from that of the inner loop, as would normally be the case). This is done by using the symbol for the control variable on the LEAVE instruction: DO outer = 1 TO 10 ... DO FOREVER ... IF something THEN LEAVE outer ... END END This example terminates all DO loops illustrated, even though the inner one doesn't have a control variable. Any simple DO groups that happen to contain the LEAVE and be contained within the outer loop would also be terminated. If more than one repetitive loop has the same control variable, only the innermost one that contains the LEAVE is terminated. The name that may be specified is treated as a symbol, in that the substitution is performed in case it happens to be a compound symbol. That is, the symbol must match the symbol used for the control variable exactly, except for case. In other words, something like: DO a.3 = 1 TO 10 ... i = 3 LEAVE a.i END would not work. But it is rather uncommon, though legal, to use compound variables as loop control variables, so one isn't too likely to make this mistake. ITERATE is the other instruction that can be used within a repetitive loop to alter the normal flow of control. Unlike LEAVE, which terminates the loop, ITERATE in effect branches to the END which closes the loop and therefore causes immediate incrementing of the control variable (if any) and application of the normal tests for loop termination. Unless one of these tests causes termination, the body of the loop is then entered again from the beginning. If an UNTIL conditional is present on the DO instruction, it will be tested first, before the control variable is incremented. After that, the sequence of execution is as described earlier for incrementing the control variable and testing the loop conditions. ITERATE is often used to skip the rest of the body of a loop if some condition indicates that to do so would be unfruitment. The only alternative would be to place the rest of the loop within a DO...END pair. For instance, if a program is prompting for input, a null input line would be a cause to ITERATE: DO FOREVER SAY 'Enter a command:' PARSE PULL input IF input = '' THEN ITERATE ... END Or perhaps a program is scanning a file for lines of a particular type: DO WHILE LINES(file) \\ = 0 PARSE VALUE LINEIN(file) WITH keyword . IF keyword \\= 'Name:' THEN ITERATE ... END The rules for [the] use of ITERATE are just like those for LEAVE. For instance, it can only be used inside of an active repetitive DO loop. ITERATE will skip any enclosing simple DO groups to branch to the end of the nearest repetitive group. It has no effect on DO loops which are not active because they are not executing within the current procedure. If ITERATE is contained within nested repetitive loops, it affects only the inntermost one, unless the name of a control variable is included on the ITERATE instruction. In that case, it effectively branches to the END statement corresponding to the inntermost containing loop which has the specified control variable. Any intervening repetitive (or simple) DO groups are thereby terminated, and their control variables are left with the values they had when the ITERATE was executed. @endnode @node Cont3 "THE SIGNAL INSTRUCTION" The various control structures discussed so far cover most programming needs. There are times, however, when nothing less than a good, old- fashioned GOTO will do. REXX avoids the problem of implementing an instruction which some people view with suspicion by calling it SIGNAL instead of GOTO. There are actually several forms of SIGNAL. One form is used to enable and disable the handling of exceptional conditions (sometimes called signals). This is discussed at length in @{"Chapter 11" link Exce}. The other form of the instruction is provided to allow for explicit, direct transfer of control to another location in a program. You may think of this, if you like, a simply providing a way for the programmer to implement his or her own private exceptional condition types. The syntax of this form of SIGNAL is: SIGNAL label or: SIGNAL [VALUE] expression In the first case, label is a symbol or a literal string. It is taken as a constant, and refers to a label within the program. This usage is exactly like the way labels are used in procedure calls. In particular, if the label occurs twice in the program, only the first occurrence is used as the target of a SIGNAL. In the second case, the target of the SIGNAL is computed at run-time as the value of expression. VALUE needs to be used in the instruction if expression begins with a symbol or literal string (instead of a special character, such as a parenthesis). If you use a literal or an expression with SIGNAL, the literal or the value of the expression should be uppercase, since labels are always considered to be uppercase. When SIGNAL is executed, control transfers immediately to the instruction following the specified label. There are, in addition, certain side effects of SIGNAL. Namely, any IF, SELECT, or DO instruction that may be active is terminated. This is true no matter how deeply such instructions are nested in the current procedure. If any INTERPRET instructions are active, they too will be terminated. On the other hand, the current procedure is not terminated. There is no way by the use of SIGNAL alone to get out of a subprocedure. This can be a problem sometimes, since it means that there is no easy way to get back to the top of a program from inside a deep nest of subprocedure calls. This behaviour is consistent with the way that signals are handled in connection with exceptional conditions (@{"Chapter 11" link Exce}). The fact that SIGNAL does not cause termination of the active procedure is one way in which it is slightly less powerful than an unconditional GOTO. The fact that it does terminate active IF, SELECT, and DO instructions is another way, because it means that you cannot transfer from one place to another within a DO or SELECT instruction. Actually, most implementations of REXX allow labels inside of IF, SELECT, and DO instructions, but there is little of use that can be done with them. You may even be able to use a SIGNAL instruction to such a label without immediate error. However, the DO instruction will actually have been terminated, and an error will occur when the first END statement is encountered. You should be careful in placement of labels which will be the target of a SIGNAL. Although they may be anywhere in the program, even inside a remote DO loop, the context of execution will still be the active subroutine when SIGNAL is used. The primary use of SIGNAL is to allow programmers to define their own exceptional conditions analogous to REXX's built-in conditions. For instance, in a data entry program, such a condition might be "invalid input". The condition could be raised any time the entry of invalid data is detected with: SIGNAL invalid_input The code following the invalid_input: label would probably display an appropriate message, explain what the problem is, and request reentry of the data. SIGNAL is often reserved for fairly serious problems. For instance, in a program that usually runs unattended, the code to which control is transferred by SIGNAL may simply record diagnostic information, perform cleanup, and then do an EXIT. (You could also call a subroutine to do this.) Handlers for any of the REXX built-in conditions usually end with a SIGNAL to resume normal execution ofthe program. It is really the only alternative to EXIT or RETURN in this case. There are other valid uses of SIGNAL completely unrelated to condition handling. For instance, it can be used to perform the equivalent of a very large SELECT statement with lower overhead. Remember that SELECT must test a number of logical expressions in turn until it finds one that is true. If there are scores, or even hundreds of them, this can be very time- consuming. SIGNAL can be used, instead, to construct an n-way branch: SIGNAL VALUE 'CASE'n case1: ... SIGNAL end_case case2: ... SIGNAL end_case case3: ... SIGNAL end_case ... end_case: is largely equivalent to: SELECT WHEN n = 1 THEN ... WHEN n = 2 THEN ... WHEN n = 3 THEN ... END but much faster. Just remember that, unfortunately, SIGNAL cannot be used inside of a DO loop because it terminates the loop. @endnode @node Subr "5: SUBROUTINES AND FUNCTIONS" Most modern programming languages have, in addition to the control structures described in the @{"last chapter" link Cont}, the concept of a subroutine. This is special kind of control structure which allows a group of instructions to be called from many places within a program without having to actually be duplicated each time it is used (as is done with macros or inline functions in some languages). The use of subroutines makes it possible to decompose the design of a program into small functional units which are easily reusable. Because each subroutine (ideally) does one thing well and has explicit, well-defined interfaces to the rest of the program, it is easy to debug. A program can be designed as a hierarchy of subroutine calls in a way that matches the natural hierarchical structure of the problem being solved. REXX, of course, supports subroutines. A distinction is made between subroutines that return a value and those that do not. However, in REXX this distinction is not very hard and fast. Generally, a subroutine is invoked with the CALL instruction, which has the format: CALL name [expression] [, expression] ... Here, name is the name of the subroutine. name may be a symbol or a character string literal. In either case, the value is taken literally - even if it a possible variable name, it is not evaluated. The subroutine itself may be internal, external, or built-in, depending on where the code for the subroutine is located. A subroutine may return a value, but this is not required. If it does return a value, that value is left in the special variable RESULT after the CALL. A function, on the other hand, is invoked by its occurrence in an expression. A function call is recognized as being a symbol or character string literal followed immediately (with no intervening blank) by a left parenthesis. This is followed in turn by zero or more expressions, separated by commas, and finally ends with a right parenthesis. Thus, a function reference has the form: name( [expression] [, expression] ... ) It is required that a function called in this way return a value, and an error will occur if it does not. The RESULT special variable is not assigned in this case. Functions, like subroutines, can be either internal, external, or built-in. Note that there are no parentheses around the argument list in a CALL instruction (though there may be around each individual argument expression). This is a source of frequent confusion for beginning REXX users. You can fall into this trap unwittingly, because the following is is legal: CALL something (argument) (since the parentheses merely enclose the first argument), while the similar statement: CALL something (argument_1, argument_2) is definitely illegal. Subroutines and functions are often simply called procedures when the distinction is not important. The reason that the distinction between subroutines and functions is less important in REXX than in other languages is that there are no declarations anywhere that limit a procedure to being one or the other. A procedure can very well return a a value at all times and thus be usable as either subroutine or a procedure. Or it can return a value sometimes and not others. Or it can even determine (with the PARSE SOURCE instruction) how it was invoked and return a value only if necessary. @{" BUILT-IN, INTERNAL, AND EXTERNAL PROCEDURES " link Subr1} @{" PASSING ARGUMENTS AND RETURNING VALUES " link Subr2} @{" SCOPE OF VARIABLES " link Subr3} @{" EXECUTE STATE PRESERVED AROUND PROCEDURE CALLS " link Subr4} @endnode @node Subr1 "BUILT-IN, INTERNAL, AND EXTERNAL PROCEDURES" A much more important distinct1ion turns on the location of the code for the procedure. Many procedures are an integral part of the language. That is, they are always supplied with the language (if it is a complete implementation), they all accept the same arguments, and they (ideally) always operate in the same way. These are the built-in functions, and there are about 66 of them in standard REXX. (Different levels of the language specification have added or removed certain functions.) Specific implementations of REXX always provide additional built-in functions to take advantage of particular features of the environment. Although referred to as functions, these built-in procedures can always be called as subroutines. Internal procedures are those which are supplied by the programmer in the same file as the rest of the program. In this case, the name of the procedure corresponds to a label within the file. A label is simply a clause which consists of a symbol followed by a colon, which marks the end of the clause. The label can be any valid symbol; it doesn't need to begin with an alphabetic character or be acceptable as a variable name. If there are duplicate labels wJthin a file, only the first is ever used as the start of a procedure. The second occurrence of a label is not an error, but it can never be reached by a CALL statement or function reference. The procedure begins at the first clause following the label. On the other hand, the end of a procedure is not syntactically delimited at all. That is, there is no syntactic indication of the end. Of course, the RETURN instruction causes control to return from a procedure, but it does not necessarily mark the end of the code of the procedure. By use of the SIGNAL instruction, control could jump all over the source of a program and still remain with a procedure (though this would be considered [to be] very bad form). Procedures can also overlap; that is, one procedure could "flow into" another without the label of the second procedure being invoked by a CALL or function reference. Technically, control would still be within the first procedure. While this would ordinarily be a suspicious usage, it could be employed to allow for alternate entry points to the same body of code (perhaps with different arguments). External procedures, finally, are located somewhere outside the program source file. Exactly where is implementation-dependent. They may be written in REXX and contained in other source files, to be loaded by the operating system when required. Or they may be written in other languages and linked to by a variety of system-specific mechanisms (function packages in CMS and MS-DOS, DLLs in OS/2, etc.). The way that the name of the procedure as used in a program is related to the external function is also system-dependent. It might, for instance, be the same as the name on a particular file, or the name of an entry point in a dynamic link library. As you can see, the same name might possibly be used to refer to a procedure that is any of these three types. So the language specifies a search order that defines the order in which each possibility will be tested for. The rule is that if the name of the procedure, as specified in the CALL or function reference, is a symbol, then internal procedures are searched for first, followed by built-in procedures, followed by external procedures. This rule gives a programmer the freedom to override the definition of any built-in or external procedure at will, simply by coding an internal procedure of the same name. So if you don't like the way the SUBSTR() function works, for instance, you can write your own. However, in writing your enhanced replacement for SUBSTR(), you may find that in many or most cases you just want to use the built-in function. This possibility is provided for by the rule that if the name of the procedure in the CALL or function reference is a quoted string, then internal procedures are excluded from consideration, and only built-in built-in procedures (first) or external procedures will be invoked. There are no additional rules that would allow you to override built-in procedures with external ones. This generally works in your best interest, because stray files or libraries floating around in your computing environment can't accidentally subvert a working REXX program. (It also makes deliberate subversion by "Trojan horse" programs a little less likely.) But on the other hand, if you develop a whiz-bang extension to SUBSTR(), there's no way to incorporate it in all your REXX programs except by physically copying it into them. @endnode @node Subr2 "PASSING ARGUMENTS AND RETURNING VALUES" One thing that is common to the notion of a subroutine in most programming languages is the ability to pass values called arguments to the subroutine. This notion is derived from the mathematical concept of a function which computes some value based on the value of zero or more arguments. REXX, of course, supports argument passing. It turns out that there are subtle differences in the ways that different languages implement the passing of arguments. Sometimes the arguments are said to be passed by reference. This means that the subroutine gets access to the actual variable in the calling routine and is able to modify it directly (assuming an actual variable was passed rather than a constant or some computed quantity). There are both advantages and disadvantages to this approacn. One disadvantage is that the subroutine can have side effects, in that it can modify some of the caller's variables in addition to simply returning a value. But this can also be an advantage, especially when arrays are involved, since in many languages passing and returning entire arrays is inefficient or impossible. An alternative is that arguments are passed by value. This means only the values of variables are passed, and the subroutine is not able to modify the caller's variables by modifying the arguments. In REXX, arguments are always passed by value. There are other means, as we shall see, that enable a subroutine to modify variables belonging to the caller, if necessary. Another consequence of passing arguments by value is that the values are completely determined at the time of the call. Even if a variable passed as an argument is changed within the subroutine, any reference to the argument will retrieve the original value. For example, in: x = 1 CALL example_sub x ... example_sub: ARG y SAY x y x = 2 ARG z SAY x y z the first SAY instruction displays 1 1, because y has been set to the value of the argument, which is 1. The second SAY displays 2 1 1. Although x has been reassigned the value 2, when the value of the argument is assigned to z, we find it is still 1. In other words, no matter what is done to the variable whose value was passed as an argument, the argument value itself does not change. Further, variables which receive the argument value, like y and z, do not become "aliases" for x. In fact, it is incorrect to think of x as the argument. Strictly speaking, it is the value of x which is the argument. This is what call by value means. There are basically two ways in REXX that a subroutine can gain access to the arguments that are passed. The first is with the PARSE instruction: CALL example_sub this, that, the_other ... example_sub: PARSE ARG first, second, third PARSE is used in REXX for many things besides parsing character strings, and this is an example. PARSE is able to take one or more character strings and assign them, or portions of them, to variables. Where the strings come from is determined by the word following PARSE, in this case ARG. In the present case, ARG directs PARSE to operate on the argument strings passed to the subroutine. Here, the strings that are the values of the values of this, that, and the_other are assigned to first, second, and third, respectively. Notice the commas in the PARSE statement of this example. They are quite important. The example could have had: PARSE ARG first second third and still been perfectly legal. But what this does is to parse only the first argument string (ie., the value of this). It assigns the first blank-delimited word to first, the second word to second, and the rest to third. The other two argument strings are simply ignored. This is a frequent problem for beginning REXX programmers. In general, you will stay out of trouble by remembering the rule that the PARSE ARG instruction should have as many commas as the corresponding CALL instruction or function reference. While there may be occasions when more or fewer commas would be called for, try to follow the rule unless you know exactly what you're doing. There is an ARG instruction which is a shorthand form of PARSE UPPER ARG. It is intended to give the appearance, when used at the beginning of a subroutine, of being a declaration of the routine's arguments: example_sub: ARG first, second, third It is, however, just another executable statement, not a declaration, and it has the usually undesirable side effect of converting all strings to uppercase before parsing them and assigning [them] to first, second, and third. (It is a holdover from an earlier era of computing when it was customary to deal mostly with uppercased characters and strings.) The ARG or PARSE ARG statement need not occur at the beginning of the subroutine. It could be anywhere in the subroutine, and it could even be used several times to parse the arguments in different ways. There is another, completely different, way for a subroutine to access its arguments: with the ARG() built-in function. This has the form: ARG([argument_number], [option]) argument_number is the number of the argument to be returned. It must be a positive whole number. Thus: CALL example_sub this, that, the_other example_sub: first = ARG(1) second = ARG(2) third = ARG(3) is completely equivalent to the original example. In general, PARSE ARG will be slightly more efficient, but the use of ARG() is slightly less subject to introducing mysterious bugs due to forgetting a comma or two. Sometimes it is convenient to use ARG() directly in expressions. Also, ARG() makes it possible to distinguish between arguments that are null strings and arguments that are completely omitted. First, we have to explain about omitted arguments. Syntactically it is always permissible to supply any number of arguments (up to some implementation-defined limit) to any procedure. Whether the procedure will use all [of the] supplied arguments, or may require more than have been supplied, is completely up to the procedure to decide. If arguments before the final one are to be omitted, this is indicated by supplying just the comma that separates one argument from the next. So in: CALL example_sub this, , the_other the second argument has been omitted. If all arguments after some point are to be omitted, the commas can be left in or omitted, however you prefer. (Except you must remember that a comma which is the last thing on a line means continuation rather than an argument separator.) So: CALL example_sub this passes just the first argument. If the @{i}n@{ui}th argument has been omitted, then the value of ARG(n) will be a null string. This is the same as its value would be if a null string were passed explicitly. For example, in both: CALL example_sub and: CALL example_sub "" the value of ARG(1) is "" (a null string). So how do you distinguish between these two cases? That is the purpose of the second optional argument of ARG: the value of option can be e ("exists"), or o ("omitted"). ARG(n, 'e') has the value 1 if the @{i}n@{ui}th argument exists or 0 if it has been omitted. ARG(n, 'o') has the value 1 if the @{i}n@{ui}th argument has been omitted or 0 if not. There is no way to detect omitted arguments with parse arg. Variables corresponding to omitted arguments are simply assigned null strings. If ARG()is called with no arguments at all, it returns the number of arguments passed to the procedure. More precisely, it returns the number of the last argument which was explicitly passed. So if you had: CALL example_sub this, , the_other ARG() would have the value 3 even though only two arguments were actually passed. You can use the value of ARG() as a quick test for possibly omitted arguments, but you should use the above more elaborate test in most cases. In other words, if ARG() has the value 3, you may be sure that the third argument was passed explicitly, and that nothing after the third one was; but you would not know about the first and second arguments. Built-in functions have a special status as far as the omission of arguments is concerned. Part of the definition of a built-in function is a specification of which arguments are optional and may be omitted. In practice, all optional arguments occur after the required ones, though this is not a requirement of the language. If any required argument is omitted, an error will automatically be generated. For user-written procedures, either internal or external, it is up to the prcedure itself define what is meant by a "required" argument, and what to do if it has been omitted. The process of returning values from a procedure is a much less complex matter than the passing of arguments. You simply use the RETURN instruction, which has the form: RETURN [expression] The expression on a RETURN instruction is optional for a procedure invoked with CALL, but it is required if the procedure was invoked in a function reference. The expression is computed from values of the current generation of variables plus exposed variables (see the next section on [the] scope of variables). All variables in the current generation are then dropped, and control returns to the point at which the procedure was invoked. If the procedure was invoked by a CALL statement, the special variable RESULT is assigned the value. Otherwise, the value becomes the value of the function reference, and RESULT is not changed. If the procedure was invoked by CALL and no expression is provided on RETURN, then RESULT is dropped and becomes undefined. The EXIT instruction is like RETURN, except that it terminates the whole REXX program (contained in a single source file), and returns to the next higher level. The format of EXIT is just like RETURN: EXIT [expression] In case the current REXX program is an external routine called by another REXX program, the point of invocation of the external procedure is where control returns as a result of EXIT. Just as with RETURN, if the procedure was invoked by a function reference, the expression is required and its value becomes the value of the function reference, and RESULT is not affected. If invocation was by CALL, expression isn't required. RESULT is assigned the value of the expression if it is present; otherwise it is dropped. If RETURN is issued from the topmost procedure in a REXX program (file), it behaves just like EXIT and terminates the program. If the end of the file is encountered when executing any procedure in a REXX program, that is considered to be an implicit EXIT, and the program will terminate. Since no return value was provided, an error will occur if the program was invoked as an external function by another REXX program. REXX programs can also be invoked by other REXX programs in system- dependent ways. Sometimes the REXX programs are handled just like other system commands and are simply invoked in the usual REXX way via the ADDRESS instruction or because the statement is not an assignment and does not begin with a REXX keyword. In this case, the expression used on the EXIT or RETURN instruction may be assigned to the RC special variable, just like the return code from a system command. Depending on the system, it may be required to be a numeric value in this case. @endnode @node Subr3 "SCOPE OF VARIABLES" Another important issue which arises with procedures in all languages which recognize the concept is that of the scope of variables. This is the question of whether instructions in a subroutine can "see" variables belonging to the caller. In some languages such as PL/I, a subroutine can see its caller's variables (at least if the subroutine is nested within the calling procedure), while this is not the case in other languages such as C. In those languages which have this feature, it is referred to as lexical scoping, because the region of the program within which variable names are recognized as referring to the same data is determined syntactically by the physical location of the code. REXX in general pays little attention to the location of code. As we have seen, a subroutine can have code scattered all over a program, and subroutines can even overlap. The question of which subroutine has control at any particular time is therefore determined dynamically by the flow of control. The very same instruction may at one time be part of one subroutine, and later part of a different subroutine. (REXX is rather like assembly language in this regard.) Correspondingly, the scope of a variable name must be determined dynamica]ly rather than statically. REXX is, however, more like PL/I than C, in that a subroutine can in general "see" variables belonging to its caller. But it is recognized that this can be undesirable in practice, because it enables subroutines to have (unintended) side effects. For instance, it is common to use very simple variable names like I for loop control variables. If a subroutine is called in such a loop and also uses I, obscure bugs can easily be introduced. To limit this kind of exposure, REXX provides the PROCEDURE instruction, whose purpose is to prevent access of a subroutine to its caller's variables. PROCEDURE is entirely optional, but, if used, it must immediately follow the label at the beginning ofthe subroutine:' x = 5 CALL some_sub ... some_sub: PROCEDURE X = -5 ... In this example, [the] references to the variable X are to entirely different data items. The some_sub procedure cannot access any of its caller's variables. Although the appearance of PROCEDURE here is similar to a declaration in other languages, it is, like all REXX instructions, an executable statement. Only the rule that it must occur immediately after a label prevents PROCEDURE from being used in an IF statement (for instance). PROCEDURE actually does something when it is executed: it creates a new generation of variables, completely distinct from variables of the calling routine. All variables of this new generation that are created in the process of executing the subroutine are automatically dropped when RETURN"is executed (as if the DROP instruction had been used). But sometimes a subroutine has a real need to access a few variables of its caller, while desiring to avoid disturbing the majority of them. Normally, it would be best to pass such data to the subroutine through the argument list, as long as only a few variaJJles are involved and they are merely to be read, not updated. However, if the caller's data has to be changed, or a number of variables are involved, some alternative is needed. So what PROCEDURE takes away with one hand it can give back with the other by means of the EXPOSE keyword. If the preceding example had been: x = 5 CALL some_sub ... some_sub: PROCEDURE EXPOSE X x = -5 ... then the symbol X refers to the same variable in both the calling and the called procedure. The syntax of the PROCEDURE instruction is: PROCEDURE [EXPOSE name1 name2 ...] where any number of variable names may be listed. The names may refer to simple variables, compound variables, or stems. It is immaterial whether the variables named have already been assigned a value or not. When a stem is used, every possible variable having that stem is exposed. One or more variables in the list following EXPOSE may be enclosed in parentheses (only one variable per pair of parentheses). This has an additional effect beyond exposing the named variable. It is assumed that the variable has been assigned a value which is another list of variable names, similar in form to the list following EXPOSE, except that it may not contain further parenthesized names. All the variables named in that list are also then exposed. There are several reasons for this provision of specifying parenthesized names in an EXPOSE list. First, it facilitates handling lengthy lists of variables to be exposed, at least if the same list is used repeatedly. The most important case where you might have a long list of variables is when you want a number of variables to be global to an entire program. Such variables might include initialized tables of data, constants (eg. the number pi), quantities which are assigned only once but needed in many places, and any other kind of data that might have to be communicated between widely separated places in a program. When a variable is named in an EXPOSE list, it effectively becomes a part of the current procedure's generation of data, so it can be accessed or reassigned at will, and any changes to it persist after return from the subroutine. It can also be exposed in calls to lower level routines. In fact, it must be exposed in every level of subroutine called if it is to be available at the lowest level. This is why it's especially convenient to be able to list the names of truly global variables as the value of a single variable, and expose them using the parenthesized name rule. Here's a skeletal example of handling global data this way: /* my program */ PARSE ARG main_arg1 main_arg2 main_arg3 CALL initialize ... CALL sub1 ... sub1: PROCEDURE EXPOSE (globals) ... CALL sub2 ... sub2: PROCEDURE EXPOSE (globals) ... initialize: PROCEDURE EXPOSE (globals) globals = 'main_arg1 main_arg2 main_arg3 '||, 'screen_height screen_width message.' PARSE VALUE SCRSIZE() WITH screen_height screen_width message.1 = ... message.2 = ... ... (In this example, SCRSIZE() is a system-specific function which returns the height and width of the screen. It is not a standard part of the REXX language.) Here the arguments to the main routine (main_arg1, main_arg2, main_arg3) may be referred to anywhere in the program. The use of a procedure called initialize allows data to be initialized at the beginning of a program, without cluttering up the main code with a lot of boring assignments. One or more data tables can be defined with essentially static data. (This is the closest you can get in REXX to what is called COMMON data in FORTRAN.) There's another quite different use for parenthesized names in an EXPOSE list. It concerns the ability to write general-purpose subroutines that operate on arrays. The problem arises because REXX has no mechanism for for passing arguments by reference, as we mentioned at the beginning of this chapter. It is not enough simply to pass the name of a stem to a subroutine. Consider the following attempt at writing a generalized sort routine: array. = 0 array.1 = ... array.2 = ... ... CALL sort array. ... sort: PROCEDURE PARSE ARG x ... IF x.i < x.j THEN ... This simply does not work. In the first place, the argument passed to the subroutine is 0, which is assigned to x. We might not have done the assignment array. = 0, but then the string ARRAY. would have been passed and assigned to x. But that's irrelevant anyway, since a reference like x.i doesn't care about the value of x. If instead we had done PARSE ARG x. in the sort routine, then the stem x. would have been initialized to the passed value, but it still would not help. One solution to this problem involves the INTERPRET instruction, which creates a little REXX program on the fly and executes it. That is, we could say: INTERPRET 'if' x'i <' x'j THEN ...' In the expression following INTERPRET, x is not part of a quoted string, so that it can be evaluated. IF the value of x is array., then the whole expression evaluates to: IF array.i < array.j THEN... which is just what we need.This will work, but it's very clumsy, and very slow besides - not what one wants in a general purpose sort routine. There is another solution which involves the VALUE() built-in function. VALUE() exists in large part to overcome some of the problems with INTERPRET in simple cases. With an optional second argument, VALUE() is able to set the value of the variable named in the first argument. So we could say: IF VALUE(x'i') < VALUE(x'j') THEN ... and chieve the desired effect. But one problem remains. Since there is a PROCEDURE instruction following the label sort, array.i in the subroutine isn't the same as array.i in the calling routine. We could dispense with the PROCEDURE, of course, or we could EXPOSE the name of every possible array we might want to sort. The alternative is: array. = 0 array.1 = ... array.2 = ... ... argname = 'array.' CALL sort ... sort: PROCEDURE EXPOSE (argname) ... IF VALUE(argname'i') < VALUE(argname'j') THEN ... At last we have something that is semireasonable. It's still clumsy, since the array name isn't passed in the call to sort itself, and we still have to use VALUE(); but it works, and it's the best that can be done along these lines in REXX currently. There are a couple of other subtleties to note in exposing variables. First, the names listed in an EXPOSE list are exposed in order from left to right. This is important, because the names exposed are actually the derived variable names, that is, the names obtained after any required substitution in the tail part of the name. In the statement: PROCEDURE EXPOSE name birthday.name the simple variable NAME is first exposed. Suppose it has the value Lisa. Then the variable which is exposed next is BIRTHDAY.Lisa, which is the derived name. If instead we had: PROCEDURE EXPOSE birthday.name name then BIRTHDAY.NAME would have been exposed instead. The other tricky area with exposed variables is the issue of dropping them. Recall that the DROP instruction can be used to undefine particular variables, or all variables with a given stem. Any variables named or implied in a DROP instruction, and which happen to be exposed, are dropped not only in the subroutine, but in the calling routine as well. So if we had: a = 1 b. = 2 c. = 3 CALL mysub SAY a b. b.2 c. c.1 c.2 ... mysub: PROCEDURE EXPOSE a b. c.1 ... DROP a b. c. RETURN then on return from mysub the SAY instruction will produce A B. B.2 3 C.1 3. Notice that the stem B. and all possible variables with that stem have been returned to an undefined state, since the whole stem was exposed. However, since only the single variable C.1 was exposed, it is the only one with the stem C. that becomes undefined in the calling procedure. A similar effect occurs when values are assigned to a stem which has been exposed. The definition of assignment to a stem says that all variables having the stem are first dropped before the new value is assigned. So, if our example were slightly different: a = 1 b. = 2 c. = 3 CALL mysub SAY a b. b.2 c. c.1 c.2 ... mysub: PROCEDURE EXPOSE a b. c.1 ... a = 4 b. = 5 c. = 6 RETURN then the SAY statement would produce 4 5 5 3 6 3. It is an important fact that all variables of a subroutine which starts with a PROCEDURE statement are dropped when the procedure returns, except those variables or stems which have been exposed. Conceptually, the exposed variables really belong to the earlier generation of variables, unlike the unexposed variables. So it is only the variables of the newest generation that are dropped when a procedure returns. @endnode @node Subr4 "EXECUTION STATE PRESERVED AROUND PROCEDURE CALLS" The current generation of variables belonging to an active procedure may be considered to be a part of the state of execution associated with the procedure (assuming the procedure started with a PROCEDURE instruction). There are certain other things that are also part of the state of a procedure. These things are associated with a particular invocation of the procedure, and will in general be different if the procedure has multiple invocations as a result of recursion. When a procedure is called, the state of the calling procedure is inherited. However, any changes to the state remain in effect only until the called procedure returns. Because state information is preserved in this way, one source of undesirable side effects caused by actions of a subroutine is eliminated. For instance, you can use a TRACE instruction in a subroutine to initiate some form of tracing. Even if you don't subsequently turn tracing off, as soon as the subroutine returns, the previous state of tracing will be restored. Or a subroutine could turn tracing off after it has been debugged, while allowing tracing to proceed in the calling routine. Some of the other state information that is preserved in this way across procedure calls includes: * The setting of the default command environment established by the ADDRESS instruction * All settings controlled by the NUMERIC instruction (DIGITS, FORM, FUZZ) * The status of condition traps enabled or disabled by SIGNAL ON and CALL ON * The value of an elapsed time clock started with a call to TIME('e') Of course, the fact that program state is handled in this manner means that it is not possible to initialize such values in a subroutine for use elsewhere in the program. If any of these settings need to be done globally, they must be done in the top-level procedure of the program. It should also be pointed out that such program state information, just like variable values, is not inherited by external procedures that are called. The theory is that all external REXX programs should begin with the same set of default state values, so that they may assume [that] the defaults are in effect, and their behavior does not depend on conditions in the calling program, except for the values of arguments passed. A related matter is the handling of exceptional conditions and the SIGNAL instruction in connection with the execution of procedures. Exceptional conditions are such things as program syntax errors or attempts to use undefined variables. The SIGNAL instruction, described in the @{"preceding chapter" link Cont}, is a way of generating your own conditions of this sort. We will discuss the standard exceptional conditions at greater length later, but basically what happens is that when an enabled condition occurs, control is immediately transferrod to a specific label in the program. Any DO loops or other control strutures in the current procedure which were active at the time the condition was raised are deactivated. However, control structures, including DO loops, in higher-level procedures are not affected. In fact, the procedure which was active remains active, regardless of where in the program the label to which control is transferred is located. You must still explicitly return from a subroutine after a condition is signalled. This can make it difficult to handle exceptional conditions which occur in deeply nested subroutines, since it can be difficult to determine exactly where you were in terms of the subroutine-calling hierarchy. Thus, you may not be able to do much useful processing in a single handler intended for use throughout an entire program. There are certain features that mitigate this problom, such as the ability to handle a condition via a CALL ON instruction rather than SIGNAL ON. This allows the condition to be handlod in a new procedure which can return to the point at which the condition was raised by a RETURN instruction. You can also designate alternate labels to handle the same condition, and thereby provide for alternate processing of a given condition dependent on where you are in the program. Sometimes, when you can anticipate that certain serious exceptional conditions are possible, you can structure your program as one or more external procedures called from a separate main program. In the event of a serious problem, you would simply EXIT from the external subprocedure and allow the main program to restart from a known state. @endnode @node Exte "6: COMMANDS TO EXTERNAL ENVIRONMENTS" REXX is unique among general-purpose programming languages in that it has a fully-integrated capability for passing commands to external environments. That is, a REXX program can issue commands which are directed to and acted upon by some system or application software other than REXX itself. The most common example of an external environment is the operating system. The most common use of REXX, and the use for which it was originally designed, is to issue a sequence of commands to the operating system. Operating system command languages, also known as batch or procedure [or script] languages, existed long before REXX, of course. The original purpose of command languages was to allow users to collect groups of system commands into a file, for submission to the operating system as a batch. This makes it possible to invoke a group of commands required for some task with a single new command and greatly reduces the effort required to perform common repetitive tasks. Even the earliest command languages made it possible for users to extend the operating system with new commands and to automate routine command sequences. As system command languages evolved, they took on more of the characteristics of general-purpose programming languages. For instance, they added conditional execution and looping capabilities. Sometimes they did this by adding new system commands which actually implemented control contructs with new system commands that could be used in procedures to provide conditionals and looping. (The MS-DOS IF and FOR commands are examples of this.) Usually, however, the trend was to add features to the language itself, as something separate and distinct from commands of the system. REXX is a very advanced example of this latter trend. There is a very clear and distinct separation between REXX statements, as represented by REXX instructions and assignments, on one hand, and commands on the other. Nevertheless, commands are recognized as a third type of statement in REXX. Therefore, they are very naturally and easily incorporated into REXX programs, in a way that makes them appear to be almost an extension of the language itself, even though they are largely processed by some other software. Commands in REXX are not limited to system commands, either. Another development that also began a long time ago was for application software such as databases and file editors to be driven by commands, and to allow for such commands to be processed in a batch just like system commands. Since the situation is so similar, it was only natural for such application macro languages to evolve very much like system command languages did. Eventually, the logic of using the same language for both purposes became apparent. In the VM/CMS system, where REXX originated, this fact was recognized by the provision of application programming interfaces which made it possible for one language to serve easily as both a system command language and [as an] application macro language. Unfortunately, other operating systems did not appreciate the importance of this so quickly. Because of this and because applications often need to be portable, so that they cannot depend on operating system features for critical functions, most application software has tended to implement its own unique, proprietary language. This problem of a multiplicity of application-specific languages is only now starting to be addressed. Not surprisingly, REXX is one of the primary languages which may become used for both application and system command procedures, even across different computing platforms. @{" THE ADDRESS INSTRUCTION " link Exte1} @{" COMMAND RETURN CODES " link Exte2} @endnode @node Exte1 "THE ADDRESS INSTRUCTION" As we've seen, any REXX statement that does not begin with a REXX keyword and which is not an assignment is automatically classified as a command and submitted to an external environment for processing. The main issue, then, is to identify which environment the command should be sent to. For every REXX program there is always a default external environment for handling commands. Exactly what the environment is depends very much on the particular implementation of REXX. But for any given implementation, the default environment usually depends on how the REXX program is started. For instance, if the program is started by the operating system (usually because a user invoked it as a command from a system command line), the default environment is the operating system itself. If the program was started by an application, such as a communication program or text editor, the application is (very likely) the default environment. Usually, the application programming interface which permits an application to start a REXX program provides some means for giving a name to the default environment and for setting up the details of how the REXX processor actually passes the command to the environment. These details, of course, are system-dependent, and ordinarily not of concern to the REXX programmer. In the simplest cases, the REXX program does not need to be aware of the name of the external environment. However, it may well be true that more than one possible external environment is available. For instance, if the REXX program was started from an application, the system command environment is probably still available as well. Some systems, like VM/CMS, even have more than one system command environment, with subtle differences in command handling between the alternatives. (In CMS, the alternatives are known as CMS and COMMAND.) Many systems also make it possible for multiple application environments to be available simultaneously as well. In fact, this provision may place REXX in the position of being able to allow for easy communication between applications. Any time [that] a REXX program needs to send commands to more than one environment, some means of specifying which environment is obviously needed. That is the function of the ADDRESS instruction. There are basically two forms of this. The simplest is: ADDRESS environment where environment is a symbol or literal string which is taken as the name of an external environment. The symbol is taken literally (so no substitution is performed), but case is ignored. Examples: ADDRESS command ADDRESS 'REXXTERM' If the named environment exists, it becomes the new default environment to which subsequent commands will be passed for execution. If the named environment doesn't exist, the action of the instruction is undefined. Since some REXX implementations allow new environments to be created on the fly, it may be impossible to determine, until a command needs to be executed, that the proper environment doesn't exist. There is no language- defined method of determining what environment names are actually available. It is also possible to compute the environment name at run-time by using an expression after the ADDRESS keyword. The first token of the expression must be something other than a symbol or literal, or else the expression must be preceded with the word VALUE: ADDRESS (expression) or: ADDRESS VALUE expression This has exactly the same effect as before, except the value of expression (which could simply be a variable name) is used as the name of the new default environment. Examples: ADDRESS (editor_environment()) ADDRESS VALUE environment_of('pliopt') The ADDRESS instruction can also be used all by itself with no name or expression in order to reinstate the last previous default environment. This allows you to toggle between two external environments, or to restore the previous environment without explicitly having to save its name. There is an ADDRESS() built-in function which returns a character string containing the name of the current default environment. In addition to allowing you to save the name for later use, this can be helpful in cases where your program needs to behave differently if invoked under different circumstances. The names of the last two default environments are included in the state information that is saved across procedure calls. A called procedure inherits these names, but make arbitrary changes to the default environment without effect on the calling procedure. The second form of the ADDRESS instruction allows you to specify both an environment name and a command to be executed. The form is: ADDRESS environment command-expression As before, environment must be a symbol or literal string (but not an expression). It is the name of the external environment to which command- expression is to be sent for execution. This environment is used for only one command. The default environment is not affected. This form of ADDRESS executes a command exactly as if the command occurred by itself and the default environment was the one named. Examples: ADDRESS COMMAND "copy" file1 file2 ">nil:" ADDRESS "REXXTERM" "match please log in:" Whether or not you use the ADDRESS instruction explicitly to execute external commands, you should observe one simple rule for writing the command: it is always a good idea to enclose all nonvariable parts of the command in quotation marks, even if that is only the name of the command. For instance, always use: 'copy' file1 file2 rather than: copy file1 file2 The reason for this is twofold. First, it is always possible that copy may have been used elsewhere in the program as a variable. Without the quotation marks, the value of copy would be substituted, producing an entirely different command. Second, future versions of the REXX language may conceivably introduce COPY as a new instruction keyword. The quotation marks, again, will prevent any misinterpretation. If you do insist on living dangerously and don't put quotes around commands, you must at the very least avoid using a command name which is the same as a REXX keyword. The following are REXX keywords which must be quoted if used at the beginning of a command: ADDRESS ARG CALL DO DROP ELSE END EXIT IF INTERPRET ITERATE LEAVE NOP NUMERIC OPTIONS OTHERWISE PARSE PROCEDURE PULL PUSH QUEUE RETURN SAY SELECT SIGNAL THEN TRACE WHEN This list includes all instruction keywords and a few others. @endnode @node Exte2 "COMMAND RETURN CODES" In addition to performing some action, commands to an operating system or application program typically return a value, which is often numeric, but not necessarily so. The nature and meaning of this value is, of course, completely dependent on the particular command. Numeric values returned by commands are often called return codes, and they often give some clue as to the success of failure of a command. (For instance, in MS-DOS this value is called the error level.) Rules for the form of the value and how it is returned to REXX are implementation dependent. However, REXX does maintain a special variable, RC, to receive whatever is passed back as the return code. The RC variable is similar to (but not to be confused with) the RESULT special variable which is set after any CALL to an internal, external, or built-in function. That is, you should not use RC for your own data as it will be changed after any command invocation. (It may possibly even become undefined if a command does not produce a return code.) If you know the return codes which can be produced by the commands you issue, you may be able to determine whether the command did what you wanted it to [do]. Return codes don't always indicate an error condition, however. Sometimes the return code provides system or application information. For instance, the SENTRIES command in VM/CMS places the number of lines in the external data queue into its return code. A REXX program that is invoked by the operating system may be able to pass a return code back to the operating system, depending on the details of the interface provided. This is usually done by means of the RETURN or EXIT instruction that terminates the program. The treatment of this value by the interface depends on how the program was invoked. If the program was invoked as an external function, the value is simply the value of the function. But if the program was invoked as a system command, the value will be treated as a return code. If the value is to be a return code, it may be required to be numeric, with non-numeric values being ignored. Another way that a command issued from a REXX program can affect the program is by possibly raising an ERROR or FAILURE condition. These, in turn, may or may not be related to the command's return code. Of course, the details of how a command indicates such a condition, as well as the meaning of doing so, are highly implementation-dependent. The REXX language merely provides for such things to be used if desired. An ERROR or FAILURE condition is one of a small number of exceptional conditions recognized by the language. Other conditions include HALT, NOVALUE, and SYNTAX. The recognition and handling of such conditions is under the control of the SIGNAL instruction, which will be discussed later in more detail. SIGNAL simply provides a way for such conditions to be enabled or disabled and to specify what should be done if one occurs. The ERROR condition is intended to indicate that the command did not function entirely as expected, but at least a good attempt was made. This may or may not be reflected in the command's return code. In some operating systems like VM/CMS, a positive return code is actually defined to mean that an ERROR condition should be raised. This can be a problem, as some commands (like the aforementioned SENTRIES) use the return code to pass back information unrelated to success or failure. The ERROR condition is usually not enabled by default. If you decide to enable and provide a handler for it, you should test very carefully that it does not get raised when you don't expect it. Ordinarily you would enable it only in circumstances where unanticipated command errors must not go unnoticed. The FAILURE condition is considered to be more serious. It would indicate that a command could not be executed at all, for reasons external to the command. Possibly the command could not be found or there was not enough memory to run the command. This, too, may or may not be related to a return code. VM/CMS, for instance, tends to use a negative return code as an indication of FAILURE. The FAILURE condition is often enabled by default, which means that if you do not disable it or provide a handler for it, the entire REXX program may be terminated. Given that FAILURE is intended to indicate something seriously amiss, this is not unreasonable. However, if your programs have to be very robust and capable of handling any malfunction, you will need to provide a handler for the FAILURE condition. After all, you would need to detect and deal with the problem anyway. Unfortunately, the REXX language doesn't offer any further information to indicate the more precise nature of the difficulty. @endnode @node Char "7: CHARACTER STRING HANDLING" One of the key strengths of the REXX language is its character string handling abilities. These are inherent in the design of the language, since there are no explicit data types, and all data is represented internally as strings. The most freqently used string operation - concatenation - is so common that REXX does it implicitly whenever two symbols or literals occur together in an expression and are not separated by an operator or special character. As we have seen many times, there are two forms of implicit concatenation: with and without an intervening blank. Concatenation can also be represented by the "||" operator. This is necessary only when the syntax of the expression would otherwise be ambiguous. For instance, 'abc'x is a hexadecimal literal constant rather than a concatenation of the literal 'abc' with the value of the symbol x. To represent the latter you would have to use 'abc'||x. Similarly, 'abc'(x+3) is interpreted as a reference to a built-in or external function with the argument x+3. You have to use 'abc'||(x+3) to mean the concatenation of 'abc' with the value of x+3. The other kind of elementary string operation that has its own operator symbol is comparison. The ordinary comparison operators ("=", ">", "<", "\\=", "\\>", "\\<", etc.) operate on character strings using the standard collating sequence of the computer, but only after stripping off leading and trailing blanks. This is in keeping with the philosophy of REXX to do the most natural thing under the circumstances, because blanks are often nonsignificant. In case blanks are significant, or if the strings might be (incorrectly) interpreted as numbers, there are corresponding strict comparison operators ("==", ">>", "<<", "\\==", "\\>>", "\\<<", etc.). @{" STRING HANDLING BUILT-IN FUNCTIONS " link Char1} @{" STRING-ORIENTED FUNCTIONS " link Char2} @{" WORD-ORIENTED FUNCTIONS " link Char3} @{" OTHER STRING MANIPULATION FUNCTIONS " link Char4} @endnode @node Char1 "STRING HANDLING BUILT-IN FUNCTIONS" Perhaps the most powerful of all of REXX's character string manipulation features are to be found in the built-in functions. There are many of them. Some perform very obvious operations, like SUBSTR(), POS(), and LENGTH(). But there are a number of other functions that have some delightfully nonobvious uses. We're going to take a general overview of the string handling functions. Then we'll have a look at some of the more interesting and nonobvious ones. A great deal is sometimes made of the power of the PARSE instruction for manipulating character strings. However, since one can deal with character strings on a character-at-a-time basis, the built-in functions can in principle be used to do anything [that] PARSE can do, and much more. Well over half of the standard REXX built-in functions are for string handling. Within this group, several subgroups can be identified informally. We'll use this classification: Character-oriented Functions ABBREV CENTER COMPARE COPIES DELSTR INSERT LASPOS LEFT LENGTH OVERLAY POS REVERSE RIGHT SPACE STRIP SUBSTR TRANSLATE VERIFY XRANGE Word-oriented Functions DELWORD SUBWORD WORD WORDINDEX WORDLENGTH WORDPOS WORDS String Format Conversion B2X C2D C2X D2C D2X X2B X2C X2D Bitwise Operations BITAND BITOR BITXOR As we discuss these functions, we will observe some regularities in the use of built-in functions in general. Usually these are conventions and not formal parts of the language. However, it is a good idea to take note of them, since for consistency it is a good idea to use similar conventions in functions that you yourself write. One of the standard conventions is that procedures may have both required and optional arguments. If a required argument is omitted, a syntax error will be generated. A syntax error will also occur if too many arguments are specified. User-written procedures can test for the presence or absence of specific arguments with the ARG() built-in function, but they are responsible for enfocing a particular argument to be required. In the standard built-in functions, it is always the case that all required arguments precede all optional arguments. When describing built-in functions we will use notation like this: STRIP(string, [option], [char]) This shows that the STRIP() built-in functions takes up to three arguments. The first argument is required. The brackets around the second and third arguments indicate that they are optional and may be omitted. If one or both arguments are omitted, it it understood that the comma preceding them may also be omitted, providing no additional arguments follow (though it would not be an error to leave it in). Whenever an argument is optional and may be omitted, some specific default value will always be used. Another convention is that many built-in functions have on or more arguments which are options. Such arguments are character strings which affect how the function operates in some way or another. Example, the second argument of STRIP() is an option that indicates whether characters are to be stripped from a string in the Leading, Trailing, or Both positions. This option, if used, should be specified as a quoted string, but only the first character is significant, and it may be in either upper- or lowercase. So, the following statements all have the effect of stripping leading and trailing blanks from a string: STRIP(string, 'b') STRIP(string, 'B') STRIP(string, 'blanks') STRIP(string, 'Both') STRIP(string, 'boing!') A number of string handling functions take a particular sort of argument called a pad character. For instance, the COMPARE() function, which returns the number of the first position in which its first two argument strings differ, can optionally specify a pad character, which is considered to be appended to the shorter of the two strings before comparison, in case they are of different lengths. Whenever a pad character is used, it must always be exactly one character long. (It should also be quoted, so [that] it is not misinterpreted as a symbol.) Many of the string handling functions refer to relative positions in a string, either relative characters or relative words. Such positions are always numbered (from the left), starting with 1. A position number that is negative or zero is almost always invalid and will cause an error. A position number must also be a whole number in the sense that it may not contain a nonzero fractional part, nor may it be so large that it contains more ditis than specified in the current NUMERIC DIGITS setting. String handling functions often take arguments which represent string lengths also. Such arguments must likewise be nonnegative whole numbers, though they may be zero. @endnode @node Char2 "STRING-ORIENTED FUNCTIONS" With these preliminaries out of the way, let's turn first to the character-oriented string functions. We have encountered a number of them already, such as SUBSTR(), RIGHT(), LEFT(), LENGTH(), POS(), and STRIP(), because they are quite heavily used in REXX programs. SUBSTR() and POS() are possibly the most frequently used string manipulation functions. The syntax of SUBSTR() is: SUBSTR(string, start, [length], [pad]) string is the string to be operated on. start is the position of the first character to be extracted from string. This position could be beyond the end of the string, in which case only pad characters would be returned (or a null string). The result returned will be length characters long. If length is not specified, the default is the number of characters remaining in string (if any). Finally, if length is specified and start + length is beyond the end of the string, the result will be padded with the pad character to the specified length. The default for pad is a blank. The syntax of POS is: POS(target, string, [start]) Here, string is to be searched from left to right for a substring target. The search begins at the character position start (which defaults to 1). The match must be exact, in which case the function returns the position in string at which the first occurrence of target was found. If target is not found, the function returns 0. For a simple illustration of these functions, we will consider a particular problem. The problem is analyzing the contents of electronic mail files. Typically a mail file begins with a header such as: To: David Hilbert (hilbert\@gottingen.edu) From: Albert Einsten = length & string == string == LEFT(keyword,, LENGTH(string) Of course, just because REXX provides a built-in function that seems to do just what you want, it does not follow that the obvious way is the most efficient or the easiest to maintain. Suppose that you want to test a user-entered command against a list of valid commands, allowing for abbreviation. Let's assume that the variable command is a verb entered by the user, has been lowercased, stripped of leading and trailing blanks, and isn't entirely blank. Then one way to identify it would be with a loop and ABBREV(): command.1 = 'copy' command.2 = 'erase' command.3 = 'rename' command.4 = 'print' full_command = '' DO i = 1 TO 4 IF ABBREV(command.i, input) THEN DO full_command = command.i LEAVE END END But there are other ways, and they may be faster because they use the power of some built-in function to avoid an explicit loop. One trick which often comes in handy when working with relatively short lists of words is to keep them all in a single string, separated by blanks (or possibly other characters). For instance, the following code is equivalent to the above: list = ' copy erase rename print' j = POS(' 'command, list) IF j \\= 0 THEN full_command = WORD(SUBSTR(list, j), 1) In general, you will get best performance when you can find a way to do a given operation by taking advantage of an existing built-in function rather than by using a loop. Another operation that needs to be performed frequently with user input is checking that the data entered is valid, ie. belongs to an appropriate range of values. For example, if a user enters the name of a file to be created, you ought to check first that the name is legal. Rules for what constitutes a legal filename vary from system to system, but there is usually a set of characters which are invalid in filenames. You might put all these characters in a string and use the VERIFY() function to test for them: invalid = '<>\\,;"^|()' IF VERIFY(filename, invalid, 'm') \\= 0 THEN SAY filename 'is not a valid name!' The syntax of VERIFY() is: VERIFY(string, search, [option], [start]) where string is to be searched for characters from search. If option is 'm' ("match"), the function returns the relative position in string of the first character occurring in search that is found. Otherwise, it returns 0. start optionally specifies the position in string where the search is to begin. Sometimes you may wish to limit input to a specific subset of characters. For instance, you might want to ensure that filenames consist only of upper- and lowercase alphabetics, numerals, and a few special characters, even if the system would allow others. then you might do: valid = XRANGE('a','z') || XRANGE('A','Z') ||, XRANGE('0','9') || '-_\@#$' IF VERIFY(filename, valid, 'n') \\= 0 THEN SAY filename 'is not a valid name!' This uses [the] 'n' ("nomatch") option of VERIFY() (which is the default). It causes the function to return the position in string of the first character which is not in search, or 0 if all characters are in search. We should mention at this point that for some sorts of input validation the DATATYPE() function can be very useful. It allows you to ensure that strings are of some rquired type, such as valid numbers, all alphanumeric, all lowercase, etc. DATATYPE() is fully described in @{"Chapter 13" link Arit}. Another common use of VERIFY() is in the processing of natural language test, where you want to process one word at a time. The word-oriented functions of REXX are not really applicable, since they recognize only blanks as delimiters. In real text, words are delimited with various punctuation characters as well. So a word processing loop might be constructed as follows: /* process individual words in a file */ DO WHILE LINES(file) lines = LINEIN(file) /* process each line */ DO WHILE line \\= '' /* search for word delimiters */ i = VERIFY(line, ' ,.;:"?!()', 'm') IF i = 0 THEN DO word = line line = '' END ELSE DO word = LEFT(line, i-1) line = SUBSTR(line, i+1) END /* process "word" */ ... END END This program reads a file line by line. For each line, the VERIFY() function is used to identify individual words by searching for common punctuation characters. There are five other character-oriented string functions we have not discussed. If you do much string handling, the usefulness of these functions will be obvious. The CENTER() function (the spelling CENTRE() is also recognized) allows you to position one string in the middle of a field of a certain size, or to extract the central characters of a string. Its syntax is: CENTER(string, length, [pad]) If length is greater than the length of string, then enough pad characters are added to the beginning and end to centre string in the field. The default pad character is a blank. This is useful for formatting title lines on a page, for instance. If length is less than the length of string, the specified number of characters are extracted from the centre of string. The COMPARE() function can be used to find the position at which two strings first differ. Its syntax is: COMPARE(string1, string2, [pad]) The function returns the index of the first difference. If one string is shorter than the other, it is extended on the right with pad characters (default: blank). If the two strings are identical (after padding), the function returns 0. The INSERT() function inserts one string at a certain position in another. Its syntax is: INSERT(string1, string2, [pos], [length], [pad]) string1 is the string to be inserted in string2. It is inserted after the position specified by pos. The default for pos is 0, which means to insert at the beginning of string2. string1 can optionally be truncated to length characters or extended by padding on the right with pad. The OVERLAY() function is similar to INSERT(), except that it replaces characters in one string starting at a certain position with characters from another string. Its syntax is: OVERLAY(string1, string2, [pos], [length, [pad]) string1 replaces the characters of string2 starting at the position specified by pos, which has a default of 1. string1 can optionally be truncated tyo length characters or extended by padding on the right with pad. Though it's not obvious at first, the REVERSE() function can come in quite handy at times. It simply reverses the order of characters in a given string. The function often helps when it is desirable to process the characters of a string from right to left instead of [from] left to right. Its syntax is simply: REVERSE(string) @endnode @node Char3 "WORD-ORIENTED FUNCTIONS" The next group of functions are the word-oriented string functions. These functions treat strings as a sequence of blank-delimited words. Any other punctuation charcters are not treated as delimiters, and will be treated as part of "words". This includes even white space characters such as tabs and line feeds, which languages resembling C tend to count as equivalent to blanks. Multiple blanks are treated as a single blank. In general, the word-oriented functions are consistent with the PARSE instruction in the way they view strings as a sequence of words. When additional characters actually need to be treated as delimiters, the TRANSLATE() function can be used to convert them to blanks. The two most important word-oriented functions are WORD() and WORDs(). The first of these has the general form: WORD(string, n) where string is the string to be operated on. The function returns the @{i}n@{ui}th blank-delimited word in string. The function WORDS(), whose form is: WORDS(string) returns the total number of blank-delimited words in the string. You may find the word-oriented functions [to be] of some utility if you write programs that interact with a user through command dialogues or in in a pseudo-natural language manner. However, the character-oriented functions seem to be more generally useful than the word-oriented functions even though (or maybe becuase) they operate at a slightly lower level. Examples of the user of WORD() and WORDS() may be found in the program that displays the time in English, in @{"Chapter 2" link Over}, and in the program which searches through a file for lines containing one or more lines from a list, in @{"Chapter 3" link Stru}. Let's look at one more example, which will illustrate a couple more of the word-oriented functions in a natural application. The example is an Eliza program. This is a program that simulates a natural language conversation by means of very simple tricks. The original Eliza program was written by Joseph Weizenbaum to show how the Turing test for computer intelligence could be misleading. The Turing test is the notion that if a person cannot detect that he or she is interacting with a computer in a converstation, then the computer must be in some sense "intelligent". What Weizenbaum's Eliza program showed was that a program could really be quite unintelligent and still carry on a credible conversation. (Did he get this idea at a faculty cocktail party?) The original Eliza program simulated a Rogerian therapist in conversation with a patient. Though "unintelligent", it was quite elaborate in that it had a large number of possible conversational responses built in. The program demonstrated an ability to deceive some users, in a way that was disconcerting to Weizenbaum. (Little known fact: about the time [that] he concocted Eliza, Weizenbaum was teaching beginning programming classes in a language called MAD.) Our example is of necessity much less elaborate, since such a program draws its power to convince more from a large repertoire of conversational gambits than from programming subtlety. If you wish to experiment with it, you can make it much more interesting by adding new responses. It might be even more amusing if you choose a conversational paradigm other than psychoanalyst/patient - some singles' bar dialogue, perhaps. The program consists of three major parts. The first part is the main loop. It reads a response one line at a time from the "client". A double use of the TRANSLATE() fcuntion, similar to previous examples, is done to convert the input to lowercase and remove punctuation characters. The result is scanned using WORDPOS() for the occurrence of any of a number of certain trigger phrases. If one of the phrases is found, a plausible reply based on the input is constructed in the REPLY procedure. If no matching phrase is found, a random noncommittal response is generated. /********************************/ /* Sample Eliza program in REXX */ /********************************/ CALL initialize SAY "Hello, what's on your mind today?" /* main processing loop */ DO FOREVER /* read user input */ PARSE PULL sentence IF sentence = '' THEN LEAVE /* translate to lowercase and remove punctuation */ lower_sentence = TRANSLATE(TRANSLATE(sentence,, XRANGE('a', 'z'), XRANGE('A', 'Z')), ,, ".,?!;:") /* search for trigger phrases */ DO i = 1 WHILE phrase.i \\= '' j = WORDPOS(phrase.i, lower_sentence) IF j \\= 0 THEN DO SAY reply(i, j, sentence) LEAVE END END /* if no trigger found, give a random response */ IF phrase.i = '' THEN DO j = RANDOM(response_count-1) + 101 SAY response.j END END EXIT The second part of the program is the subroutine that constructs randomly chosen responses to the client's last remark. /***********************************************/ /* Reply to current sentence */ /* First argument: number of trigger phrase */ /* Second argument: word position of match */ /* Third argument: complete user input */ /***********************************************/ reply: PROCEDURE EXPOSE responses. response. phrase. PARSE ARG phrase_number, position, sentence i = WORDS(responses.phrase_number) j = WORD(responses.phrase_number, RANDOM(1, i)) reply = response.j /* if the prototype reply contains "_", substitute rest of input */ i = POS('_', reply) IF i \\= 0 THEN RETURN LEFT(reply, i-1) || SUBWORD(sentence,, position + WORDS(phrase.phrase_numer)) ||, SUBSTR(reply, i+1) /* if the prototype reply contains "$", subtitute trigger word */ i = POS('$', reply) IF i \\= 0 THEN RETURN LEFT(reply, i-1) || WORD(sentence,, position) || SUBSTR(reply, i+1) /* if the prototype reply contains "#", subsitute word after trigger */ i = POS('#', reply) IF i \\= 0 THEN RETURN LEFT(reply, i-1) || WORD(sentence,, position+1) || SUBSTR(reply, i+1) /* prototype has no substitution symbols */ RETURN reply The third part of the program sets up the tables of trigger phrases and possible responses. /**********************************************/ /* Subroutine called to set up the vocabulary */ /**********************************************/ initialize: phrase. = "" phrase.1 = "i am" responses.1 = '1 2 6' phrase.2 = "i'm" responses.2 = '1 2 6' phrase.3 = "i have" responses.3 = '3 5' phrase.4 = "i hate" responses.4 = '4 10' phrase.5 = "i want" responses.5 = '5 11' phrase.6 = "i need" responses.6 = '5 11' phrase.7 = "i like" responses.7 = '5 12' phrase.8 = "mother" responses.8 = '8' phrase.9 = "father" responses.9 = '8' phrase.10 = "sister" responses.10 = '8' phrase.11 = "brother" responses.11 = '8' phrase.12 = "wife" responses.12 = '8' phrase.13 = "husband" responses.13 = '8' phrase.14 = "my" responses.14 = '9' /* "_" will be replaced by a phrase */ /* "$" will be replaced by first word of trigger */ /* "#" will be replaced by second word of trigger */ response.1 = "How long have you been _?" response.2 = "Why do you say that you are _?" response.3 = "How do you feel about having _?" response.4 = "When did you first realize you hated _?" response.5 = "What does having _ mean to you?" response.6 = "Are you sure you are _?" response.7 = "When did you start to _?" response.8 = "Tell me more about your $." response.9 = "Does your # both you much?" response.10 = "Why do you hate _?" response.11 = "Why do you want _?" response.12 = "Why do you like _?" response.101 = "Do you feel any better now?" response.102 = "Why do you say that?" response.103 = "How long have you felt that way?" response.104 = "Do you think that's really true?" response.105 = "How do you feel about that?" response.106 = "That's interesting." response.107 = "Please go." response.108 = "Tell me more about that." response_count = 8 RETURN The syntax of WORDPOS() is: WORDPOS(phrase, string, [start]) Here phrase is a string that contains the words to be searched for, and string is the string of words to be searched. Blanks are the delimiters of individual words, and multiple blanks are treated as one. phrase matches a part of string, provided the words in phrase occur in string in the same order. The function is case-sensitive. If no match is found, a value of 0 is returned. Otherwise, the function returns the number of the word (not the character) at which the first match was found. Optionally, the word position iat which to begin the searched can be specified in start. The REPLY procedure illustrates a number of string handling functions, such as POS(), LEFT(), and SUBSTR(). It also introduces the SUBWORD() function, which has the syntax: SUBWORD(string, start, [length]) SUBWORD() is the exact analogue of SUBSTR(), operating on a string of words rather than a string of characters, just as WORDPOS() is the analogue of POS(). string is the string of words to be operated on, and start is the position of the first word to be selected. length is the number of words to be selected. If it is omitted, the default is the rest of the words in the string. The function returns up to length words beginning at the start position. If start is greater than the number of words in the string, a null string is returned. Blanks within the string of subwords are retained, but leading and trailing blanks are omitted. Some of the remaining word-oriented functions can also be understood as analogues of the character-oriented functions. For instance, WORDS() is analogous to LENGTH(), and DELWORD() is analogous to DELSTR(). @endnode @node Char4 "OTHER STRING MANIPULATION FUNCTIONS" The remaining string manipulation built-in functions fall into two categories, both of which address somewhat lower-level concerns than we have dealt with so far. The first of these is string format conversion. Although in one sense all data in REXX consists of strings, the same strings can be regarded as representing different things, and different strings can represent the same thing. For instance, consider the string '299792'. It is just a way of representing a particular decimal number as a character string. For some purposes, it may be necessary to represent the same decimal number in different ways. Perhaps the number has to be written to afile so it can be read by another program. Then we have to know in what internal format the other program expects to find the number. This is not a simple problem, because there are many possible equivalent formats, and it is not always easy to find out what format any given program is expecting to handle. Many programs written in languages other than REXX expect to deal with numbers in a computer-specific internal format, because this usually takes less space. To start with a simple example, the decimal number is represented internally in most computers in exactly the same way as the REXX literal 'ff'x is. Specifically, this is a single byte of data in which all 8 bits of the byte are 1s. If you need to write this number into a file so that it can be read by a program which is expecting this internal format, you must write 'ff'x rather than 255, as could be done with the statement: CALL CHAROUT file, 'ff'x Of course, this only works for a single value. So REXX provides a way to produce the internal representation of an arbitrary decimal number: the D2C() built-in function. Very confusingly, the internal representation is called the character representation of the number. (You may with much justice think that '255' ought to be called the character representation, but - sorry - that is just not the terminology adopted by REXX.) Accordingly, REXX uses D2C() as the name of the function that converts "decimal to character". So, if you want to write a number to a file in the correct internal format, you would use something like: CALL CHAROUT file, D2C(number) We say "something like" this, because there is another problem to deal with when we need to work with the internal form of numbers. The problem is that implicit assumptions are always made about the size of a number in internal form. Most computers are capable of dealing with numbers occupying 1, 2, or 4 bytes, and sometimes other sizes as well. All this ambiguity over what internal form to use for a number is precisely the kind of problem that REXX itself hides by consistently using a string representation. But when you have to exchange data with programs written in other languages, you simply have to face up to the difficulty. Anyway, the D2C() function allows you to be explicit about the size of the number in internal form, by allowing this length to be specified as the second argument. Thus the full syntax of D2C() is: D2C(number, length) Here, length is the length in bytes of the required internal form. Although the length argument is (sometimes) optional, you should always provide it with D2C(); otherwise, REXX produces a result which is just as long as it needs to be, but no longer. Thus D2C(255) is 'ff'x (1 byte), but D2C(256) is '0100'x (2 bytes). Most programs which process data will expect a certain definite size, so you should be careful to explicitly provide the right size, and accordingly write: CALL CHAROUT file, D2C(number, 2) (for instance). Another reason for always including the length in D2C() is that it is required if number just happens to be negative. D2C() always right-justifies the internal form. That is, if the requested length is longer than required, extra bytes of '00'x are added on the left for positive numbers, and extra bytes of 'ff'x are added on the left for negative numbers. (This is called sign extension, because the high-order bit of the most significant nonzero byte is propagated to the left as far as [is] necessary.) If the requested length is too short, the result is truncated on the left. Consistent with this behaviour, D2C() always produces the internal form with the most significant bytes first, the way we normally think of them (numbers are written with the most significant digits first). This raises yet another problem, since many [lame] computers store number in an internal form with the most significant bytes last. So you are forced to be concerned with byte ordering as well when you use D2C(), and you must take proper steps if you are writing data that will be read by another program which counts on a particular byte ordering. Fortunately, this happens to be very easy to do with the REVERSE() function, which swaps the order of characters in a string. IF this were in fact necessary you would just change the example to: CALL CHAROUT file, REVERSE(D2C(number, 2)) One final thing to note about the D2C() function, and something that is true of other format conversion functions as well, is that the input must be in the proper form, or else an error will be generated. Just as you can only use valid numbers in numeric opations, you must provide only valid numbers as input to D2C() for conversion. In this case a valid number must be a whole number in the sense that it can actually be represented exactly as an integer given the current value of NUMERIC DIGITS. We have dwelt on D2C() at such length because it seems to present the trickiest problems of all the format conversion functions; yet it is needed in many practical situations. Once you've understood the foregoing, the rest is pretty simple. In particular, there is an inverse function, C2D(), for converting from the internal character format to an (ASCII or EBCDIC) numeric character string. You would use C2D() in situations inverse from those where you would use D2C(), ie. when you want to get data into a REXX program that has been produced externally. For instance: number = C2D(CHARIN(file, 2)) The syntax of C2D() is: C2D(data, [length]) Here, data is the internal form to be converted to a number. Length is the length of the internal form, and defaults to the length of data. As with D2C(), you may have to be careful of byte ordering: C2D() expects data to be ordered with the most significant bytes first. IF this is not the case, you have to apply REVERSE() to the internal form before calling C2D(). Also, consistent with D2C(), the result must be expressible as a whole number, that is, an integer that does not have more digits than the current setting of NUMERIC DIGITS. And as with D2C() there is a problem with the signs of numbers as well. This is similar to the mysteries in other languages of deciding whether to treat numbers as signed or unsigned integers. In this case, if length is not specified, the data is always assumed to be unsigned. Hence C2D('ff'x) is 255. However, if a length is specified, the sign of the result is determined from the sign bit of the internal form, ie. from the leftmost bit. So C2D('ff'x, 1) is -1, but C2D('ff'x, 2) is 255, because when length is specified the input data is right-justified in a field of specified length, truncating on the left, or padding with '00'x as required. All six of the remaining format conversion functions convert to or from hexadecimal representations, that is, to or from REXX strings that happen to be valid hexadecimal constants., Such strings can consist only of digits 0 through 9 and letters A through F (in upper- or lowercase), and possibly written in groups of even length separated by blanks. "01234 ABCD" is an example of such a string. Do not confuse hex strings like this with REXX hex literals like "01234 ABCD"x (which is a way of directly expressing a hex internal form). These functions for converting to or from hex strings are actually of much less general utility than C2D() and D2C(), unless you happen to be a professional programmer, since most real world data is not expressed in hex strings. Of course, if you are a programmer, you may find these functions useful for working with dumps or other kinds of program- generated debugging information. The conversion functions come in pairs. For converting between hex strings and bit strings, there are X2B() and B2X(). These are entirely straightforward, and merely effect a change in number base between base 2 and base 16. Thus, for instance, X2B('f0') is '11110000', and B2X('11110000' is 'F0' (note: not 'F0'x!). No length parameters are used with X2B() and B2X(), because there are no ambiguities involving signs that have to be handled. Likewise, there are no additional problems in understanding X2C() and C2X() for conversion between internal character format and hex strings, once you've mastered the terminology. For instance, C2X('313233'x) is '313233' and X2C('313233') is '313233'x. Again, no length parameters are used because none are necessary. The only challenge presented by X2D() and D2X() is the handling of signs, just as with C2D() and D2C(). So this pair of functions does have a length parameter. Once you've understood what D2C() does, D2C() is easy. For instance, D2C(255) is 'FF'x, while D2X(255) is 'FF'. C2D('ff'x) and X2D('ff') both have the value 255. The length parameter in X2D() and D2X() performs just the same function as it does in C2D() and D2C(), which is to handle ambiguous cases of signed numbers. In D2X(), the length argument is the number of characters of the result, and it is required if the first argument of D2X() is negative. For instance, D2X(-1,1) is 'F', D2X(-1,2) is 'FF', D2X(-1,3) is 'FFF', and so son, while D2X(1,1) is '1', D2X(1,2) is '01', and D2X(1,3) is '001'. When the first argument of D2X() is nonnegative, the default length of the result is such that there are no leading 0s. Truncation or padding on the left with '0' or 'F' is done if necessary, according to the requested length. Similarly, X2D() uses the length argument to determine whether to produce a positive or negative result. So X2D('ff',2) is -1, but X2D('ff',3) is 255, since 'ff' is padded on the left with '0' to produce '0ff' before conversion. All of the functions that convert to hex string format, ie. B2X(), C2X(), and D2X(), produce results that use uppercase values of A-F and contain no embedded blanks. The functions BITAND(), BITOR(), and BITXOR() are also low-level string- manipulation functions, for a very different purpose - bitwise logical operations on strings. They are, like most other low-level functions, usually dependent on internal data formats. @endnode @node Pars "8: THE PARSE INSTRUCTION" PARSE is a complex and multifaceted instruction. Although many of its simpler uses include such diverse opertions as identifying subroutine arguments and reading data from the user or a file, it also offers a flexible and moderately powerful character-string analysis capability Mant tedious string operations can be done witih PARSE that could also be done with the string handling funuctions, though much less efficiently and elegantly. Because PARSE is used for so much besides string handling, and at the same time indroduces so many new ways of working with strings, it is fully deserving of a chapter by itself. As a string handling facility, PARSE may be thought of as the inverse of the concatenation operation, since it takes strings apart rather than putting them together. Although PARSE can operate on strings as individual characters, in its simplest and most commonnly used forms it is like the word-oriented character-sring functions in that it views a string as a blank-delimited sequence of words. Indeed, the errules for separating out individual words are the same, since only blanks count as delimiters, and multiple blanks are (usually) equivalent to a single one. PARSE is perhaps most frequently used disguised as the ARG and PULL instructions, which are short for PARSE UPPER ARG and PARSE UPPER PULL, respectively. ARG is normally used to receive arrguments in subroutines, and PULL is used to read information fromm the uuser. Our discussion of PARSE therefore includess these very common instructions as special cases. The high level syntax of PARSE is: PARSE [UPPR] source [template] UPPER is an optional flag that indicates the data taken from source is to be converted to uppercase before further use. (Curiously, there is currently no symmetric LOWER flag for PARSE.) @{" SOURCES OF INPUT TO PARSE " link Pars1} @{" PARSE TEMPLATES: SIMPLEST CASE " link Pars2} @{" PATTERN MATCHING IN TEMPLATES " link Pars3} @{" POSITIONAL PATTERNNS IN TEMPLATES " link Pars4} @{" VARIABLE PATTERNS " link Pars5} @{" PARSING PROCEDURE ARGUMENTS " link Pars6} @{" PARSE IN RELATION TO OTHER FORMS OF STRING MANIPULATION " link Pars7} @{" PRACTICAL EXAMPLES OF PARSING " link Pars8} @endnode @node Pars1 "SOURCES OF INPUT TO PARSE" There are a variety of sources for the data string that PARSE is to operate on, corresponding to how source is specified: ARG In this case (and this case alone) there may be multiple strings handled by PARSE. The strings are the arguments to the current procedure. For the main procedure there is usually just one argument string. But for other procedures there may be an arbitrary number of arguments. If there are multiple argumentts, they are separated from each other by commas in the procedure call. Commas are used in the template to indicate which part of the template applies to which argument. LINEIN The input string is the value of the LINEIN() function for the default input stream. Input streams, and I/O in general, are discussed in @{"Chapter 9" link Inpu}, but normally this means that the data is read directly from the user at the keyboard. If the default stream is the keyboard, this option will wait until the user enters a complete line and presses RETURN on the keyboard. LINEIN differs from PULL as a source in that it does not check the external data queue first. Unless it is specifically intended to pass data through the queue, it is preferable to use LINEIN rather than PULL to avoid confusion from data placed in the quueue for other purposes. PULL The external data queue is checked first for the presence of any data. If it is not empty, the next line ein the quueue is used as the input line for parsing. Otherwise, the line is taken from the default input stream, as with the LINEIN option. PULL should be used rather than LINEIN if you anticipate that you will need to pass input data to the REXX program using the external data queue. Doing this offers an alternative to redirecting the default input stream to a file in order to run the program without manual intervention. Neither [the] PULL nor LINEIN options will include delimiter characters such as [a] carriage return (if any), regardless of whether the line comes from an input stream or the queue. SOURCE The data string is one specially constructed to provide information about how the current program was invoked. The first word of the string should be the name of the operating system, eg. "AmigaDOS" or "PC-DOS" or "CMS". The second word should indicate how the program was invoked. It may be "COMMAND" if the program was run as [a] command by the operating system or application, "SUBROUTINE" if the program was CALLedd as an external procedure from another REXX program, or "FUNCTION" if the program was invoked as an external function. These alternatives will always refer to how the main procedure, rather than the currently active internal procedure, was invoked. The remainder of the data string is implementation dependent, but usually includes the actual name of the program file. Many operating systems allow blanks in filenames, so extra care may be needed in separating the filename from any further information that may be present. VALUE expression WITH The string to be parsed is the value of expression. WITH is a reserved word in this context and cannot be used as a variable name in expression, since it marks the end of the expression. Be careful, also, not to use WITH in connnection with any of the other types of source specificaation, as it will then be taken as part of the template. VAR name The data string is taken from the value of the variable name. The NOVALUE condition can be raised if it is enabled and name is not initialized. VERSION The data string is specially constructed to indicate the version of REXX and the language processor in use. It is therefore implementation-dependent. The first word will usually indicate which language processor is being used. The second word usually indicates the version of the REXX language; it may be used to test for the availability of certain language features (if you can figure out in which version a given feature was introduced). The next three words should contain the release date of the REXX processor, in the default format of the DATE() function (eg. 21 May 1991). @endnode @node Pars2 "PARSE TEMPLATES: SIMPLEST CASE" The remainder of the PARSE instruction, the template, is the most interesting part. It is optional. If the template is omitted, then all necessary steps are taken to construct the input string, but no parsing actually occurs. In practice, this means that input (or queue access) is performed for LINEIN and PULL, expression evaluation for VALUE, and data fetching for VAR. You might use PULL (or PARSE PULL) alone just to clear a line of input from the queue or standard input stream. But the instruction is ordinarily worth doing only if there is a template. The simplest and most common form of template is a list of variable names. We shall need a standard string to parse in many of the following examples, so we shall assume the assignment: x = "'Twas brillig, and the slithy toves" We could assign each word of this string to separate variables with the single instruction: PARSE VAR x x1 x2 x3 x4 x5 x6 and the effect would be the same as if we had said: x1 = "'Twas" x2 = "brillig," x3 = "and" x4 = "the" x5 = "slithy" x6 = "toves" This is in fact a useful technique for setting a large number of variables quickly if each needs to be set to a single word without leading or trailing blanks. It is usually more efficient than the equivalent series of assignment statements. The rule for parsing a striing when the template contains a sequence of consecutive variable names is to start with the first word of the string (in the sense used with the word-oriented, built-in string functions) and assign it to the first variable, assign the second word tto the second variable, and so forth. In other words, blanks (and only blanks) are treated as word delimiters and aree stripped out before assignments are made. A special rule applies in casae there are fewer variable names in the template than words in the string. In that case, the remainder of the string is assigned to the final variable. So if we had: PARSE VAR x x1 x2 x3 x4 x5 then x1, x2, x3, and x4 would be assigned as before, but now: x5 = "slithy toves" So the last variable in a list of variables in a template can be assigned more than one word. In fact, the last word can be assigned a string with both leading and trailing blanks. For instance, if we have: PARSE VALUE COPIES(' xxx ', 3) WITH a b c then we get: a = "xxx" b = "xxx" c = " xxx " This is because of the rule (which is not claimed to be intuitive!) that for each variable assigned except for the last, all preceding blanks and one trailing blanks in the PARSE string are consumed, but no preceding or trailing blanks are consumed in the assignment to the last variable. In the simplest case: PARSE VAR x y we have exactly the same effect as the assignment: y = x because no leading or trailing blanks will be stripped from x. Though this seems like a trivial example, it is essentially what happens with: PULL answer and you generally have to be careful to use the STRIP() function to remove surrounding blanks from answers before trying to match it with possible expected responses. In the opposite circumstance, when there are fewer words in the PARSE string than variable names in the template, all "excess" variables are assigned the null string. In general, PARSE will make assignments to all variables named in a template, and will use a null value if nothing else is appropriate. So, with: PARSE VALUE COPIES(" xxx ", 3) WITH a b c d we would get: a = "xxx" b = "xxx" c = "xxx" d = "" Notice that here blanks were stripped from the last word before assignment to c, since c was not the last variable in the list. To further get usedc to the rules for dealing with blanks at the end of a string, consider two cases. First: PARSE VALUE COPIES(' ', 2) WITH a simply assigns a string of 2 blanks, the "remainder" of the string, to a. Second: PARSE VALUE COPIES(' ', 2) WITH a b assigns a null string to both a and b. This is because the string is entirely consumed in trying to find a word to assign to a. So there is nothing to assign to a, and nothing left for b. Sometimes you will want to simply ignore words in a string whose format is known. To do that, just use a period instead of a variable name at the corresponding location in the template. So, if we know x contains a string consisting of 6 words: PARSE VAR x WITH first . . . . last yields first = "'Twas" last = "toves" and no other variables are affected. The period is a place holder which causes the same parsing as for a variable but without causing any assignment. It can be useful in avoiding some of the peculiarities which arise from the rules about the assignment to the last variable of a template. So, for instance, if you know that a string will contain a certain number of words but an unknown number of blanks between words, you can use: PARSE LINEIN x y z . to be sure that z does not contain leading and trailing blanks. Another common case is when you are interested in the first few words of a string. If all you care about is the first three words, then: PARSE LINEIN x y z . gets them for you, and you don't need to be concerned that z is assigned anything extraneous. Because PARSE, in the simple cases we have been considering, views a string as a sequence of blank-delimited words, it can often be used in place of the word-oriented string functions - and frequently it is much more efficient. In particular, if you are concerned about performance, be careful when using WORD() to examine each word of a long string. In a loop like this: DO i = 1 TO WORDS(text) x = WORD(text, i) ... END the text string has to be rescanned from the beginning every time through the loop. This will have a noticeable performance impact if text is even moderately long (even 80 characters or so). From a performance standpoint, a loop like: y = text DO WHILE y \\= '' PARSE VAR y x y ... END will be significantly faster in most implementations. This example is in fact a frequently used REXX idiom for extracting successive words of a string, because of its efficiency. The key part is the line: PARSE VAR y x y which places the first word of y into x and replaces y with the remainder of the string each time through the loop. (We first assigned text to a temporary variable because the string is entirely consumed in the parsing process and we might want to keep it around for some reason in its original form.) @endnode @node Pars3 "PATTERN MATCHING IN TEMPLATES" PARSE templates can do much more than just break a string apart into blank-delimited words. By explicitly specifying string patterns to search for, you can use arbitrary strings as delimiters. We have already seen simple cases like: PARSE VALUE TIME('l') with hours ':' minutes, ':' seconds '.' fraction The time('l') call gives the current time in the form hh:mm:ss.uuuuuu (uuuuuu is a fraction of a second in microseconds). PARSE scans the string looking for matches to literal strings given in the template. If all literals are found (in the order specified) in the string, then the string is broken into a number of substrings. Each substring is then parsed into words and assigned to any variables which are named in the template between pairs of literals. For instance, to use our original example: PARSE VAR x "'Twas" a b c "toves" we get the assignments: a = "brillig," b = "and" c = "the slithy" In other words, all of the rules for parsing strings into words and assigning the words to variables apply to the substring bounded by (but not including) pairs of literals in a template. You can even place two (or more) literals adjacent to each other in a template to bound a portion of a string which is to be ignored: PARSE VAR x a "brillig" "slithy" b yields: a = "'Twas " b = " toves" This example also illustrates another point, which is that a template is always assumed to start with a pattern that matches the beginning of the string, and to end with a pattern that matches the end of a string. Note, finally, the blanks at the end of a and [at] the beginning of b. These are not stripped since a and b, being the only variables corresponding to their respective substrings, are assigned the entire remainder of the substring. What if a literal pattern is not found in scanning a string? The answer is that this is not considered an error condition, and you cannot test directly for it. PARSE simply assigns the null string to all variables named after the literal string that was not found. The unfound literal pattern is treated as matching the end of the string. For example: PARSE VAR x a "and" b "kumquat" c yields: a = "'Twas brillig, " b = " the slithy toves" c = "" Since "kumquat" was not found in the string, it matches the end of the string. Therefore, b is assigned the remainder (everything after "and"), while c is assigned a null string. There is a special pattern-matching case which is handled in a way that may not seem intuitively obvious. This is the fact that a null string as a pattern in a template is considered not to match anything except the end of the string. Though perhaps surprising, this is consistent with other behaviour of a null string in REXX, such as the fact that it doesn't match any part of the string in the POS() built-in function. It may be useful if you need a way to force a match at the end fo the input string. Of course, this happens anyway if the template does not end with a pattern, but you might want to select use of this behaviour by ending a template with a variable pattern (discussed later) which could possibly be a null string. Because unfound patterns are considered to match the end of the input string, we can say that parsing is actually driven by the pattern matching. That is, the operation of parsing consists first of finding patterns in the string, and second of assigning the part of the string between two matched patterns to variables. When the template does not actually contain any patterns, as in the preceding section, it is assumed that the boundaries are the beginning and the end of the string. In the next section we shall see that pattern matches may be forced to occur at specific column locations in the input string. @endnode @node Pars4 "POSITIONAL PATTERNS IN TEMPLATES" There is one additional kind of pattern which can occur in a template. It is called a positional pattern, because it allows you to specify explicit character positions in a string. You may think of a positional pattern as a pattern of zero length which matches the input string at a certain position. The column position can be specified as either an absolute or a relative number. An absolute position is just an unsigned integer, while a relative position is an integer preceded (with zero or more intervening blanks) by a + or - sign. (For symmetry, you can use an = sign before an absolute position, if you like.) Obviously, counting character positions can be tedious in free-form text. The main purpose of positional patterns is for dealing with records that have fixed field sizes, so that absolute or relative positions are easily calculated; but there are other surprising uses as well. Let's consider absolute positional patterns first. Using the standard example: PARSE VAR x x1 x2 16 x3 x4 23 x5 yields: x1 = "'Twas" x2 = "brillig, " x3 = "and" x4 = "the" x5 = " slithy toves" since character position 16 is the "a" of "and" and position 23 is the blank after "the". What has happened is that the string was partitioned into three substrings: positions 1 to 15, 16 to 22, and 23 to the end. These were then parsed into variables as usual. Because positional patterns have zero length, all characters of the string wound up in on of the three substrings. Since the string starts at position 1, we would get the same results from: PARSE VAR x 1 x1 x2 16 x3 x4 23 x5 and, in fact, all templates can be considered to begin with the absolute positional pattern 1, which matches the first position of the string. To see the value of relative positional patterns, suppose we knew in advance that the string began with fields of widths 15 and 7. Then another equivalent way to write the same instruction would be: PARSE VAR x x1 x2 +15 x3 x4 +7 x5 wince 1 + 15 is 16 and 16 + 7 is 23. This way of writing the template has the advantage that you can think of the relative positions as field widths. So if you know what the field widths should be, you don't have to compute absolute column positions yourself, and the program will be easier to maintain if any field widths have to change. Although positional patterns appear reasonably straightforward from what we have seen so far, there are some special rules that make life more interesting. You may have noticed [that] it was never said that absolute positions must be specified in a template in increasing order. The very fact that negative relative positions can be specified means backing up is allowed. The first rule comes into effect whenever a positional pattern, either absolute or relative, specifies a position at or before the last specified position in the string. The rule is that when this happens, the variable just before the positional pattern (if any) is assigned the entire remainder of the string. Why? Well, ordinarily when a positional pattern is used, and it is to the right of the previous position, then the substring bounded by the two positions extends from the first up to (but not including) the second. Consequently, a variable which precedes the second positional pattern is assigned characters up to (but not including) the second position. For instance: PARSE VALUE '123456789' WITH 2 x 6 y yields: x = '2345' y = '6789' But this way of looking at things breaks down when the second position is before the first position, since that would entail a result of negative length. The rule resolves this problem by making the end of the string, rather than the second position, be the right-hand limit of the substring. Thereafter, parsing resumes at the new position. Thus: PARSE VALUE '123456789' WITH 2 x 1 y would be rather hard to make sense of if we didn't have the rule, which yields: x = '23456789' y = '123456789' Far from merely resolving a difficulty, this rule adds an interesting capability to PARSE. Since it specifically includes the case where the second positional pattern is the same as the preceding one, the rule means [that] the same string can be parsed many times, each in a different way. For instance: PARSE VALUE '123456789' WITH 1 x1 1 x2 6 x3 1 x4 4 x5, 7 x6 yields: x1 = '123456789' x2 = '12345' x3 = '6789' x4 = '123' x5 = '456' x6 = '789' As a special case, this provides a way to initialize a number of variables to the same value with a single statement, eg: PARSE VALUE TIME() WITH 1 t1 1 t2 1 t3 1 t4 1 t5 sets each of t1, t2, t3, t4, [and] t5 to the current time. This is more efficient than a series of assignment statement. And, as an added advantage, it is guaranteed that exactly the same [time] value is assigned to each variable, since the function is evaluated only once. But PARSE has still more surprises in store for us. The second special parsing rule we will consider is how positional and string patterns interact. Although we have avoided giving any examples yet, there are no rulesthat say positional and string patterns cannot be mixed in the same template. The fact is, they can be. Indeed, any mixture of variable names, string patterns, and positional patterns can be used in a template, in any order at all. The only question is how to interpret all the possible cases. To begin with, REXX provides that every string pattern sets a position in the string which is equivalent to a positional position. That position is, specifically, the position of the first matching character of the string pattern. It could be the end of the string if the pattern is not found, since an unfound string pattern is considered to match the end of the string. This position is then the previous position used when the next pattern is positional. For example: x = 'name: Fred age: 35' PARSE VAR x 'age:' age +7 yields: age = 'age: 35' Note that the substring bounded by a string pattern followed by a relative positional pattern specifically includes the string pattern itself; the string pattern is not stripped out in this case. This is just part of the rule; it does not follow from anything else. What if you didn't want the string pattern to be included? Then use: PARSE VAR x 'age:' +4 age +3 Remember: there's nothing that says two patterns can't be adjacent in a template. Sometimes it's actually useful that they can be. At this point, we have to note a minor inconsistency between the way that aboluste and relative positional patterns work. It arises when a string pattern is followed by a positional pattern. Suppose we have a record format where a character string is followed by a field of fixed width. For instance: x = "Invoice number: 012345" The whole field, counting the tag it begins with, is 22 characters wide. Hence: PARSE VAR x "Invoice number:" inv_no 23 will correctly set the variable inv_no with the number: inv_no = " 012345" The tag is stripped out because it matches part of the data. Yet the apparently equivalent: PARSE VAR x "Invoice number:" inv_no +22 causes the assignment: inv_no = "Invoice number: 012345" because it is defined that the string pattern is included in what is assigned to the variable when followed by a relative positional pattern. Though this is an inconsistency, it isn't very likely to occur in practice, because you aren't very likely to use an absolute positional pattern following a literal pattern - use of the string pattern generally implies a free-form data layout where absolute column numbers are not relevant. If for some reason you wanted to do this and you wanted the literal not to be stripped out of the value assigned, you could do: PARSE VAR x "Invoice number:" +0 inv_no 23 It is more likely that you would use another literal pattern in a case like this, perhaps: PARSE VAR x "Invoice number:" +0 inv_no "Date:" Either way, by inserting an extra relative positional pattern of +0 we have defeated the effect of stripping the matched string from the value assigned to te variable, so that we get: inv_no = "Invoice_number: 012345" @endnode @node Pars5 "VARIABLE PATTERNS" Sometimes the literal or positional patterns needed in a template are not known in advance and must be computed at the time a program is run. It is even possible for strings to be self-defining in the sense that they contain information about the location or size of fields which they contain. In order to handle situations like this, REXX allows both literal and positional patterns to be specified by the values of variables. The variables can even be set in the same PARSE instruction where they will later be used to specify patterns. Some syntactic device is necessary in order to distinguish variables which contain pattern information from those that are to be assigned values during the parsing process. That device is called a variable reference, and it consists of the variable name enclosed in parentheses. If this is to be used as a string pattern, that is all that is required. If it is to be used as a positional pattern, then the reference should be preceded by "=" (for an absolute position), or "+" or "-" (for a relative position). For instance, we could parse dates written in any of the forms "mm/dd/yy", "mm-dd-yy", or "mm.dd.yy" with something like: datesep = "/" PARSE VAR date month (datesep) day (datesep) year Unfortunately, we can't handle all three alternatives at once. In other words, the program must decide in advance what will be used as the punctuation character. there is nothing in REXX that permits matching on any one pattern from a list of alternatives, nor is there any capability for using something like regular expressions. In this respect, the PARSE instruction is considerably weaker than other pattern-matching paradigms. To see an example of variable positional patterns, let's consider the case of a record that consists of an arbitrary number of repeated data items. Each data item is of variable size. Suppose that the first three characters of each item contain the length of the remainder of the item. Then a loop like: DO i = 1 BY 1 WHILE record \\= '' PARSE VAR record length+3 data.i +(length) record END would neatly break the whole record into individual items assigned to data.1, data.2, etc. Because of the + before (length), the variable reference is taken to be a relative positional pattern. The value assigned to length in the first part of the template must be numeric or an error will result. The end of the template specifies that everything left over replaces the original value of record. Although this example is simple and elegant, it does not handle errors well. Since length must be a numeric value, the program will encounter a serious error if the input data inadvertently contains a nonnumeric value at the beginning of a field. It might be better in practice, therefore, to extract length first and test it for validity, so at least a sensible error message can be issued in case of a problem. It is possible to get into trouble using variable patterns in which the variables are set during the PARSE instruction itself. For instance, suppose that you want to have a record where the first two items are the starting and ending column numbers of a third item. You might try to use: PARSE VAR x first second =(first) third =(second) but that won't work. The problem is that, since parsing is driven by pattern matching, the variable first has to be referred to in order to determine a column number before it gets set. So it will have an undefined, or at best irrelevant, value. To handle this example correctly you would need something like: PARSE VAR x first ' ' second ' ' =(first) third, =(second) to force a pattern match on the blanks following the first and second numbers. Then first and second get set properly, and the instruction works as expected. One limitation with variable patterns is that only individual variable names can be used inside parentheses in a template. You cannot use any sort of expression. If computations are required, they must be done outside of the template and the results assigned to variables. However, you can use compound variables, and the tail may refer to variables which are set earlier in the template. @endnode @node Pars6 "PARSING PROCEDURE ARGUMENTS" One of the most frequent uses of PARSE, though sometimes in a concealed form, is receiving the argument values in a procedure. The ARG instruction can be used for this, but it is really just shorthand for PARSE UPPER ARG. As a rule, it is better to use PARSE ARG explicitly (without UPPER), since it does not mangle the case of the arguments. Use ARG only if you definitely want to ignore [the] case of the arguments. When receiving passed arguments with PARSE ARG, and in this case alone, the template has a special form. Since multiple argument strings can be passed, some notation is needed to indicate which parts of a template apply to which strings. A comma is used for this. The part of the template before the first comma applies to the first argument, the part before the second comma [applies] to the second argument, and so forth. The commas used in the template are not patterns, and they are not quoted. For instance, for a procedure called like this: CALL names 'Kellyn', 'Ashley', 'Shanna', 'Meghan' we might have: names: PROCEDURE PARSE ARG name1, name2, name3, name4 so that: name1 = "Kellyn" name2 = "Ashley" name3 = "Shanna" name4 = "Meghan" This takes the four procedure arguments and assigns them to variables. One way to think of this is that the commas imply four distinct parsing operations, as if there were actually three templates used: one for each argument. The preceding example, in one sense, does no actual parsing or character string manipulation at all. Instead, for each argument in turn, the whole argument string is assigned to a variable without regard to the content of the string. But each portion of the template could be more complex. We could have: CALL parse_poem, "'Twas brillig, and the slithy toves",, "Did gyre and gimble in the wabe:",, "All mimsy were the borogoves,",, "And the mome raths outgrabe." ... parse_poem: PROCEDURE PARSE ARG first1 rest1, first2 rest2, first3 rest3,, first4 rest4 to extract the first word of each argument. The remainder of each line gets assigned to one variable, eg: rest3 = "mimsy were the borogoves," because rest3 is the end of the template as far as the third argument is concerned. In general, there should be as many commas in the template that parses arguments as there are commas between arguments in the procedure call (ie. one less than the actual number of arguments). A frequent and easily made mistake would be to use (with reference to the earlier example): PARSE ARG name1 name2 name3 name4 This actually does something very different: name1 = "Kellyn" name2 = "" name3 = "" name4 = "" Since there are no commas, this statement deals with only the first argument. It operates on one string (the first argument to the procedure) and searches for blank-delimited words, of which there is only one in the first argument. So, if you have fewer subtemplates in an ARG instruction than you have arguments, some of the arguments will be ignored. On the other hand, if you have too many subtemplates, all variables in the excess subtemplate will be set to null strings, since in effect the excess subtemplates are parsing null strings. In particular, if you use templates with multiple subtemplates in any form of PARSE other than PARSE ARG, all variables in the subtemplates after the first will be set to NULL. This might even be useful: PARSE VALUE '' with x1, x2, x3, x4, x5, x6 is a quick way to set a number of variables to the null string, though: PARSE VALUE '' with x1 x2 x3 x4 x5 x6 is even more straightforward. You can use PARSE ARG anywhere in a subroutine. It does not have to be the first statement after the label or PROCEDURE instruction. You can also use it several times, with different templates each time. @endnode @node Pars7 "PARSE IN RELATION TO OTHER FORMS OF STRING MANIPULATION" In general, string manipulation chores that can be done with PARSE will be done more easily and more efficiently than the equivalent operations done with the word- or character-oriented built-in functions. In addition to being easier to program (once you get the hang of it), you can gain execution speed since the whole operation can be done internally, without the need to constantly issue function calls. The use of PARSE to separate one word at a time from a long string, instead of using the WORD() function, is a notable example of this. However, there are trade-offs. Even when PARSE and the string functions can perform more or less equivalent operations, there may be reasons other than processing efficiency for choosing one over the other. One possible reason if that more detailed error-checking can be done if you work at a lower level with the string functions. For instance, consider the problem of using PARSE to substitute one substring (contained in the variable x) for another (contained in y). You might be tempted to try something like: PARSE VAR string before (x) after string = before || y || after Although this is simple and elegant, it causes trouble if the substring x isn't contained in string. In that case, PARSE will merrily assign the whole string to before. Then the next statement will append the value of y. This is probably not what you want. The problem is that PARSE has made things (apparently) so easy that you can forget to test for exceptional cases. PARSE has no "smarts" and does not deal well with string match "failures". A more robust solution to the problem would be: IF POS(x,string) \\= 0 THEN DO PARSE VAR string before (x) after string = before || y || after END This tests separately whether the value of x occurs in the given string, since PARSE doesn't do that for you. String manipulation functions can always be used as an alternative to PARSE. Although they may be somewhat less efficient, they may be more flexible. For instance, INSERT() and DELSTR() can be used to do string substitution as in the previous example, and in order to use them you are almost forced to consider the possibility that the substring you want to replace isn't found: i = POS(x,string) IF i \\= 0 THEN string = INSERT(y, DELSTR(string,i,LENGTH(x)), i-1) @endnode @node Pars8 "PRACTICAL EXAMPLES OF PARSING" One more example: in this we would like to show the symmetry which is revealed when you consider the handling of long records with many fields of specific lengths. You might build up such a record for output with a statement like: record = RIGHT(val1, len1) || RIGHT(val2, len2) ||, RIGHT(val3, len3) || RIGHT(val4, len4) ||, RIGHT(val5, len5) || RIGHT(val6, len6) Given this, the same record could be input with: record = LINEIN(file) PARSE VAR record val1 +(len1) val2 +(len2), val3 +(len3) val4 +(len4), val5 +(len5) val6 +(len6) @endnode @node Inpu "9: INPUT AND OUTPUT" Programming languages exhibit wide diverges in their input/output capabilities. Some languages like PL/I attempt to provide comprehensive I/O facilities within the language itself. Others like C leave I/O entirely to separate function libraries (though, in the case of C, the essential core functions have become standardized to the point of being a true part of the language). Still other languages, like FORTRAN, have opted to include basic I/O capabilities as part of the language, without attempting to address the full range of possibilities that PL/I (for instance) does. REXX is in the middle. It is almost like C in that the language itself has no pure I/O constructs other than SAY. (The QUEUE, PULL, and PUSH instructions are halflings that sort of do I/O, some of the time.) REXX's I/O capabilities are embodied in a small number of built-in functions (mainly LINEIN(), LINEOUT(), CHARIN(), and CHAROUT()). These are, however, standard parts of the language, though their behaviour is allowed a certain amount of latitude across implementations. It is a good idea to distinguish two sorts of I/O which really place very different demands on a language. The first kind is I/O involving files and hardware devices like printers, where no human user or operator is directly involved. For this, the REXX built-in functions are fairly well-equipped to handle the most common basic tasks. They do not, however, offer any particular support for anything besides flat files, nor are they capable of dealing with idiosyncrasies of specific hardware devices. The second kind of I/O involves interaction with a human user. For this, REXX satisfies the most rudimentary requirements by providing SAY and PARSE PULL. This assumes a hardware model that consists of a simple printer-keyboard combination. REXX has nothing to offer as part of the language for any more modern human interface styles such as windows or direct manipulation. But, then, neither do most other programming languages. We stress this distinction, because REXX's file I/O facilities can also be employed for user interaction by means of the notion of treating a keyboard-screen combination as a virtual file on which a program can read and write. When this is done properly, it can add a certain kind of flexibility to a program, because the program may be able to operate equally well whether it is reading from and writing to a human operator, a file, a device, or even another program. Such device-independent programs offer a certain economy of effort, in that they may be used in a wider range of ways. In the best cases, such as the filter programs which are so common in UNIX, this permits a powerful, building-block approach to writing utilities. But, as everyone knows, though such programs are capable of interacting with people by treating the keyboard and screen as devices, the quality of the user interface leaves something to be desired. Furthermore, the intentional blurring of the distinction between human-oriented and computer-oriented forms of I/O capbilities can lead to certain confusions about specific REXX I/O features. Besides that, the two kinds of I/O often present different problems and possibilities in such things as random access and dealing with the end of file. And further confusion can arise because of the semi-incorporation in REXX of yet another kind of (pseudo-) device, the external data queue, to be discussed in the @{"next chapter" link Queu}, which is (among other things) another paradigm for virtualizing a human operator. In the first implementation of REXX, in IBM's VM/CMS operating system, the two kinds of I/O were completely separate, for better or worse. User interaction was done with SAY and PULL, but file I/O could be done only with external commands, such as EXECIO. This command was clever in that it could also work with the procedure language which preceded REXX, but it was ultimately a kludge. Nevertheless, it was necessary, as the earliest implementation of REXX in VM/CMS did not support the I/O built-in functions. (It still does not at the time of this writing.) Since this situation must inevitably change, we won't deal with EXECIO here, and it is not part of the REXX language in any case. Like most of the rest of REXX, the I/O model was designed for simplicity of use. It was intended for handling what were expected to be the most common cases of I/O very easily. As far as user interaction was concerned, this meant a simple line-at-a-time, question-answer style of user interface. More sophisticated capabilities were consciously excluded. Unfortunately, what is adequate for simple user interaction sometimes doesn't work quite so well for files. Furthermore, even the simple user interface model was falling from favour at the time REXX was designed. REXX emerged at the end of an era of computer technology in which it was still possible to take seriously the model of an interactive program which communicated with its user through a question-answer user interface. At the time (early 1980s), such line mode interfaces were already being rapidly supplanted by "full-screen" interfaces which exploited a two- dimensional surface for interaction with users. And these in turn rapidly gave way to even more elaborate systems of windows, pop-up or pull-down menus, requesters, and the like. At present, the complexity of programming adequate user interfaces, in either text or graphics mode, far outstrips the resources of any language in common use which does not have access to extensive (and unstandardized) interface libraries (or the equivalent classes of object-oriented languages). This makes any plain, unaugmented language, including REXX, not very suitable for building what are now considered to be "good" user interfaces. Fortunately for REXX (and other languages), not all programs require "good" interfaces. Some don't require a user interface at all. Because much use of REXX is for such things as system command procedures, application macros, and quick-and-dirty utilities, sophisticated interface tools are frequently not needed. But other common uses of REXX, such as prototyping and application building by combining tools, do suffer from the lack of good interface capabilities. @{" CHARACTER-ORIENTED VS. LINE-ORIENTED I/O " link Inpu1} @{" OPENING A FILE " link Inpu2} @{" FILE READ/WRITE POINTERS " link Inpu3} @{" CLOSING A FILE " link Inpu4} @{" LINE-ORIENTED FILE I/O FUNCTIONS " link Inpu5} @{" CHARACTER-ORIENTED FILE I/O FUNCTIONS " link Inpu6} @{" COMMUNICATION WITH THE USER " link Inpu7} @{" EXAMPLE: BINARY SEARCH OF SORTED FILES " link Inpu8} @{" EXAMPLE: MAKING A FILE INDEX " link Inpu9} @{" EXAMPLE: WRITING 'FILTERS' IN REXX " link Inpu10} @endnode @node Inpu1 "CHARACTER-ORIENTED VS. LINE-ORIENTED I/O" Let's leave such philosophical observations aside now, and begin to explore REXX's I/O capabilities by looking first at the file I/O functions. REXX provides two distinct sets of file I/O functions, corresponding to a distinction between two ways of regarding a file. From one point of file, a file is just a string of zero or more bytes. Every byte is just like any other byte in that the file system does not reserve special bytes to indicate the end of a line or a record. The other point of view is that a file is a sequence of zero or more lines or records. Each line is a sequence of zero or more bytes. Lines may all be of one fixed length, or they may be of varying length. Sometimes the end of a line is marked by one or more bytes actually contained in the file, and sometimes there are no such markers, with the file system itself keeping track of the location of line boundaries. Most operating systems now support both of these views of what a file is, but they also generally favour one view or the other. Mainframe file systems, for instance, generally consider a file to be a sequence of lines, whereas most other systems treat a file as a sequence of bytes. These two views are not wholly incompatible, and any given file system may support both views of the same file, even simultaneously. This is obviously an area of much potential for incompatibility between systems. REXX supports both views of file organization through its character- oriented and its line-oriented file I/O functions. They may even be used simultaneously on the same file, though this can be troublesome. The character-oriented functions are CHARIN(), which reads one or more characters; CHAROUT(), which writes one or more characters; and CHARS(), which returns the number of characters that remain to be read in a file. The corresponnding line-oriented functions are LINEIN(), which reads a single line from a file; LINEOUT(), which writes a single line to a file; and LINES(), which returns the number of lines that remain to be read in a file. @endnode @node Inpu2 "OPENING A FILE" Most operating systems require that an operation called opening be performed on a file before it can be used. Correspondingly, most languages require files to be opened before using language I/O facilities. The open operation has several purposes. One of these is to identify by name the file or device that is to be used. Usually the open process associates the name with a handle or control block which is used in subsequent I/O operations rather than the name itself. Another purpose of the open operation is to allow the user or programmer to make choices among various available options for file processing. The options that are available, of course, will vary greatly from system to system, but they often include: * whether the file is to be read, written, or both * whether writing (if any) should start at the beginning of end of the file * how the file may be shared by other programs that have concurrent access to it * what attributes the file should have if it is being created * how much space should be pre-allocated for the file if it is being created Notice that all of these things are true options in the sense that they are not (ordinarily) characteristics of the file itself (if it already exists). Various other options deal with alternative ways of processing a file that may admit variation, yet must still be compatible with the true characteristics and organization of the file. This category of options includes: * whether the file is to be treated as text or binary. This distinction usually refers to whether there is to be special handling for control characters that indicate the end of a line. * what the logical length of a file record is, in case there is no internal indication in the file of record boundaries. What this discussion is leading up to is the fact that there is no function in REXX, in contrast to most other languages, for opening a file. This is because, for most simple file operations, it is possible for REXX to open a file automatically the first time it is used. REXX does this in the spirit of "helping" the programmer by eliminating a frequently unnecessary and often confusing operation. This is akin to the REXX philosophy that it should not be necessary to declare variables before use. (It is not irrelevant, though, that in VM/CMS where REXX was first implemented the operating system did not require a file to be opened before being used.) In languages where opening a file is required, the file is normally referred to by its name only at the time it is opened. Thereafter, the open file is referred to by either a handle (an object returned by the open routine) or by a control block. But in REXX a file is always referred to by its name. REXX internally keeps track of whatever handles or control blocks are needed by the operating system, automatically associating them with the name after the first time the file is used. Keeping such details out of the program itself can contribute to portability across systems. Of course, the disadvantages of having no explicit open operation is that it is not possible in standard REXX to specify any of the options which are otherwise available when a file is opened. Standard REXX is able to handle some of the options by choices available in the file I/O functions. For instance, whether the file will be read or written is determined by whether both input and output functions are actually used. Since REXX can't know in advance whether you might want to update a file even if the first operation is to read it, a REXX implementation will typically open the file in read-write mode (if possible). Thereafter, reads and writes can be intermixed in any order. If there is some reason why REXX cannot open the file in read-write mode, it may open it in read-only mode. Then REXX will signal an error condition if, and only if, an output operation is attempted. One side effect of this approach, which may or may not be a disadvantage, is that the file may be opened in a mode that prevents other programs from simultaneously reading or writing it, even if the file is only going to be read. The inability to choose at open time whether to start writing a file at the end (append mode) or at the beginning (create mode) is of lesser importance. Normally a REXX implementation will by default begin writing at the end of an existing file (to reduce the chance of unintentional destruction of data). But both the CHAROUT() and the LINEOUT() functions provide the ability to write at other positions in a file, including the beginning. (Though in some cases it may not be possible for an implementation to support writing at an arbitrary byte or line position in the middle of a file.) Another common option in opening a file is whether to treat it as text (a sequence of lines) or binary (a sequence of bytes). In general, REXX's character-oriented functions, when fully implemented, deal with a file in binary mode as a sequence of bytes. That is, they read or write exactly what is contained in the file and do not make any transformations on the data or treat any characters as special (eg. end of file characters). The line-oriented functions may, however, interpret certain control characters in special ways, usually to determine the end of a line. (But note that the distinction between text and binary is very situation- and system- dependent, and is not becessarily the same as the line vs. character distinction). Some operating systems like MVS permit certain file characteristics to be specified at open time, such as the length of logical records of a file. REXX's character-oriented functions (though not the line-oriented functions) handle this easily, since any number of characters can be read at a time. There are several alternatives for handling those open options that can't be simulated in the REXX file I/O functions themselves. For one thing, there is a catchall built-in function called STREAM() which can be used by an implementation to provide any desired I/O capabilities above and beyond what can be done with the standard functions. Among other things, STREAM() may support an explicit open command, which allows specification of necessary file open options. Unfortunately, the specific commands provided by STREAM(), including the open command, are completely implementation-dependent and not standardized. You might just as well use the native file system facilities of your implementation of REXX. An implementation can also add as many extra built-in functions as it likes to provide necessary I/O services. It may choose to offer a complete set of I/O functions more or less parallel to the standard ones but which correspond more closely to the I/O facilities of the underlying operating system. The numerous possible file-handling options, and their lack of consistency among different systems, is one of the major reasons I/O has always been a difficult topic in learning a programming language. When treated in full generality, I/O remains difficult even in REXX. Yet a few of the most common requirements are handled cleanly and automatically by the language. @endnode @node Inpu3 "FILE READ/WRITE POINTERS" There is one concept which applies to most of the I/O functions, so we will discuss it first. This is the fact that REXX keeps track of its current position in the file at all times. This is done with something conceptually like a cursor or a pointer that always indicates the next character or line of the file that will be read or written. The current position is maintained independently for reading and writing with what we will refer to as a read pointer and a write pointer. By definition, the read pointer is positioned just before the next character or line to be read from the file. When a file is opened, the read pointer is conceptually just before the first character of the file. As data is read by CHARIN(), the read pointer is advanced by the number of characters read, so that it is positioned after the last one read. When LINEIN() is used, the read pointer is moved to a pointer right after the line just read, including any characters signifying the end of the line. If CHARIN() is used after LINEIN(), the read pointer can be left pointing somewhere in the middle of the line. If LINEIN() is then used, it will return only a partial line, from the read pointer to the end of the line. Similarly, the write pointer is always positioned at the point where the next output operation will add data to the file. When a file is opened, the write pointer will be right after the last existing character of the file, so that any newly written data will be appended to the file. As new data is written by CHAROUT(), the write pointer is advanced to right after the last character written. The read and write pointers can also be changed without actually performing I/O. All of the four principal file I/O functions are defined so that the read or write pointer can be moved to a specific location before the operation begins. This permits random access for reading and writing in a file. The functions also allow for requesting 0 characters or lines to be read or written. This allows for changing the read or write pointers without performing I/O. Additionally, the STREAM() function may provide another means of changing the pointers. This operation is sometimes called seeking. When using the character-oriented functions to set the read or write pointer before an operation, the relative byte number in the file is specified. Bytes in a file are numbered starting from 1, which is consistent with how bytes in REXX strings are numbered. When the line- oriented functions are used to set the read or write pointer, the relative line number (starting with 1) is specified. There is no standard way of enquiring what the value of the current read or write pointer is, though it may be possible in some implementations through the STREAM() function. In cases where it can somehow be done, you need to understand clearly whether the numerical value used refers to a relative byte or line number. We should point out that the behaviour of the read and write pointers is as yet a somewhat unstandardized area of the language, as is true, in fact, for file I/O in general. Implementations definitely do differ. For instance, even though Cowlishaw explicitly states that the read and write pointers should be independent, at least one implementation (IBM's OS/2 REXX) forces them to be the same. This means that writing to a file can change the read pointer, and vice versa. Making the read and write pointers the same is, of course, extremely dangerous, since if a program reads randomly in a file and then does output, data may be inadvertently destroyed because the write pointer has been moved to the position of the last read. Be careful. Anyway, the purpose of the read and write pointers is twofold. They make it possible, first of all, to have random access to a file, in order to read or write at arbitrary positions. This is done, as we said, when a position is specified explicitly in the file I/O functions. The other purpose of keeping track of the current position (specifically, the read position) is to give a well-defined meaning to the CHARS() and LINES() functions. CHARS() returns the number of characters in the file that have not yet been read, ie. the number of characters from the current read position to the end of the file. For instance, when a file is first opened, the value of CHARS() should be the size of the file. Similarly, LINES() returns the number of lines from the current read position to the end of the file. It is worth noting at this point that these concepts are meaningful for files, but not for most of the other file-like things that may be accessed by the file I/O functions disguised as files. Such things include the user's keyboard and screen, devices like a printer, and pipes (which allow communication with another program as if it were a file). So our discussion here is essentially "file chauvinist". We speak as if the I/O functions always deal with files, and as if it is always possible to do something reasonable with a file. It must be recognized, however, that this point of view can't be maintained consistently when working with nonfiles. And what is even worse, there is no means in standard REXX of even detecting whether a name refers to an actual file, as opposed to a device, a pipe, or whatever. Indeed, many operating systems, in an effort to promote device independence, go out of their way to conceal this information from programs. Cowlishaw's @{i}The REXX Language@{ui} avoids most of these messy issues. It makes some attempt to distinguish between persistent streams and transient streams. However, these terms aren't defined, except by example. A persistent stream is, essentially, a file. The term implies that the object persists, so that one can reread it and get the same data, unless it has been explicitly changed. Such a thing also admits having a well- defined beginning and end, so that the notions of read and write pointers make sense. A transient stream, on the other hand, cannot be read or written at random and does not have a well-defined beginning and end. You should also recognize that even in the case of a file, the underlying operating system may not permit random access. And even if random access is possible, it is often the case that such access can be done by byte number or by line number, but not both. Those file systems which view a file as a sequence of bytes generally allow random access by byte number. They do not usually allow random access by line number, nor can they report how many lines remain unread in a file. Similarly, if the file system views a file as a sequence of lines, it probably permits random read access by line number. It may or may not allow random write access by line number, and if it does, it may not allow change in the size of a given line, or it may delete all lines after a new one is written in the middle of a file. File systems of this sort rarely allow random access by byte number. REXX does not provide any mechanism for indicating to a program what types of random file access are permitted. Your program is expected to know what [it] is reasonable to do. If the program must be portable across systems, it should be aware of the environment it is running in, if only to provide warnings when it can't do something that would not be supported. @endnode @node Inpu4 "CLOSING A FILE" The operation that is the inverse of opening a file is closing a file. Several important purposes are served by this operation: * If new data has been written to the file, the operating system can be notified that the data is complete and should be committed to permanent storage. The system may assign a last modification date to the file at this time. * Restrictions on sharing the file which would inhibit access by other processes are removed. * File buffers (if any) may be released back to the operating system. * File handles (which may be a limited resource) can be released for reuse. Because these functions, or close equivalents, are almost always needed, most operating systems support the concept of closing a file. In particular, you should never assume your data is safely written to disk so that it can be accessed by another program until the file has been closed. Moreover, it is generally not possible to automatically or implicitly close a file when your program is done with it. The request to close a file is the way that the program says it is done with a file. Of course, most implementations will close all open files when a REXX program returns to the operating system, but it is very bad practice to depend on this. In a long-running program, quite a large amount of time may pass before a program terminates. During that time, an open file may be unavailable to other processes, and it may even be lost completely if the computer crashes. In the case of files destined for a printer, the operating system may wait until the file is closed until printing can begin. Yet another consideration is that REXX programs frequently invoke other programs or system commands. A file needed by such programs may be unavailable or incomplete unless it has been closed first, leading to mysterious errors and possible data loss. To avoid all this, just observe the simple rule: always close files as soon as you are finished with them. REXX does have a means for explicitly closing a file, though it is unintuitive and a little obscure. Instead of simply having a function to close a file, REXX provides that when either the CHAROUT() or LINEOUT() function is used with only the filename as an argument, then the file will be closed. This applies even to files that have been used only for input, and even to files that are read-only. The STREAM() function may, in various implementations, also provide a way to close a file, but any such capability is not standardized. @endnode @node Inpu5 "LINE-ORIENTED FILE I/O FUNCTIONS" We'll take a closer look at the line-oriented functions first. They are, probably, the most commonly used, because they work well with the kind of text files one tends to deal with most often in REXX. There is no completely consistent definition of what a text file is which applies to all systems. In general, however, it is thought of as a file consisting of a sequence of lines of text. These lines usually (though not necessarily) will consist entirely of printable characters. Text files are usually created and maintained with a text editor (naturally enough). They are used for such things as program source code, electronic mail, and general program data input and output. On systems which internally store files as a sequence of bytes, the individual lines of a text file are generally delimited by specific control characters embedded in the file. LINEIN() is the line-oriented input function. Its syntax is: LINEIN([name], [line-number], [line-count]) LINEIN() returns the next line of the file, starting at the current read position or the position specified by line-number. If the current read position is at the end of the file, LINEIN() returns a null string. When lines of a file are delimited by characters embedded in the file, the final delimiting characters are not included at the end of a line. For example: line = LINEIN("PROFILE EXEC A") /* next line */ line = LINEIN() /* read line from standard input */ All of LINEIN()'s arguments are optional. name is the name of the file. Depending on capbilities of the operating system, name may actually refer to a device, such as a serial I/O port, which is being simulated as a file. If name is omitted or a null string, LINEIN() reads from the default input stream. The nature of the default input stream is system-dependent. In CMS it is always the keyboard. In MS-DOS, OS/2, AmigaOS, and UNIX, it is the standard input file, which might be the keyboard, a file, a device, or even another program (through pipes). line-number specifies the relative line number in the file at which the read should begin. REXX numbers file lines starting at 1. The default, if line-number is not specified, is to begin reading at the current read position. If calls to CHARIN() have been used, it is possible that the current read position may not be exactly at the beginning of a line, so that LINEIN() may not return a full line. Furthermore, not all file systems are capable of supporting a read that starts at an arbitrary line number. In particular, when a file is organized as a sequence of characters and lines are delimited by a particular character sequence, the capability to begin reading at an arbitrary line number would be very inefficient to implement, so it is not usually possible. line-count may be 0 or 1. It is 1 by default, and specifies that one line is to be read. A value of 0 indicates that no input is to be done, but the read pointer is to be moved to the line specified in line-number (if possible). In this case, LINEIN() returns a null string. LINEIN() may behave differently with input sources other than files. A file has a definite location called the end of file, beyond which LINEIN() will return a null string. However, the end of file concept does not usually apply to devices, the keyboard, or other programs, though the operating system may provide a means of signalling [the] end of file for some of these. In cases of this sort, LINEIN() may simply stop and wait until more input is available rather than returning a null string. Positioning to a specific line number is also generally not possible for input sources other than files. There are no standard REXX functions which can be used to determine whether a given name actually refers to a file, whether an end of file can be recognized, or whether random positioning by line number is possible. Although LINEIN() returns a null string when the end of file is reached, this is not a reliable way of determining that there is no more input, since a file may very well contain null lines that are not at the end of the file. REXX provides the LINES() function to handle detection of the end of the file. Its syntax is: LINE([name]) LINES() returns the number of complete or partial lines not yet read in the file specified by name. If name is omitted or a null string, the default input stream is assumed. LINES() determines the number of lines remaining in terms of the number of lines from the current read pointer to the end of the file. If the file has not yet been opened, LINES() will open the file and return the number of lines in it. If the end of file has been reached, so that the read pointer is after the last line, LINES() returns 0. Because not all file systems permit efficient counting of lines (as with files that are a sequence of bytes and lines separated by special control characters), LINES() may simply return 1 to indicate that there is more input to be read. When LINES() is used with an input stream other than a file, such as a device, a pipe, or the keyboard, the result is harder to predict. Usually LINES() will return 0 if an input line isn't currently available. That is generally the same case in which LINEIN() would wait for input to become available. LINEOUT() is the line-oriented output function. Its syntax is: LINEOUT([name], [date], [line-number]) LINEOUT() writes one line of data to the file, and returns 0 if it was successful. Otherwise it returns the number of lines not completely written (perhaps because the output disk was full or not ready). As usual, name is the name of the file (or device, etc.) to be written to. data is the line of data to be written. Line delimiter characters, if used by the file system, should not normally be included in data, because they will be added automatically. Examples: CALL LINEOUT , 'Hello world!' /* standard output */ CALL LINEOUT 's:startup-sequence', 'assign T: RAM:T' For file systems that support random file access by line number, line- number is the relative line number, starting with 1, where data is to be written. Even if a file system supports this kind of file access, it may cause loss of data beyond the new line being written, especially if the new line is not the same length as the one being replaced. Be sure you understand how your file system handles this situation. If name is omitted or a null string LINEOUT() writes to the default output stream. The nature of the default output stream is system- dependent. In CMS it is always the terminal. In MS-DOS, OS/2, AmigaOS, and UNIX, it is the standard output file, which might be the screen, a file, a device, or even another program (through pipes). If both data and line-number are omitted, the specified file is closed as described earlier. If data is omitted, but not line-number, the write pointer is set to the specified line (if the file system supports this). If line-number is omitted, which is by far the most common case, the new line is written at the current output position as specified by the write pointer, which is normally the end of the file. The simplest type of file I/O that is done in REXX often involves reading a line, processing it in some way, and producing output. For instance, a simple program that copies one file to another might be no more than: PARSE ARG input output DO WHILE LINES(input) > 0 CALL LINEOUT output, LINEIN(input) END CALL LINEOUT input CALL LINEOUT output This is just a read-process-output loop, where there is no processing to speak of. It can be elaborated upon to handle almost any situation that involves reading lines sequentially from an input file, processing each in some way, and writing to an output file. The processing might be searching for specific words (as in the WORDFIND program discussed in @{"Chapter 3" link Stru}), reformatting the input lines, computing totals of input data, or whatever. @endnode @node Inpu6 "CHARACTER-ORIENTED FILE I/O FUNCTIONS" The character-oriented functions work equally well with text or binary files. Of course, if the file system does not permit character-level access to files, the character-oriented functions may be unsupported or only partially supported. These functions return the exact data which is in the file, without translation or interpretation. In particular, special control characters used to signify line end or end of file are returned along with ordinary data characters. It should be possible to process any type of file with the character- oriented functions. This includes database files, executable programs, word processor files, or whatever. Your program is responsible for knowing the detailed low-level format of the file, however. CHARIN() is the character-oriented input function. Its syntax is: CHARIN([name], [character-number], [character-count]) CHARIN() returns the next character-count characters of the file, starting at the current read position or the position specified by character- number. If the current read position is at the end of the file, CHARIN() returns a null string. Examples: zip_code = CHARIN('address.dat', pos, 5) CALL CHARIN file, 1000, 0 /* position at byte #1000 */ All of CHARIN()'s arguments are optional. name is the name of the file. If name is omitted or a null string, it is assumed to be the default input stream. character-number specifies the relative character number in the file at which the read should begin. REXX numbers file characters starting at 1. The default, if character-number is not specified, is to begin reading at the current read position. Not all file systems are capable of supporting a read that starts at an arbitrary character number. This is frequently true with file systems that primarily regard a file as a sequence of lines. character-count may be any nonnegative whole number. It is 1 by default. A value of 0 indicates that no input is to be done, but the read pointer is to be moved to the byte specified in character-number (if possible). In this case, CHARIN() returns a null string. Like LINEIN(), CHARIN() may behave differently with input sources other than files. A file has a definite location called the end of file, beyond which CHARIN() will return a null string. In cases where [the] end of file is undefined, CHARIN() ,ay simply stop and wait until more input is available, rather than returning with fewer characters than requested. Positioning to a specific character number is also generally not possible for input sources other than files. CHARS() is the character-oriented analogue of LINES(). It may be used to determine the number of characters left to read in a file. Ordinarily it is used simply to dermine whether the end of file has been reached. The syntax is: CHARS([name]) CHARS() returns the number of characters not yet read in the file specified by name. If name is omitted or [is] a null string, the default input stream is assumed. CHARS() determines the number of characters remaining in terms of the number of characters from the current read pointer to the end of the file. If the file has not yet been opened, CHARS() will open the file and return the number of characters in it. If the end of file has been reached, so that the read pointer is after the last character, CHARS() returns 0. CHARS() may return 1 if the end of file has not been reached on a file, but the exact number of characters to be read cannot be efficiently determined. As with LINES(), when CHARS() is used with an input stream other than a file, such as a device, a pipe, or the keyboard, results are harder to predict. Usually CHARS() will return 0 if no characters are currently available, even though more may be added later to the end of the input stream. It is not a good idea to use CHARS() to detect the end of file when input is being done with the LINEIN() function. The reason is that in some implementations LINEIN() may stop returning nonnull lines without actually reading past the end of [the] file. (This can happen with files where certain control characters are used to indicate the end of file even though more data actually remains to be read.) As a rule of thumb, for reasons like this, it is a good idea not to mix the line-oriented and the character-oriented functions, unless you are sure of what you are doing. CHAROUT() is the character-oriented output function. Its syntax is: CHAROUT([name], [data], [character-number]) CHAROUT() writes the specified data to the file, and returns either 0, if it was successful, or the number of characters which couldn't be written if not successful (perhaps because the output disk was full or not ready). As usual, name is the name of the file (or device, etc.) to be written to. data is the string to be written. Examples: CALL CHAROUT , 'Enter account code:' /* display prompt on standard output */ CALL CHAROUT 'image.dat', databits, location For file systems that support random file access by character number, character-number is the relative character number, starting with 1, where data is to be written. IF name is omitted or [is] a null string, CHAROUT() writes to the default output stream. The nature of the default output stream is system- dependent. In CMS it is always the terminal. In MS-DOS, OS/2, AmigaOS, and UNIX, it is the standard output file, which might be the screen, a file, a device, or even another program (through pipes). If both data and character-number are omitted, the specified file is closed as described earlier. If data is omitted, but not character-number, the write pointer is set to the specified character position (if the file system supports this). If character-number is omitted, which is by far the most common case, the data is written at the current output position as specified by the write pointer, which is normally the end of the file. @endnode @node Inpu7 "COMMUNICATION WITH THE USER" There are no additional functions needed for communicating with the user of a program by means of simple dialogues. Everything that can be done along these lines in standard REXX can be done with the facilities already described. However, there are a few special considerations to be noted, and some instructions that provide a little cosmetic simplification. Ordinarily, and by default, the standard input stream is the user's keyboard, and the standard output stream is the user's screen. The standard input and output streams can be specified in any of the file I/O functions simply by omitting the name argument or (on some systems) by making it a null string. It is not necessarily true that the default input and output streams are the user's terminal, since most environments permit streams to be redirected to files, pipes, or devices. (We will go into this a little further when we discuss filters.) But your program has no general means of determining whether its standard input and output have been redirected, so for the sake of generality it should assume that they have not been. That is, you should assume the standard I/O streams are connected to the user's terminal. To emphasize this, we will refer to these as the terminal I/O streams. There are limitations on what can be done with the terminal streams. The main thing is, you cannot position randomly in them, so you should not specify character or line numbers in the I/O functions. There are no meaningful read or write pointers associated with the terminal input and output. However, as a special case, the CHARS() and LINES() functions will return a value of 1 for the standard input stream, to indicate that data may be available. Finally, you do not need to close the terminal streams. In fact, it is a good idea not to, since doing so may prevent other, independent parts of a program from doing terminal I/O. REXX provides the SAY instruction as a shorthand form of LINEOUT(). In other words: SAY expression is almost fully equivalent to: CALL LINEOUT , expression We say "almost", because expression in both cases is optional, but the results of omitting it are different in the two cases. If SAY is used by itself without an expression, it displays a null line on th terminal, as if the value of the expression were null. On the other hand, LINEOUT() will a null expression will display a blank line, but with an omitted expression it will close the terminal input stream. It is customary in REXX programs to use SAY for terminal output. This saves typing a few characters and it makes the operation of the program a little easier for a reader to follow. The PULL instruction is in some sense the analogue of SAY in that it provides a simplified means of reading a line of input from the terminal. Unlike SAY, however, it is not equivalent to one single I/O function. In the first place, PULL is really a shorthand form of PARSE, so that: PULL template is equivalent to: PARSE UPPER PULL template In both cases, template is a PARSE template, which could perform elaborate input parsing. But when your program is doing terminal input, you probably don't want to make the user type things in a rigid format to match a complex template, so usually the template is just a variable name to which all input is assigned. So PULL saves a little typing over the equivalent PARSE form, but unfortunately it forces all input to uppercase. Not having to deal with mixed case can make it easier for a program to interpret user input, but it can also get in the way too. There's another complication with PULL, in that before it reads from the terminal input stream, it will attempt to read a line from the external data queue. The external data queue is a separate REXX mechanism for storing data temporarily in a scratch area, and it will be discussed in detail in the @{"next chapter" link Queu}. It is often used for communication between REXX programs (in the absence of any standard means for sharing variables). And, in some implementations, the external data queue may be usable as a surrogate for terminal input. This usage of the queue is why PULL and PARSE PULL are defined to take input from the queue before they read from the terminal. You can use the queue to prepare data in one REXX program that will be read as input by another REXX program called from the first, provided the called program uses PULL or PARSE PULL to read input. In this way the called program can be written so that it will accept input from the terminal if there is none in the queue. However, sometimes you will want to force a program to read input from the terminal, regardless of the contents of the queue. The way to do this is to use the I/O functions CHARIN() or LINEIN() directly, because they are defined to read only from a specified input stream (such as the terminal input stream) and never from the external data queue. There is even a form of PARSE which recognizes this use of LINEIN() to bypass the data queue: PARSE LINEIN template is defined to be the same as: PARSE VALUE LINEIN() WITH template @endnode @node Inpu8 "EXAMPLE: BINARY SEARCH OF SORTED FILES" Our first extended example of file I/O to do something moderately interesting is an illustration of the binary search algorithm. This is a technique of searching certain sequential files that is much more efficient than a brute force search through the whole file. The technique depends on the assumption that the file has already been sorted (in ascending order, let's say) on the value of the key that we are searching for. It operates by first examining the record in the middle of the file. If that is not the record we want, we can then at least be sure that the record we're looking for must be either in the first half of the file or the second [half], if it is present at all, based on whether or not the desired key is lower or higher than the one in the record we examined. The process is repeated as many times as necessary, and each time the size of the portion of the file in which the desired record can be found is reduced by half. This is why it is called a binary search. It is a very fast process. A file of 1000 records can be fully searched for a record with a desired key in at most 10 steps, since 2 ** 10 = 1024. Here's a REXX program for doing a binary search: /* do a binary search of file for specified key */ /* returns record number of line containing the key */ binsearch: PROCEDURE PARSE ARG file, key, lrecl size = STREAM(file,'c','query size') IF size = '' THEN DO SAY "File not found:" file return 0 END records = size % lrecl high = records low = 1 DO WHILE low <= high mid = (high + low) % 2 CALL CHARIN file, (mid - 1) * lrecl + 1, 0 line = LINEIN(file) test = WORD(line,1) IF test < key THEN low = mid + 1 ELSE IF test > key THEN high = mid - 1 ELSE LEAVE END CALL LINEOUT file IF low > high THEN DO SAY "Key '"key"' not found in" file RETURN 0 END ELSE RETURN mid The program is coded as an internal procedure which is meant to be called with three arguments. The first is the filename, the second is the key to be searched for, and the third is the logical record length of the file. To make the example interesting, we have assumed that the file system stores files as a stream of bytes, and record boundaries are marked by control characters contained within the file. It is further assumed that all lines have the same length. The logical record length is the length of each line plus the number of delimiters per line. (For instance, in MS-DOS or OS/2, there are [unnecessarily] two delimiters per line: a carriage return and a line feed.) The example calls the STREAM() built-in function to determine the size (in characters) of the file. The arguments of the STREAM() function are not fully specified by the language. Here we have used arguments understood by by Personal REXX and IBM's OS/2 REXX. Other implementations must use some other technique to obtain the size of the file. Since this is the size in characters, we have to divide by the logical record length to get the number of lines in the file. We used an integer divide here ("%") to be sure we have an integral number to work with. (This allows for the presence of overhead characters such as "end of file" that are not part of any line.) Two variables, low and high, hold the top and bottom of the range of lines between which the desired record can be located. The loop is performed as long as low does not exceed high. It must terminate eventually, since the range is reduced by at least one line each time (and usually [by] much more). The variable mid is the number of the record we will examine next. The call to CHARIN() positions the file in terms of a relative byte number, since we assume the file system does not support positioning by line number. Notice that we have to be careful with byte numbering, since REXX numbers files with the first character being 1 rather than 0. No data is actually read by CHARIN(), since the third argument is 0. Having positioned the read pointer, the line we want to examine can be read with LINEIN(). For simplicity we have assumed that the first word of the line contains the key value on which the file has been sorted. A more general routine would allow the key to be located in any given position on the line. The loop terminates when the value in the line is neither higher nor lower than the key being searched for. If the key wasn't found, the loop will terminate because low became greater than high. In that case we return 0, which is not a valid number, since we assume lines are numbered starting with 1. Otherwise we return the number of the line where the key was found. @endnode @node Inpu9 "EXAMPLE: MAKING A FILE INDEX" In a file system where a file is a sequence of bytes and records are separated by embedded control characters, there is a problem in representing a file as a sequence of items that can vary in size up to some large number if we want efficient access to each item. If we want to allow each item to be up to 64,000 characters long (for instance), the naive approach would be to create a file with logical records each [being] 64,000 characters long. If the average item (record) is much shorter than this, we will have an exorbitant amount of overhead. One simple way to deal with this situation is to store the data as two separate files. One file contains the actual data, and the second file is an index to it. Only the index file needs to have a fixed record length for efficient access. The length of each record in the index can be fairly short. Let us suppose [that] the maximum size of the data file will be [about] 1Gb. For simplicity, let's suppose the limit is actually 999,999,999 bytes, so that a maximum of nine digits are needed to store the relative byte number with the default NUMERIC DIGITS 9. Five digits suffice to store the size of an item. The relative byte address and size of an item will be stored in an index record in character form, so each index record needs to be 14 bytes long (plus, say, two more bytes for delimiters). We could use this index technique for other purposes as well. In fact, this is how databases are often implemented. The primary data records are stored in a main data file in the order they are added. The data records can be of either fixed or varying length. As many index files as desired can be built which represent the primary data sorted on different key values. Each index record would contain the key value and the relative byte address of the corresponding data record in the main file. Such index files can be searched by a binary search as discussed above, allowing fast random access to the primary data records based on various key values. About the only drawback to this technique as just outlined is that such index files have to be rebuilt entirely every time a new data record is added. It would be possible to go a step further and store the index files as B+ trees to allow for efficient modification of the indices. This would still be easy to implement in REXX, but a full discussion of this method would take us too far afield. Instead, we'll look at one of the most elementary uses of an index file, where we just want an efficient way to access variable size data records by record number. The application is a "cookie" program that displays a random fortune cookie fortune each time it is invoked. Here it is: [1] /* display fortune cookies */ [2] datafile = 'fortune.cookies' [3] indexfile = 'fortune.cookie.index' [4] lrecl = 16 [5] count = STREAM(indexfile, 'c', 'query size') % lrecl [6] item = RANDOM(1, count) [7] CALL CHARIN indexfile, (item - 1) * lrecl + 1, 0 [8] index_record = LINEIN(indexfile) [9] CALL LINEOUT indexfile [10] PARSE VAR index_record rba 10 size [11] bytes_read = 0 [12] CALL CHARIN datafile, rba, 0 [13] DO WHILE bytes_read < size [14] line = LINEIN(datafile) [15] bytes_read = bytes_read + LENGTH(line) + 2 [16] SAY line [17] END [18] CALL LINEOUT datafile That's all there is to it. Most of the details are similar to those of the previous example. Notice (again) that there are 14 bytes of information in each index record (a nine-digit and a five-digit number), but we have assumed two extra bytes per line for delimiters. Actually, we could dispense with the delimiter characters. It all depends on how the index file is maintained. As illustrated, the index file could be updated with a line-oriented text editor which will assume the line delimiters are present. In fact, editing the index would not make much sense, because it would be very tedious to put in file offsets and item sizes by hand. Instead, indexes should be created and maintained by a program, and we'll illustrate this shortly. The program can simply omit line delimiter characters. In that case, we would have to modify the above example by setting lrecl to 14 [in line 4], and replacing the line [8] where we read an index record with: index_record = CHARIN(indexfile, , lrecl) We continue to assume that line delimiters are present in the data file, which requires us to account for them in the loop where we are reading a single item, by adding two to the length of each line. Let's look at how to maintain the index. We shall assume that items to be added to the data file will be created with a text editor. Initially, each item is contained in its own file. We will have a REXX program that appends each item file to the main data file and updates the index. The main data file then winds up looking something like this: A journey of 1000 miles must begin with a single step. The moving finger writes; and, having writ, Moves on: nor all thy piety nor wit Shall lure it back to cancel half a line. From listening comes wisdom, and from speaking repentance. Man go to bed with itchy bum; man wake up with smelly finger. ... Notice that there is no indication in the file where one item ends and the next begins. All that information is contained in the index. More interesting as an example is the program that adds new fortune cookies to the file and the index. We suppose the new item is contained in a file by itself, whose name is to be supplied to the update program. We need to obtain the size of the data file in order to know the offset of the new item and the size of the item; these two pieces of information will be appended to the index. We are going to assume now that the index file does not contain embedded delimiter characters. This will help [to] reduce its size a little, but [will] preclude us from using an ordinary line editor to access it. Here is the update program: /* add to fortune cookie database */ PARSE ARG new_cookie . datafile = 'fortune.cookies' indexfile = 'fortune.cookie.index' lrecl = 14 item_size = STREAM(new_cookie, 'c', 'query size') item_offset = STREAM(datafile, 'c', 'query size') + 1 index_size = STREAM(indexfile, 'c', 'query size') index_record = RIGHT(item_offset, 9) ||, RIGHT(item_size, 5) CALL CHAROUT indexfile, index_record, index_size + 1 CALL CHAROUT indexfile CALL CHAROUT datafile, , item_offset DO WHILE LINES(new_cookie) > 0 CALL LINEOUT datafile, LINEIN(new_cookie) END CALL LINEOUT datafile CALL LINEOUT new_cookie We had to use CHAROUT() in this example, instead of LINEOUT(), to add the index record to the index file in order not to have line delimiter characters inserted. Also, we were very careful to use CHAROUT() to specify an offset in both the index and data files at which writing is to begin. This may be redundant, since any reasonable REXX implementation will set the write pointer initially to the end of an existing file. This is so that data will be appended to the end instead of overwriting the beginning of the file. But it's better to be safe than sorry. There's another approach to handling this particular application [that] we should mention. The main data file could be assumed to contain explicit separators between each item, perhaps a short string of asterisks. The data file could be maintained entirely with a text editor, and the person who maintains it would manually add a separator every time a new item is added. Then a REXX program could be written that reads through the main data file and builds an entire index each time. Although this approach entails a great deal of extra processing every time an addition is made, it does have the advantage that the index can always be reconstructed easily if it gets damaged. The details of implementing this method are left as an exercise for the reader. @endnode @node Inpu10 "EXAMPLE: WRITING \"FILTERS\" IN REXX" To conclude this chapter, let's look at a different sort of example, a filter. The WORDFIND program in @{"Chapter 3" link Stru} was an example of a filter. We mentioned that this was a type of program which originated in UNIX. A filter reads from a standrad input file and writes to a standard output file. Of course, it can read and write any number of other files, too. Such programs are called filters, because UNIX and other operating systems that support the concept of pipes, a filter program can be inserted between two other programs. The filter reads its input from the output of the first program, and it writes its output to the input of the second program. How this is expressed depends on the specific operating system. In UNIX, MS-DOS, AmigaOS, and OS/2, "|" is the symbol for a pipe from one program to another, so a composite command might be written [as]: first | filter | second (Note that this use of "|" has nothing to do with the REXX use of "|" as the logical or operation. If the composite command were used in a REXX program, the whole thing should be enclosed in quotation marks.) A very common example of a filter is a sort program. If the program called second expects its input to be sorted in some way, but the first program does not sort its output, then inserting sort between them solves the problem: first | sort | second The philosophy of using filters is that each filter program should be as simple as possible. A filter program should read from standard input, perform one elementary operation, and write results to standard output. Filters can then be combined in a large number of ways to perform a wide variety of more complex processing tasks. Filters are easy to write in REXX. We'll take a very simple example here just to illustrate the mechanics. The example just deletes all blank lines from an input file: /* DELBLANK - Delete all blank lines in a file. */ ARG infile outfile DO WHILE LINES(infile) line = LINEIN(infile) IF LINE \\= '' THEN CALL LINEOUT outfile, line END CALL LINEOUT outfile CALL LINEOUT infile If this program is invoked with pipes for both input and output, then it will not actually have any arguments. Therefore, infile and outfile will be null strings. So, when they are used elsewhere in the program, they will refer to the standard input and output files respectively, which is just what we want. That's really all there is to writing a filter. The nice thing about how this works in REXX is that we could supply actual filenames if we wanted, instead of using pipes. A command line like: delblank original_file deblanked_file would work just as well. (The filenames should be different, of course. And if the second file already exists, it will be appended with the results rather than being overwritten.) And just for a little more redundancy, we could have used operating system redirection notation to accomplish the same thing: delblank deblanked_file This notation, as used in OS/2, MS-DOS, AmigaOS, and UNIX, means that original_file is to be the standard input and deblanked_file is to be the standard output. Again, the REXX program will work with null strings rather than the actual filenames, but the result will be the same. @endnode @node Queu "10: THE EXTERNAL DATA QUEUE" The external data queue is a curious hybrid. It is a concept that belongs partly to REXX and partly to the operating sytsem. Its function is partly I/O, partly interprocess communication, and partly "other". Conceptually, you may think of the queue (as we'll call it for short) as a temporary string storage area. Even the metaphors used to describe operations to the queue are a bit mixed. In the horizontal queue metaphor, strings can be added at the front or the back of the queue, but they can be removed only from the front. Sometimes a vertical metaphor is employed, and the queue is called a stack. In these terms, it is like a pushdown stack, and strings can be added to the top or at the bottom, but they can be removed only from the top. The command names for queue operations are derived from both metaphors. A string is added to the top (front) of the queue with the PUSH instruction and to the end (bottom) with the QUEUE instruction. Don't let the mixing of metaphors throw you. It's the same thing either way you look at it. Strings can be removed from the top (front) of the queue with the PULL instruction. (Since there's only one removal operation, REXX decided to be noncomittal as to metaphors, so it is not called either POP or DEQUEUE.) PULL is really shorthand for the instruction PARSE UPPER PULL. Therefore it can use a general PARSE template to parse the retrieved string into separate variables. Unfortunately, because of the UPPER option, it mangles strings into uppercase, so you may wind up using the more verbose PARSE PULL most of the time, unless you don't care whether strings come out of the queue in the same form they went in. QUEUE is a first-in, first-out operation (FIFO), in the sense that if you repeatedly QUEUE strings at the end of the queue, they will be removed by PULL in the same order as they were added by QUEUE. PUSH, on the other hand, is a last-in, first-out operation (LIFO), in that if you repeatedly PUSH strings on top of the queue, then they will be retrieved in the exact reverse order in which they were added by PUSH. If you mix QUEUE and PUSH, it's a little harder to keep track, but not too bad, since in either case new strings are added only at either one end of the queue or [at] the other, never in the middle. Strings in the queue generally retain their identity. That is, regardless of the lengths of the strings or their contents, they are removed from the queue one at a time in exactly the same form in which they were added. They are not combined or concatenated as a result of being added to the queue. To complete the roster of REXX facilities for working with the queue there is one built-in function, QUEUED(), which takes no arguments and returns the number of strings currently in the queue. @{" USAGE OF THE QUEUE " link Queu1} @{" RELATION OF THE EXTERNAL DATA QUEUE AND THE STANDARD " link Queu2} @{" INPUT STREAM " link Queu2} @{" RELATION OF THE EXTERNAL DATA QUEUE AND THE OPERATING " link Queu3} @{" SYSTEM " link Queu3} @endnode @node Queu1 "USAGE OF THE QUEUE" The queue can be employed for a wide variety of purposes. It is very commonly sued as a way to pass data to and from subroutines. Although data is usually passed to a subroutine through the arguments of a routine, there may be implementation limits on how much data can be passed this way. For instance, REXX in CMS allows only 10 arguments to any subroutine. Other implementations may not have such draconian limits, but still impose some because of limits on the size of single clauses. The stack, on the other hand, is usually limited only by the amount of available memory. So, if you want to pass all the text stored in lines of the array text. to a subroutine, you might use: DO i = 1 TO n QUEUE text.i END CALL text_routine n ... text_routine: PROCEDURE DO i = 1 TO ARG(1) PARSE PULL line /* process "line" */ ... END RETURN This avoids limitations on how much can be passed through arguments to a procedure, since only the number of strings needs to be passed. But it also saves a lot of typing. Imagine the effort involved in typing out all the arguments you would need if there were 100 lines of text.: CALL text_routine text.1, text.2, text.3, text.4,, /* etc.! */ Also, there isn't any way in REXX to code a single subroutine call with a varying number of arguments - you would have to pass the maximum number every time. The stack is at least as useful for returning data from subroutines. REXX definitely limits a subroutine to at most one return value. If you need to provide more, whether two or a very large number, the queue is a good way to do it. For instance, suppose you want to have a subroutine that returns a list of filenames. Perhaps it is a list of all filenames which match some name pattern involving wildcard characters. Then a routine like: get_file_names: PROCEDURE PARSE ARG pattern name = get_first_name(pattern) DO i = 1 BY 1 while name \\= '' QUEUE name name = get_next_name() END RETURN i - 1 might do it. The subroutine returns the number of names found as its value, but the names themselves are returned in the queue. (get_first_name() and get_next_name() are hypothetical, lower-level procedures for reading the file directory and retrieving one name at a time that matches the pattern. We should note that there is one obvious alternative to the use of the queue for passing a large number of strings into and out of subroutines. That alternative is a compound variable array. In this approach, you assign all inputs or outputs to successive elements of the array. You can pass the array name (or names) to the subroutine, but there is a little awardness in that you need to use the VALUE() function to read and write elements of the array when the name of the stem is passed as an argument. This is avoided if you can use some conentional names for the input and output stems, perhaps ARGS. and RESULTS.. Indeed, compound variable arrays have a lot in common with the queue for handling simple lists of strings. Compound variables actually provide a more powerful tool, since they permit random access rather than just access to the front element in the queue. A queue could be fully simulated with a compound variable, but it would be extra work to keep track of the indices of the front and back elements of the queue. So the queue is more easily used when just a simple access pattern is required and you don't want to both making up a new array name. Also, when the subroutines being called are external, the queue must be used for passing data, since external REXX procedures can't share variables. This comparison of the relative merits of compound variables and the queue points up one aspect [that] you need to be careful about when using the queue. That is, there is only one queue. Indeed, it is called the external data queue in part because it is external to any single REXX program. Within a nested set of external REXX procedures that call each other, there is just one queue which is the same for all of them. This is a great advantage in that the queue can be used, as above, to pass data into and out of subroutines (an advantage not shared by compound variables). But there is the potential disadvantage that the queue already contains data placed there by one routine at the time another routine needs to use the queue for a different purpose. Dealing with this problem is one of the reasons you might use the queue in a LIFO manner with PUSH instead of QUEUE. Using PUSH is somewhat harder, because it often requires you to do processing backwards. But it provides a way to share the one external data queue for several purposes. The earlier example of passing strings to a subroutine could easily be rewritten as follows to use PUSH instead of QUEUE: DO i = n TO 1 BY -1 PUSH text.i END CALL text_routine n ... text_routine: PROCEDURE DO i = 1 TO ARG(1) PARSE PULL line /* process "line" */ ... END RETURN All that needed to change was to run the loop index backwards. The subroutine retrieves the strings in the same order as before. (We presume that the order was important.) The difference here is that data is added only to the front of the queue, so that any strings already present will not be disturbed - as long as we are careful to read no more than were added. This problem of having to be careful about data already present in the queue is one reason for not using the QUEUED() function to take a certain shortcut. You might be tempted to think that it is not necessary to pass separately the number of strings passed into or out of a subroutine via the queue. After all, the subroutine could simply call QUEUED() to determine how many strings there were. But this approach gets into trouble if unrelated data has already been placed in the queue. Nevertheless, QUEUED() can be used to let you program defensively in using the queue. Consider the case of a subroutine which places information in the queue (LIFO), and the subroutine must return some value other than the number of strings added to the queue, a return code perhaps. You can still determine how many strings were added this way: old_count = QUEUED() IF queue_sub() = 0 THEN DO /* success */ count = QUEUED() - old_count DO i = 1 TO count /* process queue items */ ... END END @endnode @node Queu2 "RELATION OF THE EXTERNAL DATA QUEUE AND THE STANDARD INPUT STREAM" PULL and PARSE PULL are not purely queue access instructions. They are defined so that if the queue happens to be empty, then they will take a line from the standard input stream - and they will wait if none is available (eg. when reading from the keyboard). The reason for this is that the queue can be viewed as a surrogate for the standard input stream, if you think of PULL as primarily an I/O instruction rather than a queue instruction. In fact, if you consistently write REXX programs so that they use PULL or PARSE PULL for input, then you can at any time decide to call the same program from another REXX program, and provide through the queue some or all of the input [that] it requires. Perhaps, for instance, you have an external program called MOVEFILES which asks interactively for a list of filenames and a destination. It might start like this: /* file mover */ SAY "Enter names to move, end with null line." DO i = 1 BY 1 PARSE PULL name.i IF name.i = '' THEN LEAVE END SAY "Enter destination of move." PARSE PULL destination ... Then you could run this from the system command line, and it would prompt you for the information it needs. Or you could call it from another REXX program and supply all the information ahead of time: PUSH new_directory_name PUSH DO i = n TO 1 BY -1 PUSH file_name.i END CALL movefiles Notice that we have added to the queue LIFO with PUSH, as a precaution against the possibility that the queue already contains data. PUSH with no string specified put a null line into the queue at the appropriate place to end the list of filenames. In this case, the way MOVEFILES was written allows it to be called from another REXX program without having to disturb data already in the queue. If you regularly write REXX programs that us PULL or PARSE PULL to do input, you should be cautious about calling such programs from others that may use the stack for something else. PULL and PARSE PULL are the only REXX input facilities that use the queue and standard input stream together in this way. Everything else that does input (CHARIN(), LINEIN(), PARSE LINEIN, and interactive tracing) reads only from the standard input stream and ignores the queue. @endnode @node Queu3 "RELATION OF THE EXTERNAL DATA QUEUE AND THE OPERATING SYSTEM" In the VM/CMS operating system, where REXX originated, the external data queue is an integral part of the operating system rather than exclusively a REXX feature. This means that it is possible for programs written in any language to read and write to the queue in the same way that REXX does. Furthermore, the standard system input function behaves like PARSE PULL in that it will take a line from the front of the queue, if there is any, before reading from the keyboard. As a result, when most CMS programs are run from REXX they can have their input supplied through the queue without any special action on the program's part. Any program can also add to the queue in the way PUSH and QUEUE do, but this requires explicit programming. Many CMS utilities have command-line options to tell them to place their output into the stack instead of writing it to the screen. These utilities can then be used from REXX and their output interpreted by the REXX program in order to automate many procedures. However, because of the kind of confusion that can arise when the queue is used for several purposes simultaneously, CMS utilities have gradually added the ability to write their output directly to REXX variables. In addition, CMS provides extra capabilities in the queue that help [to] alleviate such contention problems. Primarily it adds the concepts of separate buffers in the stack. When a new buffer is created, strings added by PUSH and QUEUE go only into that buffer, as if it were the entire queue. This solves an error-handling problem that arises frequently with the queue. Namely, if a program wants to terminate prematurely because it has encountered some error condition, it is a very highly recommended practice to remove any data that may have been placed on the queue. A command is provided specifically for the purpose of deleting only the most recently created buffer in the queue, instead of the whole thing (which would also be an antisocial form of program behaviour). This is most important in systems like CMS which funnel most terminal input through the queue. Otherwise, orphan lines left in the queue when a program terminates unexpectedly can be read by the operating system and treated erroneously as system commands. Some implementations of REXX take the queue idea even further. Personal REXX for MS-DOS allows arbitrary keystrokes and time delays to be inserted into the queue, in addition to whole lines. This caters to the MS-DOS environment, since many programs make heavy use of special keystrokes rather than verbose commands. Personal REXX also allows the queue to be treated as a write-only device to which command output can be written by redirection. This permits retrieval of program output even from programs which have not been designed to write to the queue. A similar concept is used in OS/2, where there is a command (RXQUEUE) which copies its standard input stream to the queue (FIFO or LIFO). Piping output to RXQUEUE from another command then allows REXX programs to process it out of the queue. On the other hand, OS/2 does not have a general implementation of the queue that permits it to be a form of surrogate keyboard input to programs written in languages other than REXX. @endnode @node Exce "11: EXCEPTION HANDLING" We generally think of a program as a sequence of instructions which flow smoothly and sequentially from one to the next, unless the sequence is explicitly altered in accordance with the rules of one of a small number of well-defined control structures (IF, DO, SELECT, CALL, etc.). However, it has been found that this simple model is somewhat lacking when we think about error handling. And we should think about error handling. REXX is often used for small, one-shot, "quick and dirty" programs with a limited purpose. But in all cases except these, we want our programs to be as robust as possible - and the more we use them, the more robust we expect them to be. Robustness means ideally that a program never fails to produce the desired results. Short of the ideal, however, we still should expect that a program will not fail without at least a comprehensible error message, and that a failure will never cause irreversible damage. So we have to think about error handling in order to build robust programs. Because errors do occur, for reasons completely outside of a program's control, as well as for errors in the logic of a program itself (bugs). In fact, we may well prefer to employ the euphemism @{i}exceptional conditions@{ui} rather than errors - meaning [that] any conditions not foreseen in detail by the program. Such conditions are often not really errors, but they do need to be allowed for by a robust program. It is often observed that in good quality robust software, 90% or more of the code may actually be concerned with handling exceptional conditions in some way or another. Error handling often calls for nonsequential flow of control. The primary reason for this is that errors can occur in such a wide variety of places in a program. If a program were to check for errors [in] every place they could occur, the program itself would be overwhelmed by error checking. In order to overcome this problem without sacrificing robustness, REXX has adopted the position of treating errors as if they were actually asynchronous events, that is, events generated by unpredictable causes outside of a program. Such events are called conditions, and REXX allows code to be executed out of the normal sequence when particular conditions occur. Let's consider briefly the sorts of errors we need to contend with. I/O errors are among the most typical. Many kinds of errors can occur with I/O. External devices like disk drives can fail or simply be not ready (have no disk loaded). Printers can run out of paper. Magnetic media can be defective. Disk space may become filled, and so forth. But clearly it is very tedious to test every I/O operation within a program for any error at all, let alone [for] each error that can possibly occur. Another common source of errors which is fairly unique to REXX is that external programs invoked from REXX can malfunction or fail for a wide variety of reasons. The reasons are often operational in nature, such as failure to find a required file or insufficient memory. Or the operating system may have been unable to find or run the external program. Again, the locations where such errors can occur are numerous, and checking for each possible problem becomes prohibitively expensive. Finally, errors can occur because of bad data input to the program - data that is out of the expected range or simply invalid. And all of these possible error sources are in addition to programming logic errors in a narrow sense such as misspelled variable names or invalid syntax. (Since REXX has an INTERPRET instruction, syntactic errors can result from incorrect input and be impossible to detect before run-time.) Now, even though most of these errors can occur only at certain specific locations within a program, the number of such locations may be very large. And the same sorts of tests need to be performed in each appropriate location. So a great deal of duplicated code can be eliminated if we simply provide one place to handle each type of error and have that code invoked out of sequence whenever the corresponding error occurs. This same mechanism can handle genuine asynchronous events as well, of course. The one case where REXX does this is where an interactive user of a program chooses to terminate it while it is running, perhaps because it is in an infinite loop. (Just how this decision is indicated by the user depends on the particular operating system involved [but it is often Ctrl-C]. Even though the program should be ended as soon as possible, it is often desirable to perform some cleanup functions before stopping. REXX formally recognizes six types of events as conditions: ERROR When a command to an external environment terminates with an indication that it encountered an error, the ERROR condition is raised. The command may have been issued either directly or with the ADDRESS instruction. Command errors are usually indicated by means of return codes, as discussed in @{"Chapter 6" link Exte}. FAILURE When a command to an external environment cannot be executed at all, the FAILURE condition is raised. The command may not be executable for a variety of reasons, such as [that] it could not be found or [that] there was not enough memory to start it. The command may have been issued either directly or with the ADDRESS instruction. HALT The HALT condition is raised when an interactive user of the program requests the operating system to stop the program. Depending on [the] capabilities of the operating system, this request may also be made by the operating system itself or [by] another running program. NOVALUE The NOVALUE condition is raised when a symbol that is a valid variable name is used in certain contexts but the variable [that] it names has not been initialized. The contexts in which this can occur are [in] expressions, in PARSE VAR, or in a variable reference. (A variable reference is a variable name enclosed in parentheses, as can be used in a PARSE template, the PROCEDURE instruction, and the DROP instruction.) An uninitialized variable named in the VALUE() built-in function or used in the tail of a compound symbol does not, by itself, raise the NOVALUE condition. NOTREADY An I/O error that occurs in an I/O built-in function or the SAY, PARSE LINEIN, or PARSE PULL instruction will raise the NOTREADY condition. Attempts to read beyond the end of a file also raise NOTREADY. SYNTAX The SYNTAX condition can be raised by a wide variety of errors in the processing of a program. There are specific error numbers and (usually) standard messages which are associated with each such error. Many of these errors are truly syntactic, such as invalid expressions. But many others are nonsyntactic in nature, such as a variable with a nonnumeric value used in an arithmetic expression or the inability to find an external procedure. @{" ENABLING AND DISABLING CONDITION HANDLING " link Exce1} @{" USING TYPE 1 CONDITION HANDLERS " link Exce2} @{" USING TYPE 2 CONDITION HANDLERS " link Exce3} @{" THE CONDITION() FUNCTION " link Exce4} @endnode @node Exce1 "ENABLING AND DISABLING CONDITION HANDLING" When a REXX program begins, all conditions have no program-defined handlers for them. Such conditions are said to be disabled. This doesn't mean [that] they cannot occur, only that condition handlers other than the REXX defaults will not be invoked. The default handlers for ERROR, NOTREADY, and NOVALUE simply ignore the condition. In effect, these conditions are by default treated as if they do not occur. The default handler for FAILURE immediately raises the ERROR condition (which is then ignored if it is disabled). By contrast, however, the default handlers for HALT and SYNTAX immediately terminate the program, issue a message, and cause a return to the caller with a return code that indicates which error occurred. Program-defined handlers for each condition may be [of] one or two possible types. For simplicity we will refer to them as Type 1 and Type 2: Type 1 condition handlers: * Can be specified for any type of condition. * Are enabled with a SIGNAL ON instruction. * Are disabled as soon as the condition occurs. They must be reenabled in order to be used again. * Automatically terminate any active DO, IF, SELECT, or INTERPRET instruction. They do @{i}not@{ui} terminate the active procedure. * Permanently alter the sequence of execution. It is not possible to return to the point where the condition was raised. Type 2 condition handlers: * Can be specified for [the] ERROR, FAILURE, HALT, and NOTREADY conditions, but not [for] NOVALUE and SYNTAX. * Are enabled with a CALL ON instruction. * Are not disabled when the condition occurs, but are placed in a special @{i}delayed@{ui} state. The precise handling that occurs when another condition of the same type is raised while the condition is in the delayed state depends on the condition. * Do not terminate active instructions or the current procedure. * Do not permanently alter the sequence of execution. A RETURN instruction executed in the condition handler causes execution to resume at the point where the condition was raised. A Type 1 condition handler corresponds to the SIGNAL instruction, and a Type 2 condition handler corresponds to the CALL instruction. For instance, a Type 1 handler for the ERROR condition is invoked as if the instruction: SIGNAL ERROR were issued at the point [where] ERROR is raised. A Type 2 condition handler is invoked, instead, as if the instruction: CALL ERROR were issued. Notice that the type of the condition handler depends on how it has been enabled, which controls how the handler is invoked. The start of the condition handler is the first clause after a label which corresponds either to the name specified in the CALL ON or SIGNAL ON instruction or else to the name of the condition. For example, the handler for the ERROR condition would normally follow the (first occurrence of the) label ERROR:. Notice that it is quite possible for a condition handler to be enabled, but not actually defined. That is you could have: SIGNAL ON NOVALUE in your program but no NOVALUE: label anywhere. If the NOVALUE condition is ever actually raised, the SYNTAX condition will be raised immediately afterwards because the label is not found. If the SYNTAX condition has not been enabled, the program will then be terminated, since that is what the default REXX handler does. But if SYNTAX has been enabled and if there is actually a handler for it in the program, any appropriate action can be taken. In practice, one usually uses SIGNAL ON NOVALUE simply to catch the use of uninitialized variables quickly, and termination of the program with an error message that indicates the line where the error occurred is all that is wanted. If you wish to explicitly disable a condition handler, you can do so with either the SIGNAL OFF or the CALL OFF instruction. It does not matter which type of condition handler is involved. So: SIGNAL OFF NOVALUE disables any program-defined handling of the NOVALUE condition within the current procedure. It restores handling of the NOVALUE condition to the REXX default, which is to ignore the condition. If you read the description of signal handling in @{i}The REXX Language@{ui}, you may find it a little confusing. The reason is that certain terms are used somewhat loosely and certain facts about the sequence of events in the handling of a condition are not made clear. For instance, when the term @{i}trapped@{ui} is used, it seems to mean variously that a handler has been defined for a condition, that the events defining the condition have occurred, or that the handler for the condition has been invoked. We will try to be a little more precise. When we say that a condition has been @{i}enabled@{ui} we shall mean that a user- defined handler for the condition has been specified with the SIGNAL ON or CALL ON instruction. As has already been pointed out, a condition can still occur even if it has not been enabled, because REXX default handlers are always defined. When the circumstances which define the condition are first noticed by the REXX run-time system, we shall say that the condition has been @{i}raised@{ui}. Finally, when the condition handler is actually invoked, we shall say that the condition has been @{i}trapped@{ui}. Most of the time, a condition is trapped immediately after it has been raised. In particular, the language specifies that this is the case when a Type 1 condition handler has been enabled, as long as the label identifying the handler exists in the program. On the other hand, when a Type 2 handler has been enabled, REXX specifies that the condition may not be trapped until the end of a clause, which could be some time after when it is raised. We shall say that a condition is @{i}pending@{ui} during the time between when it is raised and the time it is trapped. Later, in the discussion of Type 2 handlers, we shall mention some problems that arise with pending conditions. @endnode @node Exce2 "USING TYPE 1 CONDITION HANDLERS" A Type 1 condition handler is enabled with the SIGNAL ON command, which has the form: SIGNAL ON condition [NAME handler] condition is the name of the condition to be trapped. handler is a symbol which specifies the label to which control will be passed if the condition is raised. By default, handler is the same as the condition name. But it could be any other symbol. So, though you cannot have more than one handler active for a given condition at any one time, you can switch easily among a number of different handlers as required. The label that actually identifies the handler may occur anywhere a label is allowed. As with labels on procedures, only the first occurrence of the label within the program can be used for a condition handler. If the condition is raised and the label is not found, the SYNTAX condition will be raised. The state of a condition's being enabled or not and the name of the handler (if any) are inherited by any internal procedures that are called. A procedure may change the handling of any condition, but the state of the condition is returned to what it was when the procedure returns to its caller, just as REXX treats other state information when internal procedures are called. In particular, any time it is important to handle some condition in a special way when a certain procedure is invoked, the procedure should give a SIGNAL ON instruction that specifies its own preferred handler. Then, regardless of where the handler is actually located, it will be invoked when the condition is raised while the procedure is active. It is very important to note that a Type 1 condition handler is invoked by the SIGNAL instruction, so it does not terminate the active procedure. (Though it does terminate all active DO, SELECT, IF, and INTERPRET instructions.) This means that if the condition is trapped in a deeply nested subroutine, especially one that has been called recursively, it can be tricky to get back to a predetermined location higher up in the calling sequence. The most common use of a Type 1 signal handler is probably to provide diagnostics in the event of a SYNTAX error. Here is an example of a simple handler for the SYNTAX condition: [1] SYNTAX: [2] SAY 'REXX error' RC '('ERRORTEXT(RC)||, [3] ') occurred in line' SIGL'.' [4] IF SOURCELINE() > 0 THEN [5] SAY '=====>' SOURCELINE(SIGL) [6] SIGNAL ON SYNTAX [7] SIGNAL RESTART This example illustrates several REXX features that are useful in dealing with conditions. The first line after the label provides an error message which is much like the one REXX would issue if there were no SYNTAX condition handler. REXX sets the special variable RC to the number of the error which occurred. Error numbers are more or less standardized in REXX, so that you can usually depend on being able to tell fairly well what sort of error occurred based on the value placed in RC. For instance, error number 5 is associated with the message "Machine resources exhausted", which means that the REXX language processor ran out of memory. This might be because the program had an error and was in a loop creating new variables. But in other cirumstances, it might not indicate an error, but merely a normal (though annoying) operational difficulty with a program that has large memory requirements. At the same time, the SIGL special variable is displayed, because REXX has set it to the number of the line in the source program that wa being executed when the SYNTAX condition was raised. (SIGL is always set when a SIGNAL or CALL instruction is executed.) This value is used later in order to display the actual line of source code involved. Another feature, the ERRORTEXT() built-in function, has been used in the first line of the handler. This function displays the error message associated with the number of the error that occurred. The second line of the handler [4] contains a call to SOURCELINE() with no arguments. This returns the number of lines of source code in the program. It illustrates another case of defensive programming. Some implementations of REXX (usually compilers) do not have access to the program source code in order to display them with SOURCELINE(). In this case, the function called with no arguments should return 0 to indicate that source code is not available. If source is available, the example displays the line of code that caused the condition to be raised. At this point, the handler has done only what REXX's default SYNTAX condition handler would do. Presumably a special handler was used to add some additional capability. Usually this is simply to afford an opportunity for the program to continue execution if it chooses to do so, instead of being terminated, which is the default action. Since REXX programs may use the INTERPRET instruction to execute other instructions, which might be based on expressions or data supplied interactively by a user, it can make sense for the program to wish to continue even when a seemingly severe error has occurred. If the program is in fact going to continue, the next step should be, as in the next to last line of the example [6], to issue another SIGNAL ON SYNTAX instruction. This is because REXX has automatically disabled handling [of] the SYNTAX condition at the time it was raised. This is always done for Type 1 condition handlers, in order to reduce the possibility of an infinite loop should the handler itself do something to raise the condition again. The final step is to use an ordinary SIGNAL instruction to transfer control back to some known location in the program so that it can proceed. We want to stress again that a condition handler like this is most easily used for conditions that are raised in the top-level (main) procedure of a program. Otherwise, it is necessary to provide additional logic (perhaps in the form of variables which indicate what the program was doing) in order to restart in a procedure somewhere above the one which was active when the condition was raised. @endnode @node Exce3 "USING TYPE 2 CONDITION HANDLERS" A Type 2 condition handler is enabled with the CALL ON command, which has the form: CALL ON condition [NAME handler] condition is the name of the condition to be trapped. handler is a symbol which specifies the label to which control will be passed if the condition is raised. By default, handler is the saame as the condition name. But it could be any other symbol. The SYNTAX and NOVALUE conditions cannot be handled with a Type 2 handler. This is because they can occur in the middle of expressions and it is probably not meaningful to continue after the point of failure for these conditions. As with a Type 1 handler, the first label in the program that matches the name of the handler (ie. usually the name of the condition) is invoked when the condition is trapped. It is invoked with a CALL instruction rather than a SIGNAL, so that the handler can return to the point at which the condition was raised. In fact, the handler generally must use the RETURN instruction when it is done (if it doesn't use EXIT). This is because since the handler is executed as a subroutine, use of SIGNAL does not terminate the subroutine. The handler should not specify a value on the RETURN instruction, since it will be ignored and not assigned to the RESULT variable. However, all other conventions of subroutine invocation are observed when a Type 2 handler is called. In particular, the state of the program is saved on entry to the subroutine and [is] inheritted by it. But any changes that the subroutine makes to the state persist only until the subroutine returns. (See @{"Chapter 5" link Subr} for a full explanation.) Therefore, the handler cannot make permanent changes to the state of the condition it is handling, or to any other for that matter. Consequently, after the handler returns the condition is still enabled, which is unlike the situation with a Type 1 handler where the condition has to be explitly enabled again. Another difference from a Type 1 handler is that a Type 2 handler does not terminate active DO, IF, SELECT, or INTERPRET instructions. When a Type 2 handler returns, execution is resumed at the point immediately after the condition was raised. This means that it is not possibly to retry the operation that failed. For instance, in the case of an ERROR or FAILURE condition, execution will resume with the statement following the command. If your program wants to reissue the command, it must do so in the handler itself. A subtle point about all of the conditions that may be handled with a Type 2 handler is that they can be trapped only at a clause boundary. This is obvious for ERROR and FAILURE, since a command to an external environment is a clause by itself. It is less obvious for HALT and NOTREADY. Nevertheless, in order to permit orderly resumption after a Type 2 handler is invoked and returns, REXX specifically provides that the HALT and NOTREADY conditions can be trapped only at clauses boundaries. In the case of I/O functions, in which the NOTREADY condition can be raised, it should be noted that the function will always return a well- defined result even if an error is encountered. For instance, CHAROUT() will return the number of characters that were not successfully written. In this way, expressions involving the I/O built-in functions will still have well-defined values even if an error occurs. REXX will trap the NOTREADY condition only after the expression is fully evaluated and the end of the clause is reached. This is a consideration mainly in DO, IF, and SELECT instructions which may involve multiple expressisons in a singgle clause, or multiple clauses in the instruction. Here is a simple example: CALL ON NOTREADY ... IF CHAROUT (output_file, output_string) > 0 THEN DO SAY 'Attempted to write' output_string SAY 'Unable to continue output.' RETURN END ... NOTREADY: SAY 'Error writing' condition('d') RETURN In this example, if an error occurs in the call to CHAROUT() the function returns with a nonzero value, which is compared to 0. The end of the clause is just before THEN. Only at that point will the NOTREADY handler be invoked. The handler will display a simple message, using the CONDITION() built-in function to determine the name of the file that was in use. (This function is described in the next section.) Upon return from the handler, execution continues with the DO group, which issues additional messages and returns. It turns out that only the HALT and NOTREADY conditions may possibly be trapped at a later time than they are actually raised, if a Type 2 handler has been enabled. So the question arises as to what happens if one of these conditions occurs again while one is pending. In principle, any number of NOTREADY conditions could occur during the execution of one clause. For instance, the clause might contain an expression involving a number of I/O functions. REXX guarantees that only the first NOTREADY condition that is raised will actually be trapped at the end of the clause. This is because the condition is put in a special delayed state while it is pending, and any other NOTREADY conditions that may subsequently occur in the clause are simply ignored. Though this may seem like a fine point, you should be aware of it if you write a program that depends on trapping NOTREADY conditions. You can use Type 2 handlers for purpooses other than error handling per se. For instance, most operating systems permit a user to generate some sort of a signal to interrupt a program. Normally this signal is used to terminate the program (if it seems to be in an [infinite] loop, for example). REXX recognizes this signal as a HALT condition and allows it to be trapped. You can then use the signal simply as an opportunity to provide information about the progress of a long-running computation, and (perhaps) allow the user to decide whether or not to continue. Here is a HALT condition handler to do that: HALT: SAY cases 'cases out of' total 'have been processed.' SAY 'Do you wish to continue?' PULL reply IF \\ABBREV('NO', reply, 1) THEN RETURN ELSE DO SAY 'Processing terminated.' EXIT END In this example, the variable cases is assumed to be maintained as the number of cases [that have been] completely processed. Unless the user types n or no, the program can continue as if nothing had happened, because the handler returns to the exact location nat which the HALT condition was raised and no program state information has been changed. After a condition occurs and before or during the execution of a Type 2 handler, the condition is in a delayed state, which is between enabled and disabled. This additional state is provided to minimize the chances of an infinite loop of errors. The provision of a delayed state for conditions means that REXX does not have to go so far as to completely disable the condition in order to prevent possible loops. A condition normally occurs while it is in a delayed state only if it occurs in a Type 2 handler for the condition. This is because the delayed state reverts to the normal enabled state when the condition handler returns. If the condition does occur while it is in a delayed state, then it will simply be ignored if it is an ERROR, FAILURE, or NOTREADY condition. In effect, the condition is disabled from the time it occurs until the Type 2 handler returns. If a user causes a second HALT signal to be generate while the HALT condition is in the delayed state, then raising of the condition will simply remain pending until the handler returns. It is relatively safe to do this, since HALT conditions arise from circumstances outside [of] the program and are unlikely to lead to an infinite loop. The delayued state of a condition can also be changed if a CALL ON or SIGNAL ON instruction is executed in the handler. If this is done while a second interrupt happens to be pending, the condition is raised immediately, and control returns to the beginning of the handler, via CALL or SIGNAL, as appropriate. Also, if CALL OFF or SIGNAL OFF is used in the handler, the state of the condition changes from delayed to disabled. Therefore, should another condition of the same type occur, the default REXX action for the condition will be taken. (For HALT, the only condition for which this can occur, that action is to terminate the program.) @endnode @node Exce4 "THE CONDITION() FUNCTION" Additional information about trapped conditions is available to both types of condition handlers with the CONDITION() built-in function. It can identify the name of the current trapped condition, the instruction that invoked the condition handler (SIGNAL or CALL), the state of the condition (enalbed, disabled, or delayed), and [it can] also provide a short descriptive string providing further details about the condition. Note that the CONDITION() function reports information only for the current trapped condition. If no conditions have been raised within the program, CONDITION() will return a null string. Also, it cannot tell you the state (on, off, or delayed) for conditions other than the one [which is] currently pending or trapped. After a condition has been trapped by a Type 1 handler, CONDITION() will continue to report information about the condition until the next one occurs or the active procedure returns to its caller. In the case of a Type 2 handler, however, CONDITION() is applicable only until the handler issues a RETURN instruction to return to the place where the condition was raised. The syntax of CONDITION() is: CONDITION([option]) If option is not specified, it defaults to 'I'. Otherwise it should be one of the following, to specify what sort of information is needed: 'C' indicates the name of the trapped @{i}C@{ui}ondition. 'D' indicates the @{i}D@{ui}escription of the trapped condition. This may be a null string if no description is available. Otherwise it varies depending on the type of condition: ERROR: the command string which was issued and caused the condition FAILURE: the command string which was issued and caused the condition HALT: extra information provided with the request to terminate the program (if any) NOVALUE: the derived name of the uninitialized variable that caused the condition to be raised NOTREADY: the name of the I/O stream in which an error occurred SYNTAX: addition implementation-dependent information (if any) regarding the error 'I' indicates the @{i}I@{ui}nstruction that invoked the condition handler (either CALL or SIGNAL). 'S' indicates the @{i}S@{ui}tate of handling for theh condition: ON (enabled), OFF (disabled), or DELAYED. Of course, most of this information can be deduced by the program on its own. Usually a given handler is specified for only one possible condition and as either a Type 1 or Type 2 handler. And the state of handling for the condition can be deduced from the type of the handler. But some interesting things may be done with the condition description. In the case of [the] ERROR and FAILURE conditions, the handler can examine the actual command that was issued. It may be determin.ed, for instance, that the operating system simply did not have the right search path for the command, and a new, more appropriate one, may be established. The program could even ask the user for help in finding the command or in otherwise correcting the error. This might be useful in REXX programs that are packaged with other software and are used to install the software. Such program need to be run in very diverse environments and may not always be able to find the commands they need to run. For the NOTREADY condition, having the name of the file that caused the error makes it possible to process a long list of files and keep a log of any in which errors were encountered. This is more easily done in a condition handler than in the main lines of the program if the file is referred to in a large number of places, so that testing each I/O operation is cumbersome. Here's a final example that uses CONDITION() in a FAILURE handler to attempt reexecution of a command that could not be found: FAILURE: ccommand = CONDITION('d') SAY 'Error' RC 'occurred running' command IF RC \\= -3 THEN DO 'Please notify a systems programmer.' EXIT END PARSE VAR command name tail DO FOREVER SAY 'The' name 'program was not found.' SAY 'Enter name of directory for' name, 'or a null string to quit.' PULL directory IF directory = '' THEN EXIT /* try to reexecute command with new directory */ directory||name tail IF RC = 0 THEN RETURN END This example assumes that a return code (RC) of -3 means that the command could not be found. This is the convention used on various systems, including CMS and MS-DOS [but not AmigaOS]. @endnode @node Inte "12: THE INTERPRET INSTRUCTION" In this chapter, more than any other in this book, it would be well to recall the remark of the anonyumous sage: "A language is not worth knowing unless it teaches you to think differently." Efficient and advantageous use of the INTERPRET instruction requires a very new mindset towards programming. Yet, in the appropriate circumstances, and once you get it, many otherwise difficult problems can be solved quickly and efficiently with INTERPRET. INTERPRET offers a capability in REXX that can be found in few other languages. It allows a program to create REXX instructions and execute them dynamically. That is, it permits programs whose instructions are not fully determined until execution time - they ccan vary as the program is run. The following example is one without which no book on REXX is complete. It is usually called REXXTRY, because it allows you to type in one or more REXX instructions at the keyboard and have them executed immediately. Though it is mainly of interest as a quick way of learning REXX interactively by trying out actual REXX code, it can be of use in allowing you to invoke REXX utility services without going to the trouble of creating a program. /* test individual REXX commands */ SAY 'Enter REXX statements:' restart: SIGNAL ON SYNTAX DO FOREVER _ = CHAROUT(, 'REXX>') /* display prompt */ _command = LINEIN() /* read input */ INTERPRET _command END RETURN SYNTAX: SAY 'REXX error' RC '('ERRORTEXT(RC)||, ') occurred.' SAY '=====>' _command SIGNAL restart All of the real action here occurs inside the loop. It simply puts up a prompt, reads a line of input, and uses INTERPRET to execute it. The remainder of the program is a handler for SYNTAX errors, much like the one discussed in @{"Chapter 11" link Exce}. Since typing errors as well as language usage errors are all too easy to make, this is a handy safety net that allows REXXTRY to keep running regardless (almost) of what is entered. A couple [of] remarks about some of the programming decisions made in this example may be helpful. First, CHAROUT() is invoked as a function rather than with the CALL instruction, since we wanted to avoid setting the RESULT variable as a side effect. This way, RESULT is affected only by the command that is interpreted. Second, and more importantly, we used LINEIN() to read input instead of PARSE PULL. This prevents any confusion due to reading data that might get put on the queue with PUSH or QUEUE instructions. You are strongly encouraged to type in this program and try it out. It is an excellent way of seeing exactly what REXX instructions and built-in functions do. This includes, in particular, the INTERPRET instruction itself - there is no restriction on using INTERPRET recursively (though it can be hard to follow what is going on!). The syntax of INTERPRET is: INTERPRET expression expression is any REXX expression, which is first evaluated according to all the normal rules of REXX: substitution of values for symbols, evaluation of string and arithmetic operators, etc. Then the result of that evaluation is executed just as if it were part of the program, so that another level of expression evaluation can occur. For instance, in the sequence: x = 'a + b' a = 1 b = 2 INTERPRET 'SAY' x the expression after INTERPRET evaluates to: SAY a + b and when this itself is executed, substitution and expression evaluation occur again so that the result 3 is finally displayed. Although REXXTRY uses INTERPRET, it is set up so that what you type in is executed just as if it appeared in the program. So, suppose you typed the following lines into REXXTRY: command = 'SAY' varname = 'x' x = 'Hello world!' Then you can try some experiments. If you type: SAY x the program displays: Hello world! just as it would have if the line had occurred in the program. To see how INTERPRET itself works, you can type either: INTERPRET "SAY x" or: INTERPRET command varname and the program again displays: Hello world! because the expression command varname evaluates to SAY x, which is then executed normally. That is, it was subjected to. a second level of interpretation, in which SAY was recognized as a keyword and x as a variable, whose value was substituted into the final result. But if you type: command varname then the program will try to pass the command SAY x to the external environment (where it will probably be rejected as an unknown command). Why? Because the st]ring command varname will be processed just as if it were a line in the program. Since the first token isn't a REXX keyword and the instruction isn't an assignment, the instruction is assumed to be a command. Then substitution of the values of command and varname occurs and an attempt is made to execute the command. If you find this [to be] a little confusing (quite possible!), it is suggested again that you experiment with REXXTRY a little. Or, read further to see some additional examples. @{" RULES FOR INTERPRET " link Inte1} @{" EXAMPLES OF INTERPRET USAGE " link Inte2} @endnode @node Inte1 "RULES FOR INTERPRET" Almost any valid REXX statement can be the object of an INTERPRET() instruction. In fact, almost any sequence of statements separated by semicolons can be INTERPRETed, even an entire DO...END loop. It is, however, required that any complex statements (IF, DO, or SELECT) be complete. Also, LEAVE and ITERATE can only refer to DO loops contained withinn the interpreted sequence of statements. But you could very well have a CALL to a subroutine or a RETURN from one. Labels are also not allow;ed. So if you do use CALL, it must be to a label existing elsewhere in the program. Execution of the interpreted statements occurs within the current program context. That is, all variables are available and have whatever value was last assigned to themn. Variable values can be changed and new variables created. Any such changes persist after the conclusion of INTERPRET. SIGNAL can be used within INTERPRET. It causes a transfer of control just as it normally would, and it also immediately terminates the INTERPRET instruction. The same is true when SIGNAL is used to invoke an enabled condition handler. Handlers can be enabled by the instructions that are INTERPRETed. Indeed, all other instructions that change the state of program execution can be used (ADDRESS, NUMERIC, etc.), and their effects persist after INTERPRET finishes. Implementation of the INTERPRET instruction obviously requires the full capabilities of a REXX interpret at execution time. Therefore, the instruction is often unavailable in compiled implementations of REXX. Because of this, it is a good idea to use INTERPRET sparingly if there is any chance your program will ever have to run in other environments. Use of alternatives like the VALUE() function (where possible) will probably run faster, in addition to being more portable. If use of INTERPRET is unavoidable, it may still be a good idea to test at the beginning of the program whether it is available. You can use code like this to test: SIGNAL ON SYNTAX NAME interpret_check x = 0 INTERPRET 'x = 1' interpret_check: IF x = 0 THEN DO SAY "INTERPRET instruction unavailable!" EXIT END SIGNAL OFF SYNTAX @endnode @node Inte2 "EXAMPLES OF INTERPRET USAGE" In earlier versions of the REXX language INTERPRET was needed for certain things that can now wbe done with the VALUE() function. In particular, if you wanted to pass a stem name to a subroutine and be able to read and write compound variables using that stem, it was convenient - or necessary - to use INTERPRET. For instance, a bubble sort typically has something like: IF x.i >> x.j THEN DO temp = x.j x.j = x.i x.i = temp END to exchange adjacent items if they are out of order. But if we want this to work for an arbitrary stem whose name is passed to the routine, then we need something like: bubble_sort: PARSE ARG stem, size DO n = size TO 2 BY -1 DO i = 1 TO n - 1 j = i + 1 INTERPRET "IF" stem".i >>" stem".j THEN DO;", "temp =" stem".j;", stem".j =" stem".i;", stem".i = temp;", "END" END END RETURN This is a complete sort subroutine which takes two arguments: the name of the array to be sorted and the number of elements. This sort of thing can be confusing to read. The main trick in reading examples like this is to see what is inside a quoted string and what is outside. In this case, all references to stem are outside of quotation marks, so that its value, which was passed as an argument, can be substituted for it. Notice that this example is written with a single INTERPRET instruction rather than one per line. This was necessary in order to have the entire DO...END sequence in the same instruction. It is also more efficient, even if the DO group had not been a consideration. It was necessary to separate clauses with semicolons, since there are no line-ends within the string that is INTERPRETed. But by writing the expression across several continued lines (which causes concatenation), a similar appearance results. Since the VALUE() function can be used to assign values as well as [to] retrieve them, this could be rewritten: bubble_sort: PARSE ARG stem, size DO n = size TO 2 BY -1 DO i = 1 TO n - 1 j = i + 1 IF VALUE(stem'.i') >> VALUE(stem'.j') THEN CALL VALUE stem'.i',, VALUE(stem'.j', stem'.i') END END END RETURN Here we have used the fact that VALUE() returns the current value of its first argument before reassigning it. This approach is more efficient, and works even with a REXX compiler that doesn't support INTERPRET. However, if you have a version of REXX in which VALUE() can't do this, INTERPRET is the only alternative. Probably the most common circumstance in which INTERPRET is hard to avoid using involves using variable subroutine names. Taking the sorting example a little further, it is common to want to sort elements of an array on some basis other than simple string comparison. For instance, the array might consist of indices into another array whwich is really the thing that is to be sorted. That is, we wish to say x.i is "less than" x.j just in case: ii = x.i jj = x.j real_array.ii << real_array.jj The way this is handled in full generality is to pass to the sorting routine the name of another routine which will perform the comparison any way it likes. This routine, in other words, defines the ordering relation. In other languages, like C, it is very common to pass the names of functions (ie. pointers to them) to other routines so that the programmer has control over which function is to be called at any particular time. This idea is used even more heavily in object-oriented languages like C++ which use methods or member functions associated with objects in order to provide customized object behaviour. The only way to do this kind of thing in REXX is to use INTERPRET, because the CALL instruction (or function reference) treats the procedure name to be called as a literal. So, let's assume a third argument is passed to the sorting routine and gives the name of the comparison function. The comparison function in turn takes two arguments. It returns -1 if the first argument is less than the second, 1 if the first is greater, and 0 if the two arguments are the same. Then our sorting example could be written: bubble_sort: PARSE ARG stem, size, compare DO n = size TO 2 BY -1 DO I = 1 TO n - 1 j = i + 1 INTERPRET, "IF" compare"("stem".i,"stem".j) > 0 THEN DO;", "temp =" stem".j;", stem".j =" stem".i;", stem".i = temp;", "END" END END RETURN which is really just a very simple change. Notice that the reference to compare is outside of quotation marks, as is stem, so that proper substitution occurs. INTERPRET is usually an expensive instruction to use, in terms of time. It is slow because it has to perform all the tokenization and syntactic analysis REXX needs every time it is invoked, whereas most REXX implementations are optimized to do that sort of thing on any given instruction only once, when the program is first loaded. This can be a problem especially in a sorting routine, which performs the same operation many times. (And even more especially wwith an inefficient sorting algorithm like the bubble sort.) One way to minimize the impact of this sis to take INTERPRET outside of any loops, if possible. In other words, make most of the body of the subroutine into the object of INTERPRET: bubble_sort: PARSE ARG stem, size, compare INTERPRET, /* begin interpreted code */ "DO n = size TO 2 BY -1;", "DO I = 1 TO n - 1;", "j = i + 1;", "IF" compare"("stem".i,"stem".j) > 0 THEN DO;", "temp =" stem".j;", stem".j =" stem".i;", stem".i = temp;", "END;", "END;" "END;" /* end of interpreted code */ RETURN This admittedly is tricky; you have to remember the semicolons and continuation characters. And make sure [that] the right things are outside of quoted strings. In @{"Chapter 5" link Subr} we saw that one of the problems with passing arrays to subprocedures was that of exposing the array so that it is accessible, in case the subprocedure begins with a PROCEDURE instruction. The recommended solution involved something like this: argname = 'array.' CALL function ... function: PROCEDURE EXPOSE (argname) /* all references to arguments use VALUE() function */ ... It was noted that this is a little clumsy, in part because function could not be invoked with normal CALL or function reference syntax. This is a serious problem if we want to use references to the function in an expression. INTERPRET provides a way around the problem. Let's define a new function called APPLY that will take a function name and arguments as its arguments and return the value of the function applied to the given arguments, so that we could get the desired result with: x = apply('function', 'array.') Here's a first cut: apply: argname = ARG(2) INTERPRET "RETURN" ARG(1)"()" If it helps, you can think of this as something like a macro. That is, it's an expression with function-like syntax that expands into a number of REXX instructions. We can generalize this in several directions. Obviously we may want to work with functions of any number of arguments, some of which need to be passed by reference and some by value. Passing an argument by value is the normal way, and passing by reference is what we are trying to simulate. Let's assume we want all arguments being passed by value to be named in a string which will be the second argument of APPLY. That is, we want to be able to say: x = apply('function', 'arg1. arg2.', 'x', '3', 'a+b') in order to pass arg2 and arg2 by reference, and the rest as ordinary arguments. Also, we would like each function called this way to have its own private name for its by-reference argument list, to avoid conflicts. That is, rather than use argname all the time, we would adopt the convention that the name of the function with the suffix _args is the name of the list of by-reference arguments. Then we might have: apply: CALL VALUE ARG(1)"_args", ARG(2) arglist = '' DO i = 3 TO ARG() IF i > 3 THEN arglist = arglist',' arglist = arglist||ARG(1) END INTERPRET "RETURN" ARG(1)"("arglist")" We wanted to avoid double evaluation, and so we also passed the actual arguuments to be used as quoted strings. The loop builds a valid function argument list. Notice in the last line that evaluation of ARG(1) and substitution of the value of arglist occur before INTERPRET is actually executed. Then, when the resulting expression is interpreted, further substitutions and expression evaluations can occur. When this is all executed it is as if, in the present instance, we had: apply: function_args = "arg1. arg2." RETURN function(x,3,a+b) in the program. A completely different direction in which to purse generalization of this example is to pick up on the remark that INTERPRET can add capabilities to REXX that are much like the macro feature of languages such as PL/I, C, and most assemblers. That is, you can write code which has a function call syntax but which actually expands into code customized for some specific purpose. For example, suppose we want a tidy way to assign to all elements of one array the value of an arbitrary function applied to the corresponding elements of another. It could be done with inline code like: DROP target. DO i = 1 TO n target.i = function(source.i) END The problem here is that this only works for one specific function. But we want this to work for any function we wish to name. Except for this condition, we could do this with an ordinary REXX procedure. So let's make a macro to do it. We'll call it ASSIGN, and specify that we want to invoke it with a call like: CALL assign 'target.', 'source.', 'function', n We could do this as follows: assign: _assign_args = ARG(1) ARG(2) CALL _assign ARG(3), ARG(4) RETURN _assign: PROCEDURE EXPOSE (_assign_args) stem1 = WORD(_assign_args, 1) stem2 = WORD(_assign_args, 2) INTERPRET, "DROP" stem1";", "DO i = 1 TO ARG(2);", stem1".i =" ARG(1)"("stem2".i);", "END" RETURN We have taken the extra step in this example of adding an extra lower- level procedure (called _assign) which uses a PROCEDURE instruction to avoid any trouble from use of i as a private index variable. The rest of the details are much the same as preceding examples. Of course, this macro is not terribly different from an ordinary REXX subroutine. It merely does things [which are] not possible without INTERPRET, in that it allows the name of a function as an argument. It might seem so far that INTERPRET doesn't add much to REXX besides an ability to perform indirect function calls. But this is far from the truth. Another whole class of applications for INTERPRET is the handling of expressions read from a file or [from] the user at a terminal. An obvious example of this is a calculator program, which is just a slight variation of the REXXTRY program: /* expression calculator */ restart: SIGNAL ON SYNTAX DO FOREVER SAY "Enter expression:" _expr = LINEIN() IF _expr = '' THEN LEAVE PARSE VAR _expr variable '=' value IF value \\= '' THEN INTERPRET _expr ELSE INTERPRET 'SAY' _expr END RETURN SYNTAX: SAY 'REXX error' RC '('ERRORTEXT(RC)||, ') occurred.' SAY '=====>' _expr SIGNAL restart The modification we've made to REXXTRY is to examine each line of input. If it looks like an assignment statement, it is executed. This allows the calculator to have memory by storing numbers in variables. Any input other than an assignment is assumed to be an expression, and its value is displayed. We still rely on the SYNTAX error handler to inform us of any errors, instead of allowing them to terminate the program. We also test for a null input line as a way to get out of the calculator (since we can't simply enter the EXIT instruction). Endless elaborations are possible on this simple example. For instance, REXX does not have a wealth of built-in mathematical functions such as the trigonometric, exponential, logarithmic, or other transcendental functions. But if you need them, theey can easily be included in the calculator program itself. In another direction, this kind of progrgam could be extended to plot graphs of any desired expressions. This would be especially nice if a decent graphics library is available. Without such a library, alternatively, the program could produce output on PostScript printers by generating the appropriate PostScript code. Our final example of INTERPRET maay be somewhat surprising. We noted earlier that INTERPRET can be a slow instruction to use. Nevertheless, there may be situations in which it can be used to speed up a program. One situation involves the use of large SELECT statements that involve many conditions to be tested. For instance, a program that is command-driven may consist of a main loop that reads commands, parses them, and then uses a large SELECT statement to invoke appropriate code for processing the command, something like this: DO FOREVER PARSE LINEIN verb rest verb = TRANSLATE('verb') /* upper case */ SELECT WHEN verb = 'ANALYZE' THEN CALL analyze rest WHEN verb = 'BUILD' THEN CALL build rest /* etc. */ OTHERWISE SAY "Invalid command." END END While this is satisfactory for a dozen or so commands, it could be very slow if there are several dozen or more different commands, because the SELECT statement would have to make on average a number of comparisons equal to half the number of commands (unless the statement were carefully constructed to put the most likely commands first). Morever, many REXX implementations are not smart enough to skip efficiently to the end of a long SELECT statement after finding the first condition that is true. Even if the performance were acceptable, with many commands you would have a single SELECT statement sprawling over hundreds of lines of code, which would make the program hard to read. And, if nothing else, think of all the boilerplate WHENs and THENs that would need to be typed. Then think how much easier it would be to do somemthing as simple as: DO FOREVER PARSE LINEIN verb rest INTERPRET "CALL" verb "rest" END This assumes that each command is handled by a subroutine having the same name. Ifg this is not the case, you could use a compound variable that provides the dictionary telling which subroutine to call for each command. Or, to be fully general, thhe compound variable co+uld contain the actual code to execute for each command, eg.: code.analyze = "CALL prepare; CALL analyze rest" code.build = "CALL build rest; SAY 'Done!'" code.collate = "SAY 'Not implemented yet!'" ... and the corresponding line to do the right thing for each command is just: INTERPRET code.verb @endnode @node Arit "13: REXX ARITHMETIC" One significant aspect of REXX that has not been given special prominence in this book is the way numbers and arithmetic are handled. We have emphasized REXX as a language for personal programming, for writing command proceedures, for working with character string data, and so forth. Numerically intensive computation does not ordinarily play a large part in this kind of programming. And the fact that REXX is interpretive and treats numbers as character strings tends to make it slow for numeric computing. However, REXX does have a very distinctive way of dealing with numbers, which can be very important in some cases. For instance, REXX handles high-precision arithmetic very naturally and easily. This can be quite useful when one needs to deal with very large numbers or many digits of precision. REXX is very unusual among programming languages in that it never works (as far as the user is concerned) with numbers using the standard arithmetic instructions of the computer. Instead, REXX works entirely with a general, abstract definition of numbers and arithmetic. Thus the word length of the host computer and the various sizes of its different types of internal representations of numbers are irrelevant. Underflow and overflow of quantities, as understood by the computer, cannot occur. Intricacies of binary representation of floating point numbers may be ignored. The result is that REXX programs can be much more portable as far as their numeric computations are concerned, since the programmer needs to understand only REXX's rules for arithmetic, not the rules of every computer on which the program might run. If a program is written correctly according to the rules of REXX, it should produce exactly the same numeric results regardless of where it is run. REXX is able to do this because, as we have observed, numbers are always represented as character strings. Numbers may be used as character strings and vice versa. The way in which a number is expressed as a character string is important only when arithmetic operations are to be performed, or when using certain instructions and built-in functions which require numeric arguments of a certain type. Strings that represent numbers can be created in a number of ways. They may be literals in the program, and they may be either unquoted literals, or quoted strings, including hexadecimal or bit strings. Thus 1, '1', '31'x, and '0011 0001'b are all valid representations of the number 1 (in ASCII). Notice that these are machine-independent representations. The machine-specific binary representation of a number, such as '01'x, is not a valid representation of the number 1. Valid numeric strings can also be created by character operations, read from a file, etc. Thus: '1'||'1' COPIES('1',2) and so forth create valid represenations of the number 11. A character string is a valid number if: 1. It is a sequence of 0 or more digits ("0" through "9") followed by a period, followed by another sequence of 0 or more digits, except that a period by itself isn't a valid number. 2. It is a number as in 1. preceded by a + or - sign and 0 or more spaces, eg. '+1', '+ 1', '- 1'. 3. It is a number as in 1. or 2. with 0 or more leading or trailing blanks, eg. ' 1 ', ' + 1 '. Valid numbers may also use exponential notation. This means a number more or less as just described, followed by e or E, optionally followed by a + or - sign, and ending with 1 or more digits. Examples: 1e9 ' + 3E+4 ' ' 666.000e-10 ' The part of the number before the E or e is called the mantissa and the part after is scalled the exponent. The meaning of the notation is that the number represented is the mantissa times ten to the power of the exponent. (If the exponent is negative the mantissa is multiplied by one over ten to the absolute value of the exponent.) No blanks are allowed between the mantissa, the E or E, the sign of the exponent, and the exponent itself. Although strings as described above may be used as numbers, when REXX prrocedures a numberic result in the form of a character string, a certain standard form is used. For instance, such strings will never contain embedded blanks. They will have a sign only for negative numbers, and E is used in exponential notation rather than e. In adddition, when a result is not given in exponential notation, it will always begin with a zero before the decimal point, for numbers less than one in absolute value, but otherwise will have no leading zeroes. If the result is given in exponential notation, numbers in standaard form will always have just one nonzero digit before the decimal point. The determination of whether or not to present the result in exponential form or not depends on the value of NUMERIC DIGITS, which we discuss below. Zero itself will always be represented simply as 0, with out [a] decimal point or fractional part. You can always force a valid numeric string into the standard form by adding 0. These details are important when dealing with numeric results as character strings - for instance when you are concerned with how numbers appear in a report. @{" PRECISION OF ARITHMETIC " link Arit1} @{" ARITHMETIC OPERATIONS " link Arit2} @{" EXPONENTIAL REPRESENTATION0 " link Arit3} @{" WHOLE NUMBERS " link Arit4} @{" ARGUMENTS TO BUILT-IN FUNCTIONS " link Arit5} @{" BUILT-IN FUNCTIONS FOR NUMERIC FORMATTING AND " link Arit6} @{" ARITHMETIC " link Arit6} @{" ADDITIONAL MATHEMATICAL FUNCTIONS " link Arit7} @endnode @node Arit1 "PRECISION OF ARITHMETIC" One other very important fact about the way REXX handles numbers and arithmetic is that at any one time it works with, at most, some maximum number of significant digits. This affects not only arithmetic, but also numbers used directly by REXX, such as positional parameters in PARSE templates. This is because it is inefficient to do arithmetic with truly unlimited precision numbers. For most ordinary purposes, all that is required is to work with a sufficient number of significant digits, rather than exact values. Why suffer the expense of computing with hundreds of digits when all you care about (perhaps) is only 10 or so? Most fractional quantities cannot be expressed exactly in decimal notation anyway. There isn't any way to represent 1/3 precisely as a decimal, and .333333333 is usually more than close enough. Even disregarding the inefficiency of computing with many digits, just think how annoying it would be to see reports filled with quantities that look like: .333333333333333333333333333333333333333333333333333333333333333333333 To avoid such excess, REXX always has some specific limit on the number of digits it will work with. The default is nine digits, but it can be changed at any time with the NUMERIC DIGITS instruction. This has the form: NUMERIC DIGITS [expression] where expression evaluates to a whole number greater than 0. The current value of NUMERIC DIGITS may always be obtained with the DIGITS() built-in function. A very important fact about REXX is that it does not cause a program failure or even an error when a computation exceeds the specified maximum precision. Nor does REXX silently just produce completely invalid results by discardding the most significant digits of a result, as occurs with many other programming languages. Instead, it discards the least significant digits of a result, to stay within the specified number of digits of precision. The concept of NUMERIC DIGITS is actually quite hard to define precisely. It is best understood in terms of the effects it has on REXX handling of numbers, as described below. There are a large number of such effects. They include rounding which may occur to numbers used in arithmetic, the default way in which arithmetic results are represented, and whether a given number is regarded as a valid whole number. @endnode @node Arit2 "ARITHMETIC OPERATIONS" First and foremost, NUMERIC DIGITS affects how each of the arithmetic operations are performed and how the results are represented. To begin with, both operands are truncated (not rounded) to NUMERIC DIGITS + 1 significant digits (in the REXX sense). This provides one extra guard digit to help preserve accuracy. After the operation has been performed on the numbers, the result is rounded to NUMERIC DIGITS places, starting from the high-order nonzero digit. It is quite common for a result to have more than NUMERIC DIGITS of precision before this rounding. For instance, the product of two three-digit numbers can easily have six digits. And the quotient 1/3 has an infinite number of digits. The rules for arithmetic in REXX are mostly the same as in pencil-and- paper arithmetic. However, the common rules are not always precise or unambiguous enough, so REXX has made a few arbitrary rules which in some cases are not intuitively natural. Addition and subtraction present some of the more unusual cases. Subtraction, in particular, because it may involve a great deal of cancellation, provides some interesting examples. First of all, addition or subtraction when one of the operands is zero is a special case. The result is simply the value of the nonzero operand rounded according to NUMERIC DIGITS and with the sign adjusteed. Adding zero to a number is sometimes a useful way of putting it in the standard form. The general rule for addition and subtraction is first to normalize both operands (after both have been truncated to NUMERIC DIGITS + 1 digits). This is done by expressing in exponential notation the operand which has the largest absolute value. Then express the other operand in exponential notation using the same exponent as the first. This process may result in expressing the second operand in a form with more than NUMERIC DIGITS + 1 digits. If so, it should be truncated to NUMERIC DIGITS + 1. This truncation prevents the inclusion of illusory precision in the result. The addition and subtraction can then be performed on the mantissas of the operands. The result, finally, is roundned to NUMERIC DIGITS - counting from the leftmost nonzero digit (if there is at least one to the left of the decimal point), otherwise from the digit just to the left of the decimal point, even if it zero. Again, this prevents keeping apparent precision that really isn't there (if subtraction has led to significant cancellation). These rules are fairly complex and sometimes produce surprising results, but they are design to properly represent the precision of a result. For example, suppose NUMERIC DIGITS is 2. A simple case is: 100 - 95 In exponential form this is 1E2 - 0.95E2, which is 0.05E2. This has to be rounded to just two significant digits, so it is 0.1E2 - ie. 10! Even more surprisingly, 100 - 96 becomes 0.04E2, which rounds to a result of 0. Clearly, choosing a small number like 2 for NUMERIC DIGITS can have a drastic effect on arithmetic. Of course, the same effect occurs with NUMERIC DIGITS 9. It's just easier to ignore it because of the much smaller relative size of the rounding effect. Notice also that REXX does not round off the numbers before operating on them. If that were the case, 95 would round to 100, and 100 - 95 would be 0. For a more complicated example, let's figure out the value of: 0.00445 - 0.004505 In exponential form the first operand is 4.45E-3, so it does not need to be truncated, but the second is 4.505E-3, so it is truncated (not rounded) to 4.50E-3 to have three significant digits. Then we compute: 4.45E-3 - 4.50E-3 which is -0.05E-3. Now we have to round this result since there are more than two digits, including the one to the left of the decimal point. Rounding is done in the usual way: if the digit to be rounded off is 5 or more, add 1 to the digit to its left, otherwise just drop it. (The sign of the number is not considered.) So we have -01E-3. In nonexponential form this is -0.0001, and this is the way [in which] the result is finally expressed. This follows from the rules below about the use of exponential representation because there are not more than four digits to the right of the decimal point. If we had computed instead: 0.00446 - 0.004505 we would get -0.04E-3. We are still obliged to round this to two digits, but now the result is 0 after rounding. So the final result is 0. To some extennt, cases like this may seem paradoxical. Why, after all, should one expect .00446 - .004505 to be 0? The answer is the somewhat artificial nature of the example. We have been illustrating how the setting of NUMERIC DIGITS (just 2 in this example) affects arithmetic operations. In practice, most programs will hardly ever use anything other than the default NUMERIC DIGITS 9. There is almost no performace penalty for doing so, since the number of digits of precision in most numbers used in typical REXX programs is usually much less anyway. When doing scientific or engineering calculations where the quantities involved approach nine digits of precision, you might well raise NUMERIC DIGITS somewhat higher, to avoid unnecessary loss of precision. Whether you actually need great precision or only a little, it's easy in REXX to compute with a lot more than you really need. Then, for saving or reporting final results, you can use the TRUNC() or FORMAT() built-in functions to express your answers. These considerations of how NUMERIC DIGITS affects arithmetic operations, subtraction in particular, are also relevant to comparison of numbers. When both operands are nummbers, the normal REXX comparison operators (<, <=, =, >, >=) become numeric comparisons. That is, they are based on the actual numbers rather than on the exact string representation, which may be misleading. (For instance, the string representation of a number can have leading blanks without affecting the number. The strict comparison operators (<<, ==, etc.) should be used when the character representation is the important thing.) Numeric comparisons in REXX are defined in terms of the subtraction and comparison to 0. That is, A < B just in case A - B < 0, A = B just in case A - B = 0, and so forth. REXX uses this slightly roundabout definition because, as we have seen, it is quite possible for two "different" numbers to have a difference of 0. In this case, REXX stipulates that the numbers are equal. Thus the expressions: 0.00445 < 0.004505 0.00446 = 0.004505 both have the value 1, by the preceding calculations, if NUMERIC DIGITS is 2. Of course, if NUMERIC DIGITS is 9, as usual, then: 0.00446 < 0.004505 as you would expect. This is true even if NUMERIC DIGITS is 3. The point is, use a small value for NUMERIC DIGITS only if you find this way of looking at numeric comparisons useful for your purposes. @endnode @node Arit3 "EXPONENTIAL REPRESENTATION" Even when a number produced as an arithmetic result does not require rounding to stay within NUMERIC DIGITS of precision, REXX may still change its representation to the exponential form. For instance, if NUMERIC DIGITS is 3, then an arithmetic result of 1000 will be expressed as 1.00E+3. The rule is that a reesult will be expressed in exponential notation if the number of digits before the decimal point is more than NUMERIC DIGITS or if the number of digits after the decimal point (disregarding any trailing zeros) is more than twice NUMERIC DIGITS. Otherwise, a result will be expressed in nonexponential notation. Note that this does not apply to the results of built-in functions. For instance (still with NUMERIC DIGITS 3): POS('1', COPIES('0',1000)'1') has 1001 as a result. @endnode @node Arit4 "WHOLE NUMBERS" Many instructions and built-in functions in REXX, as well as certain other circumstances, require argument values which are whole numbers. This is a number which is an integer, ie. has no fractional part, or nothing but zeros after the decimal point when expressed in nonexponential form. And in addition there is a limit on how large the number can be: a whole number must have no more than NUMERIC DIGITS to the left of the decimal point (excluding leading zeros). In other words, a whole number is an integer which would not need to be expressed in exponential form according to the preceding rule. This isn't to say that a number in exponential form can't be a valid whole number. For instance, if NUMERIC DIGITS is 9, 1.1E2, 1E1, and 1E8 are all whole numbers, though 1E9 is not. The following are circumstances in which REXX requires whole numbers: * positional patterns in PARSE templates. * the right operand of the exponentiation operator (**) (but not necessarily the exponent of a number represented in exponential form: eg. 1E9999 is acceptable even if NUMERIC DIGITS 3). * the repetition count in a DO instruction. * values specified in NUMERIC DIGITS or FUZZ. * trace counts specified in the TRACE instruction. * certain arguments of some built-in functions: ARG(), D2C(), D2X(), and SOURCELINE(). Also, string functions which take length or position arguments require them to be whole numbers. In general, such functions require whole numbers because their arguments must be exact values which cannot admit rounding to NUMERIC DIGITS of precision. * the string arguments to C2D() and X2D() must be such that the function result is a whole number. * results of the operations of integer division and remainder must be whole numbers. There is currently some variability in how different REXX implementations deal with these rules. The problem occurs with values less than 9 of NUMERIC DIGITS. Many implementations allow integers [of] up to nine digits to be used as valid whole numbers for most purposes even though, strictly speaking, they are not when NUMERIC DIGITS is less than 9. This is a sensible policy, but it points up another good reason not to set NUMERIC DIGITS [to] less than 9, because you can't be sure of portability. The DATATYPE() built-in function can always be used to check whether a given string represents a valid whole number. That is: DATATYPE(string, 'w') has a value of 1 if string is a valid whole number, and 0 otherwise. @endnode @node Arit5 "ARGUMENTS TO BUILT-IN FUNCTIONS" Many built-in functions do not require that their arguments be whole numbers but do round off the arguments before using them. These are primarily the mathematical functions ABS(), FORMAT(), MAX(), MIN(), SIGN(), and TRUNC(). This is appropriate, since such functions are really arithmetic operators in function form. @endnode @node Arit6 "BUILT-IN FUNCTIONS FOR NUMERIC FORMATTING AND ARITHMETIC" REXX arithmetic often produces results with quite a few digits after the decimal point, particularly in calculations that involve any division. Unless you use integer division, or happen to get results that have an exact decimal representation, this is more or less guaranteed: 1/3 becomes 0.333333333. This is usually more digits than you ordinarily want to bother with when displaying results. REXX doesn't require you to use arcane format statements to display output as do most other languages. But if you care about the appearance of reports you will usually want to change some of REXX's formatting defaults. The simplest way to do this is with the TRUNC() built-in function. Its format is: TRUNC(number, [digits]) where number is the number to be truncated, and digits is the number of digits to be included to the right of the decimal point. The default for digits is 0, which causes the function to return the integer part of the number. Although the function is called @{i}truncate@{ui}, extra zeros will be added if necessary. If zero digits are requested, the decimal point itself will be omitted. The result is never affected by the NUMERIC DIGITS setting. Also, exponential form will never be used, so TRUNC() provides a convenient way to convert to nonexponential form. Sometimes, more control is needed over the representation of a number, particularly for numbers used in tabular reports. The FORMAT() built-in function can handle this problem. Its form is: FORMAT(number, [m], [n], [exp1], [exp2]) number is the number to be formatted. m is the number of digits allowed before the decimal point, and n ais the number [of digits allowed] after the decimal point. The number will be represented in exponential or nonexponential form according to the usual rule: exponential form is used just in [the] case [where] more than NUMERIC DIGITS are required before the decimal point or twice NUMERIC DIGITS are required after the decimal point. The exp2 argument can be used to change this trigger point: if specified, it is the number used instead of NUMERIC DIGITS to determine whether to use exponential form. exp1 determines how many digits should be used for the exponent (if required), excluding E and the sign. All arguments except number must be nonnegative whole numbers. If the number of places before the decimal point isn't specified, only as many as required will be used, with any blanks before or after the sign of the number being removed. A + sign, if any, is also removed. But if the result is negative, m must allow room for the - sign. An error results if not enough room is allowed. The number will be padded on the left with blanks if m is specified and there are fewer digits than that before the decimal point. Extra zeros are added after the decimal point if n is larger than the number of existing digits. If there are more digits after the decimal point than will fit, the number is rounded. (Note that this differs slightly from the TRUNC() function, which truncates instead of rounds.) Examples: FORMAT(' + 1.2 ', 2, 2) " 1.20" FORMAT(' - 1.2 ', 2, 2) "-1.20" FORMAT('1.23456', 2, 3) " 1.235" FORMAT('1.23E1' , 2, 3) "12.300" FORMAT('1.567' , 2, 0) " 2" You can force a number into exponential form by using a value of 0 for exp2. However, if the exponent is 0, then the exponent is simply omitted (if exp1 isn't specified), or replaced by exp1 + 2 blanks, to keep the field width right (if exp1 is specified). Examples: FORMAT(-12.3 , 3, 3, , 0) " -1.230E+1" FORMAT(-12.3 , 3, 3, 2, 0) " -1.230E+01" FORMAT( -1.23, 3, 3, 1, 0) " -1.230 " FORMAT( -1.23, 3, 3, , 0) " -1.230" You can force a number in nonexponential form by using a value of 0 for exp1. If exp1 is specified but [it] is not large enough, an error will result. Examples: FORMAT(1.23456789E9, , 1, 0) "1234567890.0" FORMAT(1.23456789E8) "123456789" The last example is converted to nonexponential form, because when FORMAT() is used with no arguments other than the number, its result is the same as the expression number + 0. If you are using FORMAT() to present numbers in tables or in other ways where the exact total field width occupied by the number matters, you should use the exp1 and exp2 arguments to ensure either that all possible numbers are in exponential form or nonexponential form. There is no way to specify the total field width except by m, n, and exp1 individually. The total field width will be m + n + 1 (nonexponential form) or m + n + exp1 + 3 (exponential form). @endnode @node Arit7 "ADDITIONAL MATHEMATICAL FUNCTIONS" REXX does not have standard built-in functions for the transcendental functions such as LOG, EXP, SIN, COS, TAN, etc. It does, however, support a few useful arithmetic functions: ABS(), SIGN(), MIN(), and MAX(). They work more or less a would be expected: ABS(number) Returns the absolute value of number. It is the same as number + 0, without the sign. SIGN(number) Returns -1 is number is less than 0, 0 if it is equal to 0, and 1 if it is greater than 0. MIN(number, [number], ...) Returns the smallest argument. Rembmer that comparison is REXX is done by subtracting and comparing the result to zero. Arguments that are actually different may be considered to be equal, especially if NUMERIC DIGITS is small. For instance, if NUMERIC DIGITS is 2, 96 is @{i}equal@{ui} to 100, ie. their difference is zero. In a case like this, MIN() will always return the value of the first argument which is less than or equal to all the others. Thus MIN(96, 100) is 96, but MIN(100, 96) is 1.0E+2. MAX(number, [number], ...) Returns the largest argument. More precisely, it returns the first argument which is greater than or equal to all of the others in the special sense that REXX uses. There is just one other purely mathematical function, which is useful in games or simulations: RANDOM(). It generates pseudo-random numbers according to an algorithm which is not specifies and may vary from implementation to implementation. A seed value can optionally be specified. The random number algorithm uses the seed to determine the first number in the sequence. No number by itself, strictly speaking, is random. It is, rather, the sequence that is random. Each seed determines the whole sequence (which is why it is not really random), and different seeds almost always produce different sequences. Consequently, you should supply a seed only on the first call to RANDOM() in each program. This causes the same sequence to be generated each time, so [that] the program is repeatable. If you don't supply a seed on the first call, one is chosen automatically, typically based on the time of day. This is better for games, since you will generally get different sequences each time. You can also specify the range of numbers to be produced by RANDOM(). If you simply want a whole number between 0 and some maximum value (inclusive) you can use: RANDOM(max) where max is the maximum possible value. This form does not allow you to specify the seed. More generally you can use: RANDOM([min], [max], [seed]) where min is the minimum possible value and max is the maximum. min, max, and seed must be nonnegative whole numbers. In addition, the range between the minimum and maximum cannot exceed 100,000. There is one other function which is partly mathematical, but which has other uses as well: DATATYPE(). Its syntax is: DATATYPE(string, [type]) string is always the character string whose type is to be determined. If type is omitted, the function returns NUM if the string is a valid number, [or] otherwise CHAR. A string is a valid number or not depending on whether it can be used in arithmetic without error, according to the rules listed at the beginning of this chapter. DATATYPE() can be used to validate any data before using it in calculations if you want to bullet- proof your program against unexpected termination due to bad data. Finer discriminations of the type of a string can be made by supplying the type argument. If it is not present, the function returns 1 or 0, accordingly as string is or is not a valid instance of the specified type. The possible values are: 'A' Alphanumeric - all characters in the string are upper- or lowercase alphabetic characters or a digit from 0 to 9. 'B' Bit - all characters in the string are either 0 or 1. Blanks are not permitted, so this is slightly different from the format that is permitted for a bit string literal. 'L' Lowercase - all characters in the string are lowercase alphabetic characters. 'M' Mixed case - all characters in the string are upper- or lowercase alphabetic characters. 'N' Numeric - the string is a valid number. This is 1 just in case DATATYPE() with only one argument would return NUM. 'S' Symbol - the string consists only of characters which are valid in REXX symbols. This is not quite the same thing as saying [that] the string is actually a valid REXX symbol. It might, for example, be lnger than the implementation allows. 'U' Uppercase - all chraacters in the string are uppercase alphabetic characters. 'W' Whole number - the string is a valid whole number, as discussed earlier, according to the current value of NUMERIC DIGITS. Remember that a number may not be whole if it is too large, as well as if it is not integral. 'X' Hexadecimal - the string represents a valid hexadecimal number. This means that it may contain blanks, digits 0 through 9, "a" through "f", or "A" through "F". If spaces are used, the string must follow the rules for hexadecimal literals. A null string is specifically included. Except for the 'X' type, null strings are not included in any of the types, and DATATYPE() will return 0 if the first argument is a null string. @endnode @node Trac "14: TRACING AND DEBUGGING" Debugging probably ranks near the bottom of the list of the aspects of program,ming that programmers most enjoy, right along with documentation. Unfortunately, it is even more unavoidable. Few programming languages have debugging facilities of any kind as part of the language definition, but REXX is an exception. The TRACE instruction has been provided to enable a number of useful tracing and debugging capabilities. On the other hand, most modern implementations of popular languages, such as C, now come with very powerful debugging tools external to the language itself. These tools include features like full-screen displays, sophisticated breakpoint capabilities, and automatic display of program data as it changes. Though REXX's debugging capabilities are relatively primitive in comparison with the current state of the art, being part of the language definition confers on them the advantage that one can work with a complete REXX implementation on any platform and be able to use familiar, standardized tools for debugging. @{" THE TRACE INSTRUCTION " link Trac1} @{" PASSIVE TRACING " link Trac2} @{" INTERACTIVE TRACING " link Trac3} @endnode @node Trac1 "THE TRACE INSTRUCTION" All REXX debugging services are enabled and controlled with the TRACE instruction. Its syntax is: TRACE setting where setting is a code that selects a tracing pattern. Occasionally it is useful to have this setting determined dynamically, so it is also possible to use the form: TRACE VALUE expression where expression is a REXX expression that evaluates to one of the allowable settings. One might do this, for instance, in order to be able to control tracing centrally with a program command or option. The TRACE instructions could then be left in the code but rendered inoperative unless requested. One thing to remember is that TRACE is just a normal, executable REXX instruction. So its use can also be governed with ordinary REXX conditional statements like IF and SELECT. In particular, it does not take effect until encountered in the normal flow of program execution. Therefore, you can place it only where tracing is actually needed within a program to examine a particular problem. Like other REXX state information, the TRACE setting is saved before calling a subroutine and [is] restored afterwards. Though the current setting is in effect when the subroutine is entered, you can change it in the subroutine. Then when the subroutine returns, the original setting will be restored. In many REXX implementations it is also possible to control tracing externally with system environment variables or with options on the command line that starts a REXX program. Although this is often very useful in debugging, there's not much we can say about it here since the actual usage varies among implementations. Once enabled by whatever means, however, specific trace settings should behave the same on any implementation. REXX tracing occurs in one of two modes: passive or interactive. In the passive mode, certain program data is traced through messages written to the standard output stream, but the program does not pause and its operation is not otherwise affected. In the interactive tracing mode, program execution actually stops after most clauses are traced to allow the user to enter any desired REXX statement. One may invoke SAY statements to display the values of variables, call subroutines to perform more complex tasks, or use assignment statements to change the values of variables. Many clauses can actually be reexecuted after such a pause, to test the effect of any changes made. @endnode @node Trac2 "PASSIVE TRACING" We'll consider passive tracing first. This is invoked by choosing appropriate TRACE settings. For general debugging you will primarily want to see exactly which statements of a program are executed, and possibly the results of expressions used in those statements. The setting to use may be one of the following: A - trace all clauses R - trace all clauses and all expression results I - trace all clauses and all intermediate evaluation results O - turn off tracing Only one setting may be used at a time; they cannot be combined. The difference among the first three settings is the level of detail presented on the evaluation of REXX expressions. The instruction: TRACE A simply enables the display of each clause as it is executed. To get more detail, use: TRACE R In addition to displaying each clause as it is executed, REXX will also display the final results of the evaluation of any expression in the clause. For instance, when the following instruction is executed: x = '***' COPIES('-', 10) '***' then REXX displays on the screen: 3 *-* x = '***' COPIES('-", 10) '***' >>> "*** ---------- ***" The first line here is the trace of the clause before it is executed. It begins with the line number of the clause in the source file (3). This is followed by *-*, which is an eye-catcher used by REXX to indicate the trace of a source line. After that is the actual clause being traced. The following line contains the result of evaluating the expression in the assignment. The >>> is an eye-catcher that indicates an expression result. A third level of trace detail is enabled with the instruction: TRACE I which traces all intermediate results in expression evaluation. If this is in effect, then for the previous example REXX would display: 3 *-* x = '***' COPIES('-', 10) '***' >L> "***" >L> "-" >L> "10" >F> "----------" >O> "*** ----------" >L> "***" >O> "*** ---------- ***" This is obviously very detailed, probably too detailed for general use. Here, after tracing the source line, REXX displays the result of every intermediate step in the evaluation. Each line begins with an eye-catcher that indicates what is going on. The possible values for this three- character prefix are: >L> indicates a literal value >F> indicates the result of a function call >O> indicates the result of a binary operation >V> indicates the value of a variable >C> indicates the fully substituted name of a compound variable >P> indicates the result of a unary (prefix) operation The example above illustrates only the first three of these. As you can see, the trace gives you very explicit information about what is happening as REXX evaluates an expression. In particular, it can help you understand better how REXX works, because it indicates the precise sequence in which operations take place. This can be very instructive as you are learning REXX. It is also very helpful during debugging in cases where you don't understand why a given expression results in a particular value. Here's another example that illustrates some of the other information that can be presented as a result of using TRACE I: [1] /* trace compound variables */ [2] TRACE I [3] i = 1 [4] x.1 = -3 + i [5] y = x.i And here is the output: 3 *-* i = 1 >L> "1" 4 *-* x.1 = -3 + i >L> "3" >P> "-3" >V> "1" >O> "-2" 5 *-* y = x.i >C> "X.1" >V> "-2" This example shows tracing of variables, prefix operations, and compound variable substitution. You may find this useful while learning REXX to help understand just how compound variables work. TRACE R is also useful for understanding how the PARSE instruction works. It will show exactly what is assigned to each variable. For instance, the statement: PARSE VALUE 'The Wrath of Khan' WITH a b . would produce the trace: 3 *-* PARSE VALUE 'The Wrath of Khan' WITH a b . >>> "The Wrath of Khan" >>> "The" >>> "Wrath" >.> "of Khan" The first line after the trace of the instruction is the value of the literal expression. The next three lines are the assignments to a, b, and the period used as a placeholder. A general debugging strategy using the passive tracing facilities would be to use TRACE A first to get an overview of how the program is behaving. Very often when a program under development is tested for the first time you will find it does something strange, like exiting mysteriously, going into a loop, or producing completely unreasonable result. Usually this is because DO loops or condition instructions like IF do not work as expected. TRACE A is the easiest way to understand the overall flow of control. It will tell you exactly what statements of the program were executed. Generally, you will find that a conditional instruction did not work as you expected, causing the program to take an unexpected path of execution, because some expression did not evaluate the way [that] it should have. Once you have identified where things went wrong, TRACE R is a good way to try to find out why they went wrong. Sometimes, in dealing with particularly complicated expressions, you will need to use TRACE I to see how they are actually evaluated. But because of the volume of information these tracing directives cause, it is best to place them in your code as near as possible to the location where the problem occurs. There are several other kinds of passive tracing settings that can be used to handle a different class of problems. Many REXX programs have as their primary purpose the issuing of commands to an external environment such as the operating system. Errors occur when the program issues commands which are not exactly wht you intended. To deal with problems like this, you can use one of the following settings: F - trace commands that end with a "failure" error code E - trace commands that end with any abnormal error code C - trace all commands When you use TRACE C, all commands are traced before they are executed. The trace includes the original source code statement in the program, as well as the evaluated result which is passed to the external environment. For instance: TRACE C file = 'payroll.dat' 'listfile' file might produce: 3 *-* 'listfile' file >>> "listfile payroll.dat" LISTFILE Error 135: File(s) not found. +++ RC(28) ++ This begins with the instruction as it appeared in the program, followed by the result of evaluation. This occurs before the program is actually executed. The third line of output is an error message from the program. The fourth line is trace output that is generated because an abnormal (nonzero) return code was produced by the program. The number (28) is the actual return code, which is assigned to the RC variable. Usually you do not want this much detail. TRACE C will generate output for all external commnds executed. Normally you only need to know about commands that do not work properly. As discussed in the @{"chapter" link Exte} on commands to external environments, this is usually indicated by a nonzero return code from the command. However, the specific details vary quite a bit from one environment to another, and sometimes commands will place information in return codes even when they have not encountered an error condition. REXX provides the TRACE E instruction to trace only those commands which end with an error. If we had used this instead of TRACE C in the last example, we would get the output: LISTFILE Error 135: File(s) not found. 3 *-* 'listfile' file >>> "listfile payroll.dat" +++ RC(28) +++ This is different only in that the program error message comes first, because the trace output occurs after the command has run, when REXX knows that it ended with an error. If it had produced a return code of 0, no trace would have occurred at all. TRACE E is a useful instruction in programs that depend on the execution of external commands, because it alerts you when the commands do not work correctly. Sometimes even this produces extraneous trace information, for instance if your progrm already tests the RC variable to detect errors. One further alternative is TRACE F, which traces instructions only when they end in a failure. This condition is usually defined [on lame systems] as the production of a negative return code by the command. Normally this means a more severe type of error, such as an inability of the operating system to even run the command, perhaps because it could not be found or there was not enough memory to run it. These are still just debugging tools. If you want to write your program so that it responds appropriately to error conditions, you need to test return codes explicitly, or else use the CALL ON ERROR or SIGNAL ON ERROR instructions to handle the situation. @endnode @node Trac3 "INTERACTIVE TRACING" Passive tracing can illuminate many problems, but for serious debugging work it is much more effective to be able to interact directly with the program. The way interactive tracing works is that, when it is active, REXX will pause after executing most statements that have been traced. The user is prompted for input. The input can be either a null line, to proceed with the program, or an = sign, to reexecute the clause that was just traced. Any other input is assumed to be one or more REXX statements, which will be handled generally as they would [be] by the INTERPRET instruction. Interactive tracing is requested by prefixing the trace setting in the TRACE instruction with a question mark. For instance: TRACE ?A causes tracing of all clauses the addition of the interaction as just described. TRACE ?C similarly traces only external commands, and pauses for interaction only after such commands have been executed. There are several cases in which REXX will not pause after executing a clause even if it has been traced. Clauses consisting of END, THEN, ELSE, OTHERWISE, RETURN, EXIT, SIGNAL, and CALL are in this category. The reason is that REXX would be unable to safely reexecute such clauses since they have already altered the flow of control within the program. Similarly, clauses that raise a condition for which there is an enabled condition handler or that cause a SYNTAX error cannot be reexecuted, and REXX will not pause for them. When REXX does stop during interactive tracing, you can issue just about any valid REXX statement or group of statements (separated by semicolons). Typically you would use one or more SAY statements in order to examine the contents of variables or the values of expressions. If you need to examine a large number of variables you might even provide special purpose subroutines in your program to display the data. These routines can be invoked from a trace prompt with a CALL instruction. You can change any of the current generation of variables with one or more assignment statements. You can also invoke procedures that make changes to variables. Such changes are persistent, just as if the statements had been executed normally as part of the program. If you then resume program execution by entering an = sign, the statement that caused the pause is reexecuted with the new variables. In this way you can, for example, change the outcome of an IF or WHEN test. Another thing you can do at a trace prompt is to issue commands to external environments. You might, for instance, view or modify files used by the program to examine their current state. Of course, this could be tricky if the file is currently open in the REXX program, so it should be done only with some caution. If things look really hopeless, you can simply enter: EXIT and the program will immediately terminate. Lastly, you cxan modify the operation of tracing itself from the interactive trace prompt. For instance, if you use the instruction: TRACE O tracing will be turned off and the program will resume execution immediately. You might do this if you have been tracing all statements, but you enter a subroutine you don't care to trace. Tracing will then resume when the subroutine returns. You can also use: TRACE ? to turn off interactive tracing, but [to] continue to trace instructions passively. Any other form of the TRACE instruction may also be used interactively to change the type of tracing [that is] in effect. There is another form of the TRACE instruction which is particularly useful during interactive tracing. If you specify a positive number as the TRACE operand, then tracing will proceed for that number of traced clauses without pausing in a case where REXX ordinarily would pause. That is, exactly the same statements are traced, but REXX does not pause until the specified number have been traced. You can also turn off tracing completely for a given number of statements by specifying the number as a negative quantity. The tricky part is estimating the correct number of statements to specify. Normally you would do this in a loop, because you want to let it run to a certain point and you know fairly well how many statements to go. Unfortunately, REXX has no more advanced debugging capabilities, such as executing until a particular variable is changed or reaches a certain value, or until an expression has a certain value. Another thing that REXX tracing cannot do is to execute until a particular routine is called. It can, however, trace labels, that is, trace whenever a label is encountered. This is done by using L as the TRACE setting: TRACE L So, if yuou have a loop which contains only one subroutine call, and you want to skip over the first 50 calls, you could use: TRACE ?L TRACE 50 to stop at the 51st subroutine call. When you enable interactive tracing with a TRACE instruction like: TRACE ?C (and especially in a case such as this, where not all subsequent instructions will be traced), then further TRACE instructions in the program will be ignored. This is to avoid prematurely terminating interactive tracing mode. However, if you want to be sure that trace directives are always effective, there is a TRACE() built-in function. Its syntax is: TRACE([setting]) where setting is the trace setting to use. The function returns the current trace setting and changes to the new one (if any). Numeric values for setting are not allowed in this case. When you enter one or more statements at the interactive trace prompt, there are certain subtle differences in the way [that] they are executed: * TRACE instructions in the input (but not in other code which might be called from an instruction in the input) are honoured. Moreover, they cause REXX to resume program execution until the next statement (if any) traced accordin to the new setting is executed. So if you want to alter the trace setting and then reexecute the current clause, you must use the TRACE() built-in function. * No tracing of clauses is performed except for the display of return codes from commands (if appropriate). * Commands to external environments do not cause the RC variable to be set. * Enabled condition handlers are ignored, even in code called from input statements. If a SYNTAX or HALT condition is raised during execution, a message is displayed, execution stops, and REXX returns to the interactive trace prompt. @endnode @node AppA "A: REXX Instructions" Several conventions are used in the following instruction syntax summaries. REXX keywords are in uppercase. These must be spelled as shown, though any mixture of lower- and uppercase may actually be used. Elements in lowercase represent user-supplied information. Anything enclosed in brackets ([]) is optional. Alternative forms of the instruction are listed on separate lines. Ellipses (...) indicate that the preceding element may be repeated. Semicolons may be included at the end of any clause. They are included below only when required in a context that is not the end of the line. @{" ADDRESS " link AppA1} @{" ARG " link AppA2} @{" CALL " link AppA3} @{" DO " link AppA4} @{" DROP " link AppA5} @{" EXIT " link AppA6} @{" IF " link AppA7} @{" INTERPRET " link AppA8} @{" ITERATE " link AppA9} @{" LEAVE " link AppA10} @{" NOP " link AppA11} @{" NUMERIC " link AppA12} @{" PARSE " link AppA13} @{" PROCEDURE " link AppA14} @{" PULL " link AppA15} @{" PUSH " link AppA16} @{" QUEUE " link AppA17} @{" RETURN " link AppA18} @{" SAY " link AppA19} @{" SELECT " link AppA20} @{" SIGNAL " link AppA21} @{" TRACE " link AppA22} @endnode @node AppA1 "ADDRESS" ADDRESS [environment [command]] ADDRESS VALUE [environment] @{i}Summary:@{ui} Changes the current default external command environment or issues a command to a specified external environment. @{i}Arguments:@{ui} environment: name of a command environment. command: command to issue to a command environment. @{i}Notes:@{ui} When ADDRESS is used by itself, it makes the previous command environment the current environment. If VALUE is not used, the specified environment is taken literally as a name without evaluation. If no command is specified, the environment named becomes the current command environment. If a command is included, it is issued to the specified environment. @endnode @node AppA2 "ARG" ARG [template] @{i}Summary:@{ui} Converts program or procedure arguments to uppercase and parses them according to a supplied parse template. @{i}Arguments:@{ui} template: a parse template. @{i}Notes:@{ui} The template may contain one or more subtemplates, separated by commas. Each subtemplate is used to parse the corresponding argument. If there are more subtemplates than arguments, variables named in the subtemplate are set to a null string. ARG is equivalent to PARSE UPPER ARG. @endnode @node AppA3 "CALL" CALL name [expression] [,expression] ... CALL ON condition [NAME] handler CALL OFF condition @{i}Summary:@{ui} Either calls a subroutine with specified expression values as arguments or enables or disables a handler for a specified condition. @{i}Arguments:@{ui} name: the name of a subroutine, which may be a label in the program, the name of a built-in function, or the name of an external function. expression: argument to the subroutine. condition: one of the following condition names: ERROR, FAILURE, HALT, or NOTREADY. handler: the name of a handler for the specified condition. @{i}Notes:@{ui} The first form of CALL is a normal subroutine call. The second form enables a handler for a particular condition. If no handler name is specified, it is the same as the condition name. The third form disables any existing handler for the specified condition. @endnode @node AppA4 "DO" DO [repetitor] [conditional]; [statement-list] END [symbol] @{i}Summary:@{ui} Delimits a group of statements which may be treated as a single statement and optionally controls repetitive execution. @{i}Arguments:@{ui} repetitor: either an expression, the keyword FOREVER, or a phrase of the form assignment [TO expt] [BY expb] [FOR expf], where expt, expb, and expf are expressions. conditional: either WHILE expression or UNTIL expression. statement-list: zero or more statements separated (if on the same line) by semicolons. symbol: the symbol which is the target of an assignment when the assignment form of repetitor is used. @{i}Notes:@{ui} TO, BY, and FOR may be used in any order in an assignment repetitor. The expressions following TO, BY, or FOR may not contain the keywords WHILE or UNTIL. @endnode @node AppA5 "DROP" DROP name [name] ... @{i}Summary:@{ui} Resets simple and compound variables to an uninitialized state. @{i}Arguments:@{ui} name: a symbol that names a variable or a stem, or a symbol enclosed in parentheses. @{i}Notes:@{ui} When a stem is dropped, all variables having that stem become uninitialized. When a name is enclosed in parentheses, it is assumed to be a string consisting of names of other variables. All variables named in the list (but not the list variable itself) become uninitialized. @endnode @node AppA6 "EXIT" EXIT [expression] @{i}Summary:@{ui} Terminates execution of a REXX program and passes a return value to the caller. @{i}Arguments:@{ui} expression: value to be returned to caller. @{i}Notes:@{ui} Only the current REXX program is terminated. A calling REXX program (if any) will resume execution at the point [where] the current program was invoked. @endnode @node AppA7 "IF" IF expression THEN statement1; [ELSE statement2] @{i}Summary:@{ui} Conditionally executes a statement based on the value of an expression. @{i}Arguments:@{ui} expression: a REXX expression that evaluates to 0 or 1. statement1: statement that is executed if the expression value is 1. statement2: statement that is executed if the expression value is 0. @{i}Notes:@{ui} THEN is a reserved word and may not be used in the expression. Either statement may be a DO group, consisting of a list of statements contained between DO and END. @endnode @node AppA8 "INTERPRET" INTERPRET expression @{i}Summary:@{ui} Executes one or more REXX statements that are generated as the value of an expression. @{i}Arguments:@{ui} expression: an arbitrary REXX expression. @{i}Notes:@{ui} The value of the expression should be a list of REXX statements separated by semicolons. DO, IF, and SELECT statements (if any) must be complete. The statements are executed as if they were part of the program at that point. @endnode @node AppA9 "ITERATE" ITERATE [symbol] @{i}Summary:@{ui} Causes control to pass to the top of an iterative DO group. @{i}Arguments:@{ui} symbol: the name of the control variable of an active DO group. @{i}Notes:@{ui} The control variable (if any) will be incremented appropriately and the terminating conditions will be tested as if the END statement closing the DO group had been encountered. A symbol may be specified to identify the DO group. @endnode @node AppA10 "LEAVE" LEAVE [symbol] @{i}Summary:@{ui} Causes control to pass to the statement following the END of an iterative DO group. @{i}Arguments:@{ui} symbol: the name of the control variable of an active DO group. @{i}Notes:@{ui} A symbol may be specified to identify the DO group. @endnode @node AppA11 "NOP" NOP @{i}Summary:@{ui} Instruction that does nothing. @{i}Notes:@{ui} NOP can be used as the instruction required after THEN in an IF or SELECT instruction. @endnode @node AppA12 "NUMERIC" NUMERIC DIGITS [expression] NUMERIC FORM [form] NUMERIC FUZZ [expression] @{i}Summary:@{ui} Defines certain parameters of REXX numeric representation and arithmetic. @{i}Arguments:@{ui} expression: [a] REXX expression that evaluates to a positive integer. form: either a literal SCIENTIFIC or ENGINEERING, or an expression that evaluates to SCIENTIFIC or ENGINEERING. @{i}Notes:@{ui} NUMERIC DIGITS is (roughly) the number of significant digits retained in a numeric value. NUMERIC FUZZ is the number of least significant digits [which are] ignored when doing numeric comparisons. NUMERIC FORM specifies whether the exponent of a number in exponential form should be a multiple of three. @endnode @node AppA13 "PARSE" PARSE [UPPER] source [template] @{i}Summary:@{ui} Parses an input string into REXX variables according to rules specified in a template. @{i}Arguments:@{ui} source: defines the source of the input string, which can be: ARG program or subroutine arguments. LINEIN line read from [the] standard input stream. PULL line read from [the] external data queue. SOURCE information about the program. VALUE the value of an expression. VAR the value of a variable. VERSION information about the REXX language processor. template: a parse template. @{i}Notes:@{ui} When the source is VALUE it must be followed by an expression and then the reserved word WITH (which cannot occur in the expression). When the source is VAR it must be followed by the name of a variable. @endnode @node AppA14 "PROCEDURE" PROCEDURE [EXPOSE name [name] ...] @{i}Summary:@{ui} Creates a new generation of variables for a subroutine. @{i}Arguments:@{ui} name: a symbol that names a variable or a stem, or a symbol enclosed in parentheses. @{i}Notes:@{ui} When a stem is exposed, all variables having that stem are exposed. When a name is enclosed in parentheses, it is assumed to be a string consisting of names of other variables. All variables named in the list and the list variable are exposed. @endnode @node AppA15 "PULL" PULL [template] @{i}Summary:@{ui} Converts to uppercase and parses a line of input read from the external data queue or [from] the standard input stream. @{i}Arguments:@{ui} template: a parse template. @{i}Notes:@{ui} PULL is equivalent to PARSE UPPER PULL. @endnode @node AppA16 "PUSH" PUSH [expression] @{i}Summary:@{ui} Places a line of data in the external data queue. @{i}Arguments:@{ui} expression: data to be placed in the queue. @{i}Notes:@{ui} The data is placed in the queue LIFO (last in first out). A null string is placed in the queue if the expression is omitted. @endnode @node AppA17 "QUEUE" QUEUE [expression] @{i}Summary:@{ui} Places a line of data in the external data queue. @{i}Arguments:@{ui} expression: data to be placed in the queue. @{i}Notes:@{ui} The data is placed in the queue FIFO (first in first out). A null string is placed in the queue if the expression is omitted. @endnode @node AppA18 "RETURN" RETURN [expression] @{i}Summary:@{ui} Terminates execution of a subroutine and passes a return value to the caller. @{i}Arguments:@{ui} expression: [the] value to be returned to [the] caller. @{i}Notes:@{ui} RETURN does not terminate a REXX program unless it occurs in the topmost procedure of the program. @endnode @node AppA19 "SAY" SAY [expression] @{i}Summary:@{ui} Writes data to the standard output stream (usually the terminal). @{i}Arguments:@{ui} expression: the data to be written. @{i}Notes:@{ui} A null string is written if the expression is omitted. SAY is generally equivalent to a call to LINEOUT() with the first argument omitted. @endnode @node AppA20 "SELECT" SELECT; when-list [OTHERWISE [statement-list]] END @{i}Summary:@{ui} Executes a statement depending on a set of conditional expressions. @{i}Arguments:@{ui} when-list: a list of clauses of the form WHEN expression THEN statement. statement-list: one or more REXX statements, separated by semicolons (if on the same line). @{i}Notes:@{ui} Each expression following a WHEN is evaluated in sequence. The expression must evaluate to 0 or 1. The statement following THEN is executed for the first expression that has the value 1. THEN is a reserved word which cannot be used in any of the expressions. If none of the expressions has the value 1, the statements following OTHERWISE (if present) are executed. @endnode @node AppA21 "SIGNAL" SIGNAL name SIGNAL VALUE expression SIGNAL ON condition [NAME handler] SIGNAL OFF condition @{i}Summary:@{ui} Either transfers control to a specified label in the program or enables or disables a handler for a specified condition. @{i}Arguments:@{ui} name: a label in the program. expression: a REXX expression whose value is a label in the program. condition: one of the following condition names: ERROR, FAILURE, HALT, NOTREADY, NOVALUE, or SYNTAX. handler: the name of a handler for the specified condition. @{i}Notes:@{ui} The first two forms of SIGNAL are used to transfer control to the specified label. All active DO loops are terminated, but control remains within the currently active procedure. The third form enables a handler for a given condition. If no handler name is specified, it is the same as the condition name. The fourth form disables any existing handler for the specified condition. @endnode @node AppA22 "TRACE" TRACE [VALUE] expression @{i}Summary:@{ui} Controls REXX program tracing. @{i}Arguments:@{ui} expression: selects [the] type of tracing as follows: A - trace all clauses. C - trace commands to external environments. E - trace external commands that end with an error. F - trace external commands that end with a failure. I - trace all clauses and intermediate results of expressions. L - trace all labels. N - same as F (the default). O - disable tracing. R - trace all clauses and final results of expressions. @{i}Notes:@{ui} If the expression is not a symbol or literal but [it] does begin with a symbol or literal, it must be preceded by the keyword VALUE. The value of the expression may be prefixed with ? to indicate that interactive tracing is to be toggled on or off. @endnode @node AppB "B: REXX BUILT-IN FUNCTIONS" The same notational conventions apply as in @{"Appendix A" link AppA}. @{" ABBREV() " link AppB1} @{" ABS() " link AppB2} @{" ADDRESS() " link AppB3} @{" ARG() " link AppB4} @{" BITAND() " link AppB5} @{" BITOR() " link AppB6} @{" BITXOR() " link AppB7} @{" B2X() " link AppB8} @{" CENTER() " link AppB9} @{" CHARIN() " link AppB10} @{" CHAROUT() " link AppB11} @{" CHARS() " link AppB12} @{" COMPARE() " link AppB13} @{" CONDITION() " link AppB14} @{" COPIES() " link AppB15} @{" C2D() " link AppB16} @{" C2X() " link AppB17} @{" DATATYPE() " link AppB18} @{" DATE() " link AppB19} @{" DELSTR() " link AppB20} @{" DELWORD() " link AppB21} @{" DELSTR() " link AppB22} @{" DIGITS() " link AppB23} @{" D2C() " link AppB24} @{" D2X() " link AppB25} @{" ERRORTEXT() " link AppB26} @{" FORM() " link AppB27} @{" FORMAT() " link AppB28} @{" FUZZ() " link AppB29} @{" INSERT() " link AppB30} @{" LASTPOS() " link AppB31} @{" LEFT() " link AppB32} @{" LENGTH() " link AppB33} @{" LINEIN() " link AppB34} @{" LINEOUT() " link AppB35} @{" LINES() " link AppB36} @{" MAX() " link AppB37} @{" MIN() " link AppB38} @{" OVERLAY() " link AppB39} @{" POS() " link AppB40} @{" QUEUED() " link AppB41} @{" REVERSE() " link AppB42} @{" RIGHT() " link AppB43} @{" SIGN() " link AppB44} @{" SOURCELINE() " link AppB45} @{" SPACE() " link AppB46} @{" STREAM() " link AppB47} @{" STRIP() " link AppB48} @{" SUBSTR() " link AppB49} @{" SUBWORD() " link AppB50} @{" SYMBOL() " link AppB51} @{" TIME() " link AppB52} @{" TRACE() " link AppB53} @{" TRANSLATE() " link AppB54} @{" TRUNC() " link AppB55} @{" VALUE() " link AppB56} @{" VERIFY() " link AppB57} @{" WORD() " link AppB58} @{" WORDINDEX() " link AppB59} @{" WORDLENGTH() " link AppB60} @{" WORDPOS() " link AppB61} @{" WORDS() " link AppB62} @{" XRANGE() " link AppB63} @{" X2B() " link AppB64} @{" X2C() " link AppB65} @{" X2D() " link AppB66} @endnode @node AppB1 "ABBREV()" ABBREV(string1, string2, [length]) @{i}Summary:@{ui} Indicates whether one string is a beginning segment of another. @{i}Arguments:@{ui} string1: the long form being checked for abbreviation. string2: the strings which is a potential abbreviation. length: the minimum length of string2 that will qualify as an abbreviation. @{i}Notes:@{ui} The function returns 1 if string2 is a substring of string1, starting at the first position and if it is at least length characters long, otherwise it returns 0. The default for length is the length of string2. @endnode @node AppB2 "ABS()" ABS(number) @{i}Summary:@{ui} Returns the absolute value of its argument. @{i}Arguments:@{ui} number: a valid REXX number. @{i}Notes:@{ui} The result is formatted according to the current setting of NUMERIC DIGITS. @endnode @node AppB3 "ADDRESS()" ADDRESS() @{i}Summary:@{ui} Returns the name of the current default environment. @{i}Notes:@{ui} The default environment name is set with the ADDRESS instruction. @endnode @node AppB4 "ARG()" ARG([argument-number], [option]) @{i}Summary:@{ui} Returns either the number of arguments, the value of a specific argument, or whether a specific argument has been included or omitted. @{i}Arguments:@{ui} argument-number: the number of the argument in question. option: one of the following: 'E' - test whether argument exists. 'O' - test whether argument was omitted. @{i}Notes:@{ui} If no argument is specified, ARG() returns the number of arguments passed to the current internal or external procedure. If only the argument number is specified, ARG() returns the value of the designated argument. If option is also specified, ARG() returns 0 or 1 to indicate whether the argument was present. @endnode @node AppB5 "BITAND()" BITAND(string1, [string2], [pad]) @{i}Summary:@{ui} Returns the logical AND of its arguments. @{i}Arguments:@{ui} string1: first operand of AND. string2: second operand of AND. Default is null string. pad: character appended before the operation to the shorter of the two operands to make them equal in length if the operands are of different lengths. @{i}Notes:@{ui} BITAND() produces the logical bitwise AND of its operands. If no pad character is specified, the default is 'ff'x. @endnode @node AppB6 "BITOR()" BITOR(string1, [string2], [pad]) @{i}Summary:@{ui} Returns the logical OR of its arguments. @{i}Arguments:@{ui} string1: first operand of OR. string2: second operand of OR. Default is null string. pad: character appended before the operation to the shorter of the two operands to make them equal in length if the operands are of different lengths. @{i}Notes:@{ui} BITOR() produces the logical bitwise OR of its operands. If no pad character is specified, the default is '00'x. @endnode @node AppB7 "BITXOR()" BITXOR(string1, [string2], [pad]) @{i}Summary:@{ui} Returns the logical XOR of its arguments. @{i}Arguments:@{ui} string1: first operand of XOR. string2: second operand of XOR. Default is null string. pad: character appended before the operation to the shorter of the two operands to make them equal in length if the operands are of different lengths. @{i}Notes:@{ui} BITXOR() produces the logical bitwise XOR of its operands. If no pad character is specified, the default is '00'x. @endnode @node AppB8 "B2X()" B2X(binary-string) @{i}Summary:@{ui} Returns the hexadecimal representation of the given binary string. @{i}Arguments:@{ui} binary-string: character string consisting of 0s and 1s to be converted. @{i}Notes:@{ui} Both the argument and the result of B2X() are character strings. B2X() converts the input data from base 2 representation to base 16. @endnode @node AppB9 "CENTER()" CENTER(string, length, [pad]) @{i}Summary:@{ui} Centres a string in a field of a specified width. @{i}Arguments:@{ui} string: the string to be centred. length: width of the field. pad: character to be added if the field width exceeds the length of the string. @{i}Notes:@{ui} If the string is longer than the width of the field, CENTER() returns the central length characters of the string. If an odd number of characters is to be either added or removed, one more character is added to or removed from the right end than the left. @endnode @node AppB10 "CHARIN()" CHARIN([stream], [position], [count]) @{i}Summary:@{ui} Returns characters read from the specified input stream. @{i}Arguments:@{ui} stream: name of the input stream. Default is the standard input stream. position: location within the input stream at which to begin reading. count: number of characters to read. @{i}Notes:@{ui} The default position is the current read position, which is either the first character of the stream or the character following the last one read by CHARIN() or LINEIN(). The default count is 1. With a count of 0, CHARIN() returns a null string but moves the current read position as specified by the second argument. @endnode @node AppB11 "CHAROUT()" CHAROUT([stream], [string], [position]) @{i}Summary:@{ui} Writes a string of characters to the specified output stream and returns the number of characters (if any) which could not be written. @{i}Arguments:@{ui} stream: name of the output stream. Default is the standard output stream. string: data to be written. position: location within the output stream at which to begin writing. @{i}Notes:@{ui} The default position is the current write position, which is the position following either the last character of the stream or the last one written by CHARIN() or LINEIN(). If the string is omitted, the current write position is updated to the value specified by position. If both string and position are omitted, the output stream is closed. @endnode @node AppB12 "CHARS()" CHARS([stream]) @{i}Summary:@{ui} Returns the number of characters remaining to be read in the specified input stream. @{i}Arguments:@{ui} stream: name of the input stream. Default is the standard input stream. @{i}Notes:@{ui} The number of characters remaining to be read is defined to be the number of characters from the current read position to the end of the file. @endnode @node AppB13 "COMPARE()" COMPARE(string1, string2, [pad]) @{i}Summary:@{ui} Returns the position of the first mismatch between the two input strings. @{i}Arguments:@{ui} string1: first string to be compared. string2: second string to be compared. pad: character appended before the operation to the shorter of the two operands to make them equal in length if the operands are of different lengths. @{i}Notes:@{ui} If the two strings are identical (after padding) COMPARE returns 0. @endnode @node AppB14 "CONDITION()" CONDITION([option]) @{i}Summary:@{ui} Returns information associated with the current trapped condition. @{i}Arguments:@{ui} option: selects type of information, which can be one of the following: 'C' - name of the trapped condition. 'D' - further descriptive information about the trapped condition. 'I' - the instruction that invoked the condition handler. 'S' - state of handling for the condition. @{i}Notes:@{ui} CONDITION() returns a null string if no condition has been raised. @endnode @node AppB15 "COPIES()" COPIES(string, count) @{i}Summary:@{ui} Returns a concatenation of the specified number of copies of the input string. @{i}Arguments:@{ui} string: the string to be copied. count: number of copies. @{i}Notes:@{ui} count must be a nonnegative whole number. COPIES() returns a null string if count is 0. @endnode @node AppB16 "C2D()" C2D(data, [length]) @{i}Summary:@{ui} Returns the value of the input data interpreted as a decimal number. @{i}Arguments:@{ui} data: the data to be converted. length: rightmost number of bytes of input data to be converted. @{i}Notes:@{ui} The data is assumed to be a binary representation of a number with the most significant byte at the left and the least significant byte at the right of the string. If a length is not specified, the number is assumed to be unsigned. If length is specified, the number is assumed to be signed, and only the rightmost length bytes are converted, padded on the left with 0 if necessary. The result must be a valid whole number according to the current setting of NUMERIC DIGITS. @endnode @node AppB17 "C2X()" C2X(data) @{i}Summary:@{ui} Returns the hexadecimal representation of the input data. @{i}Arguments:@{ui} data: the data to be converted. @{i}Notes:@{ui} C2X() returns a string consisting of hexadecimal digits (0-9, A-F) which give the internal representation of the input data. @endnode @node AppB18 "DATATYPE()" DATATYPE(string, [type]) @{i}Summary:@{ui} Returns the type of the input string, or an indication of the class to which the data belongs. @{i}Arguments:@{ui} string: the string whose type is needed. type: a letter which selects a string type to test for, one of the following: 'A' - alphanumeric (a-z, A-Z, or 0-9). 'B' - binary (0 or 1). 'L' - lowercase (a-z). 'M' - mixed case (a-z or A-Z). 'N' - number (a valid REXX number) 'S' - symbol (only characters valid in REXX symbols). 'U' - uppercase (A-Z). 'W' - whole number (valid whole number with current NUMERIC DIGITS). 'X' - hexadecimal (a-f, A-F, 0-9). @{i}Notes:@{ui} If type is not specified, DATATYPE returns NUM or CHAR depending on whether or not the string is a valid REXX number. If the type is specified, it returns 1 or 0 depending on whether or not the string belongs to the designated class. @endnode @node AppB19 "DATE()" DATE([option]) @{i}Summary:@{ui} Returns the current date. @{i}Arguments:@{ui} option: a character that indicates the required date format, one of the following: 'B' - base date, number of complete days since January 1, 0001. 'D' - number of the current day of the year, starting with 1. 'E' - date in European format (dd/mm/yy). 'M' - full English name of the current month. 'N' - date in default format (dd Mmm yyyy). 'O' - date in orderable format (yy/mm/dd). 'S' - date in standard format (yyyymmdd). 'U' - date in US format (mm/dd/yy). 'W' - full English name of the current day. @{i}Notes:@{ui} The default format, which is provided if option is not specified, consists of two digits for the day, the first three characters of the English name of the month, and the year [(dd Mmm yyyy)]. @endnode @node AppB20 "DELSTR()" DELSTR(string, start, [length]) @{i}Summary:@{ui} Returns the input string from which a substring starting at a specified position is deleted. @{i}Arguments:@{ui} string: the input string. start: the starting position of the substring to be deleted. length: the length of the substring to be deleted. @{i}Notes:@{ui} If length is not specified, all characters from the start position are deleted. If the start position is beyond the right end of the string, the string is returned unchanged. @endnode @node AppB21 "DELWORD()" DELWORD(string, start, [length]) @{i}Summary:@{ui} Returns the input string from which a substring starting at a specified word is deleted. @{i}Arguments:@{ui} string: the input string. start: the number of the word which starts the substring to be deleted. length: the length in words of the substring to be deleted. @{i}Notes:@{ui} If length is not specified, the reaminder of the string beginning with the specified word is deleted. If the number of the first word to be deleted is greater than the number of words in the string, the string is returned unchanged. @endnode @node AppB22 "DIGITS()" DIGITS() @{i}Summary:@{ui} Returns the current value of the NUMERIC DIGITS setting. @endnode @node AppB23 "D2C()" D2C(number, [length]) @{i}Summary:@{ui} Returns the internal representation of a decimal number. @{i}Arguments:@{ui} number: whole number to be converted. length: length of the result. @{i}Notes:@{ui} D2C() returns the internal representation of a number as a string of characters with the most significant byte at the left. If length is not specified, the number to be converted must be nonnegative. If length is specified and it is less than required for the entire internal representation, the leftmost (most significant) bytes are truncated. If length is longer than required, the result is sign-extended on the left. @endnode @node AppB24 "D2X()" D2X(number, [length]) @{i}Summary:@{ui} Returns the internal representation of a decimal number in hexadecimal form. @{i}Arguments:@{ui} number: whole number to be converted. length: length of the result in characters. @{i}Notes:@{ui} D2X() returns the internal representation of a number as a string of hexadecimal digits with the most significant digit at the left. If length is not specified, the number to be converted must be nonnegative. If length is specified and it is less than required for the entire internal representation, the leftmost (most significant) digits are truncated. If length is longer than required, the result is sign-extended on the left. @endnode @node AppB25 "ERRORTEXT()" ERRORTEXT(number) @{i}Summary:@{ui} Returns the text of the message associated with the specified error. @{i}Arguments:@{ui} number: the number of an error in the range 0-99. @{i}Notes:@{ui} If no error message is associated with the specified number, a null string is returned. @endnode @node AppB26 "FORM()" FORM() @{i}Summary:@{ui} Returns the current value of NUMERIC FORM. @{i}Notes:@{ui} Possible values of NUMERIC FORM are SCIENTIFIC or ENGINEERING. @endnode @node AppB27 "FORMAT()" FORMAT(number, [m], [n], [exp1], [exp2]) @{i}Summary:@{ui} Formats a number with a given number of digits before and after the decimal point, and with a given number of digits in the exponent. @{i}Arguments:@{ui} number: the number to be formatted. m: number of digits before the decimal point in the result. n: number of digits after the decimal point in the result. exp1: number of digits in the exponent of the result. exp2: number of digits required to trigger exponential notation. @{i}Notes:@{ui} The number is first rounded as in the result of the expression number + 0. The default values of m and n are the number of digits required for the integral and fractional parts, respectively. exp2 is the trigger for exponential notation, whose default is NUMERIC DIGITS. That is, exponential notation will be used if the number has more than exp2 digits before the decimal point or more than 2 * exp2 digits after the decimal point. @endnode @node AppB28 "FUZZ()" FUZZ() @{i}Summary:@{ui} Returns the current value of NUMERIC FUZZ. @{i}Notes:@{ui} NUMERIC FUZZ is the number of least significant digits that will be ignored in numeric comparisons of equality or inequality. @endnode @node AppB29 "INSERT()" INSERT(string1, string2, [pos], [length], [pad]) @{i}Summary:@{ui} Inserts one string at a certain position in a second string. @{i}Arguments:@{ui} string1: the string to be inserted. string2: the string inserted into. pos: the position in the second string at which the first is inserted. length: length to which inserted string is extended or truncated. pad: character appended to the first string when its length is to be extended. @{i}Notes:@{ui} The first string is inserted in the second string after the character identified by pos. The default position is 0, in which case the first string is inserted before the first character of the second [string]. The default value of length is the length of the first string. @endnode @node AppB30 "LASTPOS()" LASTPOS(target, string, [start]) @{i}Summary:@{ui} Returns the last position of one string in another, searching from right to left. @{i}Arguments:@{ui} target: the string being searched for. string: the string which is searched. start: the starting position in the second string at which the search begins. @{i}Notes:@{ui} LASTPOS() returns 0 if the target string is null or if it is not found. The default start position is LENGTH(string). @endnode @node AppB31 "LEFT()" LEFT(string, length, [pad]) @{i}Summary:@{ui} Returns the leftmost part of the input string. @{i}Arguments:@{ui} string: the string to be truncated or extended. length: desired length of the result. pad: character added to the right of the input string when its length is to be extended. @{i}Notes:@{ui} The input string is left-justified in a field of specified length or truncated if necessary. @endnode @node AppB32 "LENGTH()" LENGTH(string) @{i}Summary:@{ui} Returns the length in characters of the input string. @{i}Arguments:@{ui} string: the input string. @endnode @node AppB33 "LINEIN()" LINEIN([stream], [position], [count]) @{i}Summary:@{ui} Returns a line read from the specified input stream. @{i}Arguments:@{ui} stream: name of the input stream. Default is the standard input stream. position: location within the input stream at which to begin reading, specified as a relative line number. count: number of whole lines to read (0 or 1 only). @{i}Notes:@{ui} The default position is the current read position, which is either the first character of the stream or the character following the last one read by CHARIN() or LINEIN(). The default count is 1. With a count of 0, LINEIN() returns a null string but moves the current read position as specified by the second argument. @endnode @node AppB34 "LINEOUT()" LINEOUT([stream], [string], [position]) @{i}Summary:@{ui} Writes a string of characters to the specified output stream and returns the number of lines (0 or 1) which could not be written. @{i}Arguments:@{ui} stream: name of the output stream. Default is the standard output stream. string: data to be written. position: location within the output stream at which to begin writing, specified as a relative line number. @{i}Notes:@{ui} The default position is the current write position, which is the position following either the last character of the stream or the last one written by CHAROUT() or LINEOUT(). If the string is omitted, the current write position is updated to the value specified by position. If both string and position are omitted, the output stream is closed. @endnode @node AppB35 "LINES()" LINES([stream]) @{i}Summary:@{ui} Returns the number of lines remaining to be read in the specified input stream. @{i}Arguments:@{ui} stream: name of the input stream. Default is the standard input stream. @{i}Notes:@{ui} The number of lines remaining to be read is defined to be the number of whole or partial lines from the current read position to the end of the file. @endnode @node AppB36 "MAX()" MAX(number, [number], ...) @{i}Summary:@{ui} Returns the largest of a list of numbers. @{i}Arguments:@{ui} number: a valid REXX number. @{i}Notes:@{ui} The result is rounded according to the current setting of NUMERIC DIGITS. That is, like the value of the expression number + 0. @endnode @node AppB37 "MIN()" MIN(number, [number], ...) @{i}Summary:@{ui} Returns the smallest of a list of numbers. @{i}Arguments:@{ui} number: a valid REXX number. @{i}Notes:@{ui} The result is rounded according to the current setting of NUMERIC DIGITS. That is, like the value of the expression number + 0. @endnode @node AppB38 "OVERLAY()" OVERLAY(string1, string2, [pos], [length], [pad]) @{i}Summary:@{ui} Returns the result of replacing the characters of one string by the characters of another, starting at a certain position. @{i}Arguments:@{ui} string1: the string of replacement characters. string2: the string being overlaid. pos: the position in the second string at which the first overlays it. length: length to which overlaying string is extended or truncated. pad: character added to the first string when its length is to be extended. @{i}Notes:@{ui} The default position is 1, in which case the first string overlays the second starting at the beginning. The default value of length is the length of the first string. @endnode @node AppB39 "POS()" POS(target, string, [start]) @{i}Summary:@{ui} Returns the position of one string in another, searching from left to right. @{i}Arguments:@{ui} target: the target being searched for. string: the string which is searched. start: the starting position in the second string at which the search begins. @{i}Notes:@{ui} POS() returns 0 if the target string is null or if it is not found. The default start position is 1. @endnode @node AppB40 "QUEUED()" QUEUED() @{i}Summary:@{ui} Returns the number of lines contained in the external data queue. @endnode @node AppB41 "RANDOM()" RANDOM(max) RANDOM([min], [max], [seed]) @{i}Summary:@{ui} Returns a quasi-random whole number. @{i}Arguments:@{ui} max: maximum value that can be returned. min: minimum (nonnegative) value that can be returned. seed: a whole number which is used to generate the first of a repeatable sequence of quasi-random numbers. @{i}Notes:@{ui} If only one argument is specified, it is assumed to be the maximum value, in which case the minimum will be 0. OItherwise the defaults for min and max are 0 and 999 [respectively]. @endnode @node AppB42 "REVERSE()" REVERSE(string) @{i}Summary:@{ui} Returns the input string with the characters reversed end-for-end. @{i}Arguments:@{ui} string: the string to be reversed. @endnode @node AppB43 "RIGHT()" RIGHT(string, length, [pad]) @{i}Summary:@{ui} Returns the rightmost part of the input string. @{i}Arguments:@{ui} string: the string to be truncated or extended. length: desired length of the result. pad: character added to the left of the input string when its length is to be extended. @{i}Notes:@{ui} The input string is right-justified in a field of specified length or truncated if necessary. @endnode @node AppB44 "SIGN()" SIGN(number) @{i}Summary:@{ui} Returns the arithmetic sign of the input number. @{i}Arguments:@{ui} number: a valid REXX number. @{i}Notes:@{ui} SIGN() returns -1 for a negative number, 1 for a positive number, and 0 for 0. @endnode @node AppB45 "SOURCELINE()" SOURCELINE([number]) @{i}Summary:@{ui} Returns [the] number of lines in the program or the specified line of the program's source code. @{i}Arguments:@{ui} number: the number of the line to return. @{i}Notes:@{ui} SOURCELINE() returns the number of lines in the program if number is omitted. @endnode @node AppB46 "SPACE()" SPACE(string, [count], [pad]) @{i}Summary:@{ui} Returns the input string reformatted with a specified number of pad characters between each blank-delimited word. @{i}Arguments:@{ui} string: the input string. count: number of pad characters inserted between each blank-delimited word. pad: the character to be inserted between words of the input string. @{i}Notes:@{ui} The default count is 1 and the default pad character is a blank. If count is 0, all blanks are removed. SPACE() always removes leading and trailing blanks. @endnode @node AppB47 "STREAM()" STREAM(stream, [option], [command] @{i}Summary:@{ui} Performs an implementation-dependent operation on a specified I/O stream. @{i}Arguments:@{ui} stream: name of the I/O stream. option: determines type of information to be returned or operation to be performed. It may be one of the following: 'C' - perform a command specified by that argument. 'D' - return extended information about the state of the stream. 'S' - return indication about the state of the stream. command: command to be performed if option is 'C'. @{i}Notes:@{ui} STREAM() returns ERROR, NOTREADY, READY, or UNKNOWN if option is 'S'. All other behaviour of STREAM() is unstandardized and dependent on the implementation. @endnode @node AppB48 "STRIP()" STRIP(string, [option], [character]) @{i}Summary:@{ui} Returns the input string with specified leading or trailing characters removed. @{i}Arguments:@{ui} string: the input string. option specifies whether leading or trailing characters are to be removed. It can be one of the following: 'B' - both leading and trailing characters (the default). 'L' - only leading characters. 'T' - only trailing characters. character: the character to be stripped from the input string. Default is a blank. @endnode @node AppB49 "SUBSTR()" SUBSTR(string, start, [length], [pad]) @{i}Summary:@{ui} Returns a substring of the input string. @{i}Arguments:@{ui} string: the input string. start: the beginning position in the input string of the desired substring. length: length of the desired substring. pad: character added to the left of the substring when its length is to be extended. Default is a blank. @{i}Notes:@{ui} The substring may extend beyond the right end of the input string, in which case it is extended with pad characters. The start position may be greater than LENGTH(string), in which case the result will consist entirely of pad characters. @endnode @node AppB50 "SUBWORD()" SUBWORD(string, start, [length]) @{i}Summary:@{ui} Returns a substring of the input string. @{i}Arguments:@{ui} string: the input string. start: the beginning position in the input string of the desired substring, expressed in terms of blank-delimited words. length: length in words of the desired substring. @{i}Notes:@{ui} Leading and trailing blanks will be removed from the result. If the length is greater than the number of words remaining in the string, only the remainder is returned. The default for length is the number of words left in the string. @endnode @node AppB51 "SYMBOL()" SYMBOL(name) @{i}Summary:@{ui} Returns an indication of whether a given string is a valid symbol, and if so, whether it has an assigned value. @{i}Arguments:@{ui} name: a string that represents a possible symbol name. @{i}Notes:@{ui} SYMBOL() returns BAD if name is not a valid symbol name (for instance, the string contains characters not allowed in a symbol). SYMBOL() returns VAR if the string is the name of a symbol which has been assigned a value. Otherwise it returns LIT. @endnode @node AppB52 "TIME()" TIME([option]) @{i}Summary:@{ui} Returns the current time. @{i}Arguments:@{ui} option: a character that indicates the required time format, one of the following: 'C' - civil format: hh:mm, followed by am or pm. 'E' - elapsed time in seconds since timer was reset ('R' option). 'H' - complete hours since midnight. 'L' - hh:mm:ss.uuuuuu format (fractional seconds). 'M' - complete minutes since midnight. 'N' - normal time format (hh:mm:ss), 24-hour clock (default). 'R' - reset time and return time elapsed since last reset. 'S' - complete seconds since midnight. @{i}Notes:@{ui} Time values are never affected by NUMERIC DIGITS setting. The 'E' and 'R' can be used for computing elapsd time without concern for crossing midnight. @endnode @node AppB53 "TRACE()" TRACE([type]) @{i}Summary:@{ui} Returns current trace settings and optionally changes them. @{i}Arguments:@{ui} type: new trace setting in the same form as used in the TRACE instruction. @{i}Notes:@{ui} When the TRACE() function is used to change trace settings it works generally like the TRACE instruction, except that counts cannot be specified, and the settings will be changed even during interactive tracing. @endnode @node AppB54 "TRANSLATE()" TRANSLATE(string, [output], [input], [pad]) @{i}Summary:@{ui} Replaces specified characters in an input string. @{i}Arguments:@{ui} string: the string to be modified. output: the table of output characters. input: the table of input characters. pad: pad character used to extend the output table when it is shorter than the input table. @{i}Notes:@{ui} Every occurrence in the input string of a charcter from the input table is replaced by the corresponding character from the output table. The pad character, which defaults to a blank, is used as a replacement when there is no corresponding character in the output table because it is shorter than the input tabe. Characters not present in the input table are not changed. The default input table is XRANGE('00'x, 'ff'x). If neither input nor output table is specified, TRANSLATE() converts all lowercase characters to uppercase. @endnode @node AppB55 "TRUNC()" TRUNC(number, [digits]) @{i}Summary:@{ui} Formats a number with a given number of digits after the decimal point. @{i}Arguments:@{ui} number: the number to be formatted. digits: number of digits after the decimal point in the result. @{i}Notes:@{ui} The number is first rounded as in the result of the expression number + 0. The default for digits is 0, in which case the result will be an integer without a decimal point. @endnode @node AppB56 "VALUE()" VALUE(name, [value], [type]) @{i}Summary:@{ui} Returns the value of a specified variable and optionally changes it. @{i}Arguments:@{ui} name: a string that is the name of a variable. value: new value for the specified variable. type: system-dependent type or class of variable to be accessed. @{i}Notes:@{ui} The VALUE() function can be used instead of an INTERPRET statement to fetch or set a variable whose name isn't known until run-time. If a new value is not specified, the variable is not changed. By default (when type is not specified) VALUE() references REXX variables of the current generation. In this case, the specified name is uppercased and subject to substitution if it is a compound name. The type of other variables which may be accessed with VALUE() are dependent on the specific implementation. @endnode @node AppB57 "VERIFY()" VERIFY(string, search, [option], [start]) @{i}Summary:@{ui} Indicates whether or not characters from a given set occur in a specified string. @{i}Arguments:@{ui} string: the string to be searched. search: a string composed of the characters to be searched for. option: either 'N' (nomatch) to find the location of the first character of [the] string that is not in the search string, or 'M' (match) to find the first character of the string that is in the search string. Default is 1. @{i}Notes:@{ui} VERIFY() returns 0 if string is entirely composed of characters in search (option 'N'), if string is entirely composed of characters not in search (option 'M'), if the string to be searched is null, or if the start position is beyond the end of the string. @endnode @node AppB58 "WORD()" WORD(string, number) @{i}Summary:@{ui} Returns a specific blank-delimited word from a string. @{i}Arguments:@{ui} string: the input string. number: number of the word to select from the string. @{i}Notes:@{ui} WORD() returns a null string if there are fewer than number words in the string. @endnode @node AppB59 "WORDINDEX()" WORDINDEX(string, number) @{i}Summary:@{ui} Returns the character position of a specific blank-delimited word in a string. @{i}Arguments:@{ui} string: the input string. number: number of the word whose index is required. @{i}Notes:@{ui} WORDINDEX() returns 0 if there are fewer than number words in the string. @endnode @node AppB60 "WORDLENGTH()" WORDLENGTH(string, number) @{i}Summary:@{ui} Returns the length of a specific blank-delimited word in a string. @{i}Arguments:@{ui} string: the input string. number: number of the word whose length is required. @{i}Notes:@{ui} WORDLENGTH() returns 0 if there are fewer than number words in the string. @endnode @node AppB61 "WORDPOS()" WORDPOS(PHRASE, string, [start]) @{i}Summary:@{ui} Returns the word position of one string of words in another, searching from left to right. @{i}Arguments:@{ui} phrase: the string of words being searched for. string: the string which is searched. start: the starting word position in the string at which the search begins. Default is 1. @{i}Notes:@{ui} WORDPOS() returns 0 if the phrase being searched for is a null string or is not found. Excess blanks between words in the target and search strings are ignored. @endnode @node AppB62 "WORDS()" WORDS(string) @{i}Summary:@{ui} Returns the number of blank-delimited words in a string. @{i}Arguments:@{ui} string: the string whose length [in words] is required. @endnode @node AppB63 "XRANGE()" XRANGE([first], [last]) @{i}Summary:@{ui} Returns a string of all characters whose encodings lie in a given range. @{i}Arguments:@{ui} first: the first character in the range. Default is '00'x. last: the last character in the range. Default is 'ff'x. @{i}Notes:@{ui} The results consists of characters in ascending order if first is less than last. If first is greater than last, the result consists of characters in ascending order, but wrapping at 'ff'x. The result depends on the specific character collating sequence [eg. ASCII] used by the implementation. @endnode @node AppB64 "X2B()" X2B(hex-string) @{i}Summary:@{ui} Returns the binary-string representation of a given hexadecimal string. @{i}Arguments:@{ui} hex-string: the hex string whose binary-string representation is required. @{i}Notes:@{ui} Both the argument and the result of X2B() are character strings. X2B() converts the input data from base 16 representation to base 2. @endnode @node AppB65 "X2C()" X2C(hex-string) @{i}Summary:@{ui} Returns the binary (internal) representation of a given hexadecimal string. @{i}Arguments:@{ui} hex-string: the hex string whose binary representation is required. @{i}Notes:@{ui} The result of X2C() is the internal representation of the given hex string. It will normally contain nonprintable characters. @endnode @node AppB66 "X2D()" X2D(hex-string, [length]) @{i}Summary:@{ui} Returns the decimal representation of a given hexadecimal string. @{i}Arguments:@{ui} hex-string: the hex string whose decimal representation is required. length: rightmost number of hex digits to be converted. @{i}Notes:@{ui} The input string must consist of valid hex digits. The digits may be separated by spaces as long as each blank-delimited group except the first contains an even number of hex digits. The spaces are ignored in selecting the rightmost digits when length is specified. If length is specified, the result is a signed number. Otherwise the result will be unsigned. If length exceed LENGTH(hex-string), the string is first padded on the left with 0s. @endnode @node AppC "C: FURTHER READING" Cowlishaw, Michael F., @{i}The REXX Language: A Practical Approach to Programming@{ui}, 2nd ed. (1988). This is the standard definition of REXX, by its inventor. Daney, Charles, "REXX in Charge", @{i}Byte@{ui}, Vol. 15, No. 8 (August 1990). Introduction to REXX as a general purpose macro language for personal computer applications. Goldberg, Gabriel; Smith, Philip H. ]I[, @{i}The REXX Handbook@{ui} (1992). Contains 45 separate chapters with general information, usage notes, descriptions of different implementations, details of REXX-related products, and an extensive bibliography. O'Hara, Robert P.; Gomberg, David R., @{i}Modern Programming Using REXX@{ui}, 2nd ed. (1988). A very good introduction to programming concepts and techniques which uses REXX. Quercus Systems, @{i}Personal REXX User's Guide, Version 3.0@{ui} (1991). Quercus Systems, PO Box 2157, Saratoga CA 95070, [USA]. Contains a long tutorial chapter with many examples of REXX programs. @endnode @node AppD "D: OPERATORS AND SPECIAL CHARACTERS" + addition operator in positional parsing patterns unary plus - in positional parsing patterns subtraction operator unary minus * multiplication operator / division operator // remainder of integer division ** exponentiation operator \\ negation operator % integer division operator | logical or operator || concatenation operator & logical and operator && logical exclusive or operator = assignment comparison, equality in positional parsing patterns == comparison, strict equality > comparison, first operand greater than >> comparison, first operand strictly greater than < comparison, first operand less than << comparison, first operand strictly less than >= comparison, first operand greater than or equal to <<= comparison, first operand strictly greater than or equal to <= comparison, first operand less than or equal to <<= comparison, first operand strictly less than or equal to \\= comparison, first operand not equal \\== comparison, first operand strictly not equal \\> comparison, first operand not greater \\>> comparison, first operand strictly not greater \\< comparison, first operand not less \\<< comparison, first operand strictly not less \\>= comparison, first operand not greater than or equal \\<= comparison, first operand not less than or equal \\>== comparison, first operand strictly not greater than or equal \\<== comparison, first operand strictly not less than or equal /* beginning of comment */ end of comment ! valid symbol character ? valid symbol character . in compound names in numbers in parsing template valid symbol character _ valid symbol character : label identifier , continuation character in argument list in parsing templates ; explicitly ends a clause ( begins expression or argument list in DROP instruction in EXPOSE instruction in parsing patterns ) ends expression or argument list in DROP instruction in EXPOSE instruction in parsing patterns ' string delimiter " string delimiter @endnode