The prerequisite for CSCI 465 is CSCI 360, "Computer Programming in Assembly Language". This document is intended as a reminder of some of the more important aspects of assembly language. It is not a substitute for CSCI 360. There is more to say about nearly every topic than is mentioned here. This presentation is oriented to CSCI 465 and does not contain all the details and comments that might be needed in other courses such as CSCI 464 or 468.
Elementary background trivia
We have 16 general-purpose registers, 0 through 15. Some of these are usually reserved for specific purposes or are affected by specific machine instructions or by macros. We also have special registers such as the condition code, the location counter and the instruction length. There are other sets of registers as well, such as floating-point registers and access registers; do not worry about them.
A word, also known as a full word, is 4 bytes or 32 bits long. This is also the size of a register value, and it is also large enough to hold an address (24 bits or 31 bits). We normally use 24-bit addresses, corresponding to the BC mode of the PSW.
A good rule of thumb about assembly language is that the most common thing to put in a register is an address.
The general format of an instruction is:
a label (optional, 1 to 8 characters)
one or more blanks
a mnemonic for an operation code
one or more blanks
zero or more operands separated by commas
one or more blanks
a comment (optional in theory, but we require line documentation)
Assembly language is not especially column-sensitive. The mnemonic usually starts in column 10 or later, and the operands usually start in column 16 or later. If it is necessary to continue coding from one line to the next, there are requirements: Put a non-blank character in column 72 and continue in (very specifically) column 16 of the next line.
We use columns 1-71. Column 72 is used only to indicate continuation from one line to the next. We do not use columns 73-80 at all.
An entire line can be made into a comment by putting an asterisk ('*') in column 1.
Do not leave a line blank. The assembler sees this as the end of one assembly and the beginning of the next, and it will complain.
The character set used for labels and variable names consists of: letters 'A' through 'Z' and 'a' through 'z', digits '0' through '9', national characters '@', '$' and '#', and the underscore character '_'. Character literals and comments may use any characters at all. Lower-case letters 'a' through 'z' are considered as identical to the matching upper-case letters 'A' through 'Z'. Assembly language is not case-sensitive.
Mainframe assembly language uses the EBCDIC character set, not ASCII. Thus lower-case letters have lower values than upper-case letters, which have lower values than digits ('a' < 'A' < '1').
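The ordering above can be checked on any machine with a short Python sketch. It assumes that Python's cp037 codec (EBCDIC, US/Canada) is a fair stand-in for the code page in use on the mainframe; the helper name ebcdic is made up for this illustration.

```python
# Sketch: compare characters by their EBCDIC code points.
# Assumption: code page 037 stands in for the mainframe's EBCDIC code page.

def ebcdic(ch: str) -> int:
    """Return the EBCDIC (cp037) byte value of a single character."""
    return ch.encode("cp037")[0]

# Show the hex value of one lower-case letter, one upper-case letter, one digit.
for ch in "aA1":
    print(ch, format(ebcdic(ch), "02X"))

# EBCDIC order: lower case < upper case < digits
print(ebcdic("a") < ebcdic("A") < ebcdic("1"))

# ASCII order is the reverse: digits < upper case < lower case
print(ord("1") < ord("A") < ord("a"))
```

This matters whenever character data is sorted or compared with CLC: the result follows the EBCDIC values, not the ASCII ones.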
Program structure
A simple structure for an assembly language program might be:
name     CSECT
         entry linkage
         (unload the parameter list if necessary)
         (code)
         exit linkage
         LTORG
         (storage)
         END   name
The structure might be much more complex. We could have more than one CSECT, we could be using in-line macros, and we could have DSECTs. The storage could be located elsewhere, or in several places, but we must be careful to branch around it to avoid trying to execute something not executable.
Very common machine instructions
You should know how to use these without having to look them up. You should know a good many other instructions almost as well.
In general, an address here is something that evaluates to an address, such as a label or a D(B) address or sometimes a D(X,B) address. It is useful to remember that with the exception of store instructions and CVD, most machine instructions move data from right to left. It is also useful to remember that some instructions are fussy about boundaries: L, ST, etc.
Load L register,address
Load a full word value into a register from an address (on a full word
boundary).
Load Address LA register,address
Load an address into a register.
Load Multiple LM registerfirst,registerlast,address
Load full word values into several consecutive registers from an
address (on a full word boundary).
Store ST register,address
Store the contents of a register at an address (on a full word boundary).
Store Multiple STM registerfirst,registerlast,address
Store the contents of several consecutive registers at an address (on a
full word boundary).
Move Character MVC address1(length),address2
Copy N = length bytes from the second address to the first address. The
length is at most 256.
Move Immediate MVI address,immediate byte
Copy the hard-coded immediate byte to an address. (Do not confuse this
with MVC with a length of 1.)
Compare Logical Character CLC address1(length),address2
Compare N = length bytes at the two addresses, one byte at a time, left
to right. The length is at most 256.
Compare Logical Immediate CLI address,immediate byte
Compare the byte at the address to the hard-coded immediate byte. (Do not
confuse this with CLC with a length of 1.)
Compare C register,address
Compare a register value to the full word value at an address (on a full
word boundary).
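Branch on Condition BC mask,address
Test the condition code against the 4-bit mask: each mask bit stands for
one condition code value (8 for 0, 4 for 1, 2 for 2, 1 for 3). If the bit
for the current condition code is on, branch to the address. This is
frequently used with one of the extended mnemonics instead of a mask: BE,
BH, BL, BZ, etc.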
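As a rough model of how BC decides whether to branch, here is a small Python sketch. The function name branch_taken and the MASKS table are invented for this illustration; the mask values are the ones normally behind the extended mnemonics when they follow a compare.

```python
# Sketch of the BC test. Each of the four mask bits stands for one
# condition code value: 8 -> CC 0, 4 -> CC 1, 2 -> CC 2, 1 -> CC 3.

def branch_taken(mask: int, cc: int) -> bool:
    """True if BC with this 4-bit mask branches when the condition code is cc."""
    return bool(mask & (8 >> cc))

# Masks behind some extended mnemonics (as used after a compare).
MASKS = {"B": 15, "BE": 8, "BL": 4, "BH": 2, "BNE": 7, "BNL": 11, "BNH": 13}

# After a compare: CC 0 = equal, CC 1 = first operand low, CC 2 = first operand high.
EQUAL, LOW, HIGH = 0, 1, 2

print(branch_taken(MASKS["BE"], EQUAL))   # BE branches on equal
print(branch_taken(MASKS["BE"], LOW))     # ... but not on low
print(branch_taken(MASKS["BNE"], HIGH))   # BNE branches on anything but equal
print(branch_taken(MASKS["B"], LOW))      # B (mask 15) always branches
```

A mask of 0 never branches, which is why BC 0 acts as a no-operation.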
Common families of instructions
It can be difficult to remember the various machine instructions and choose the one you need. Many of the instructions fall into families:
packed arithmetic: AP, SP, MP, ZAP, CP, etc.
register arithmetic: AR, SR, MR, XR, etc.
involving half words: LH, STH, AH, SH, etc.
comparisons: C, CR, CP, CLI, CLC, etc.
load instructions: L, LR, LA, LM, etc.
move instructions: MVC, MVI, MVCL, etc.
store instructions: ST, STM, STCM, etc.
There are other families involving bit manipulation, branching, immediate bytes, and so on. (Obviously, some of them overlap.)
More obscure machine instructions
Some machine instructions are unusual enough that almost anyone might need to read about them now and then. Different programmers will disagree about what is really obscure. Some possibilities might be EX, TRT, TR, MVN, MVO, MVCL, CLCL, BXLE, BXH, and anything having to do with floating-point numbers or vector processing. Some instructions are "privileged" and cannot be used in ordinary application programming.
Some of these do sometimes show up in this course and other courses. In the past, CSCI 465 assignments have sometimes required the use of EX and TRT. (The CSCI 468 course sometimes makes use of MVCL.)
There are also many machine instructions which are intended for the use of only the systems programmers. (For instance, some are used for programming I/O channels.)
Comment on coding lengths in instructions
Some machine instructions such as MVC and all packed instructions need to have lengths for the fields involved.
If the length is not coded and the field in question is specified by name, the assembler can usually determine the length for itself from the field's definition in storage. (This does not always work exactly as desired.) If this cannot be done--for instance, if we attempt to specify the field by giving its address in a register--the length used is seldom what we want.
For this reason, many people recommend that the lengths should always be coded, not left up to the assembler.
Common assembler instructions
Assembler instructions are different from machine instructions in that they do not directly generate machine code. Instead, assembler instructions tell the assembler how to do its job and how much data to include in the listing.
CSECT This is the beginning of a control section.
DSECT This is the beginning of a dummy control section, which
contains a description of a portion of storage (DS
statements) rather than executable code. It must be
attached to a register with a USING statement before it
can be used.
START This is like CSECT, but causes the addresses to begin at a
specified value.
LTORG This is optional; it marks the beginning of the literal
pool.
USING This is used to establish a base register for a CSECT or
to attach a DSECT to a register (in fact the same
process).
DROP This undoes the attachment done by a USING.
END This marks the end of the source program. Its use is
absolutely mandatory.
ORG This is used to alter the value of the location counter
(temporarily). (This makes it possible to redefine items
in storage, etc.)
TITLE This causes the specified title to be printed at the
top of each page in the listing.
PRINT GEN/NOGEN This controls whether the expanded form of macros will
(PRINT GEN) be included in the listing or not (PRINT
NOGEN). This can be switched on and off as needed.
PRINT has other options as well.
EJECT This causes a page break in the listing.
EQU This is used to assign a value to a symbol; a
search-and-replace operation is done before assembly begins.
ENTRY This identifies symbols (usually labels) in a module
so they can be referred to in another module. This
makes them entry points. CSECT names are
automatically entry points.
EXTRN This identifies symbols which are referred to in a
module but which are defined in another module (where
they may be listed in ENTRY statements).
Fairly unusual assembler instructions
It is unlikely in CSCI 465 that you will need OPSYN, ISEQ, CNOP, REPRO, COM, etc. Some of them are interesting, and you may want to read about them.
How numbers are stored in assembly language
Numbers are stored in several different formats, and there is no way to avoid having to know about them.
Zoned Decimal format
A number is stored as a sequence of bytes which are interpreted as EBCDIC values.
Example: The value 321 would be stored as F3 F2 F1, where F3 is the EBCDIC value for '3', etc.
The first hex digit of each byte is the zone digit and the second is the decimal digit itself. The zone digits are usually all 'F' except for the zone digit in the rightmost byte, which is interpreted as the sign: values 'B' and 'D' indicate a negative number and other values indicate a positive number.
Among other things, this means that the zoned-decimal representation of a number is not unique.
Example: Both F1 F2 and F1 E2 represent the value 12. If we compare them a byte at a time using CLC, they are different.
Example: F3 F2 C1 is interpreted as positive, as the rightmost zone digit is 'C'. F3 F2 D1 is interpreted as negative. If we compare these using CLC, we would find F3 F2 D1 is larger than F3 F2 C1, which is not correct.
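The two examples above can be reproduced with a short Python sketch. The helper zoned_to_int is a made-up name, and it assumes the field is well formed (every low-order hex digit is 0 through 9). Python compares bytes objects one byte at a time, left to right, as unsigned values, which is how CLC behaves.

```python
# Sketch: interpret a zoned decimal field and compare the result with a
# raw byte comparison (the kind CLC performs).

def zoned_to_int(field: bytes) -> int:
    """Decode a zoned decimal field. Assumes the digit halves are all 0-9."""
    digits = "".join(str(byte & 0x0F) for byte in field)
    sign_zone = field[-1] >> 4          # zone digit of the rightmost byte
    value = int(digits)
    return -value if sign_zone in (0xB, 0xD) else value

# Two different byte strings, one value.
a = bytes.fromhex("F1F2")
b = bytes.fromhex("F1E2")
print(zoned_to_int(a), zoned_to_int(b), a == b)   # 12 12 False

# A byte comparison gets the sign wrong.
pos = bytes.fromhex("F3F2C1")
neg = bytes.fromhex("F3F2D1")
print(zoned_to_int(pos), zoned_to_int(neg))       # 321 -321
print(neg > pos)                                  # True: bytewise, D1 > C1
```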
Packed Decimal format
A number is stored as a sequence of decimal digits followed by a sign digit. The sign digit follows the same rules as for the rightmost zone digit in a zoned decimal number. Again, the representation is not unique.
Example: The value +69173 would be stored as 3 bytes: as 69 17 3F or as 69 17 3C (or two other ways). The value -2045893 would be stored as 4 bytes: as 20 45 89 3D or as 20 45 89 3B.
Because we have a whole number of bytes, we always have an odd number of digits in a packed decimal number.
Again, because of the sign digits, we must be careful how we compare packed decimal numbers. If we use CLC to compare 69 17 3C and 69 17 3D, which respectively represent +69173 and -69173, we will get a wrong answer. Packed decimal numbers should be compared only with CP.
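A minimal Python sketch of the difference between a bytewise comparison and a numeric one follows. The names packed_to_int and cp are invented here; cp only imitates the condition code that CP would set, and the decoder assumes the field already holds a valid packed decimal number.

```python
# Sketch: decode packed decimal fields and compare them numerically,
# in contrast with a raw byte comparison (the kind CLC performs).

def packed_to_int(field: bytes) -> int:
    """Decode a packed decimal field. Assumes valid digits and sign."""
    nibbles = []
    for byte in field:
        nibbles += [byte >> 4, byte & 0x0F]
    sign = nibbles.pop()                 # last hex digit is the sign
    value = int("".join(str(n) for n in nibbles))
    return -value if sign in (0xB, 0xD) else value

def cp(first: bytes, second: bytes) -> int:
    """Imitate CP: 0 = equal, 1 = first operand low, 2 = first operand high."""
    a, b = packed_to_int(first), packed_to_int(second)
    return 0 if a == b else (1 if a < b else 2)

plus = bytes.fromhex("69173C")
minus = bytes.fromhex("69173D")

print(packed_to_int(plus), packed_to_int(minus))  # 69173 -69173
print(plus < minus)       # True: bytewise, +69173 looks "low"
print(cp(plus, minus))    # 2: numerically, the first operand is high

# Two encodings of +69173 differ as bytes but compare equal numerically.
print(cp(bytes.fromhex("69173F"), plus))          # 0
```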
Binary format
A number is stored as a signed 16-bit number (in a half word) or 32-bit number (in a full word) in base two (binary). When we write out the value, we often convert it to base 16 (hex).
Example: The value +240 would be stored as 2 bytes: 00 F0. If we stored it as 4 bytes, we would have 00 00 00 F0. The value -2 would be stored as 2 bytes: FF FE. If we stored it as 4 bytes, we would have FF FF FF FE.
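The byte patterns in this example are ordinary two's-complement values, so they can be reproduced with Python's built-in integer conversions. The helper names to_binary and from_binary are made up for this sketch.

```python
# Sketch: signed binary values as they would sit in a half word or full word
# (big-endian, two's complement).

def to_binary(value: int, nbytes: int) -> bytes:
    """Return the value as a signed big-endian field of nbytes bytes."""
    return value.to_bytes(nbytes, "big", signed=True)

def from_binary(field: bytes) -> int:
    """Interpret a big-endian field as a signed binary number."""
    return int.from_bytes(field, "big", signed=True)

for value in (240, -2):
    for nbytes in (2, 4):
        print(value, to_binary(value, nbytes).hex(" ").upper())

print(from_binary(bytes.fromhex("FFFE")))   # -2
```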
Instructions for Converting between formats
To convert from binary format to packed decimal format, use CVD. This converts a binary value stored in a register to packed decimal format and stores it in an 8-byte packed decimal value (on a double word boundary).
To convert from packed decimal format to binary format, use CVB. This converts an 8-byte packed decimal value (on a double word boundary) to binary format and stores it in a register.
To convert from packed decimal format to zoned decimal format, there are two choices: (a) use UNPK and then (if necessary) use OI to fix the sign digit in the last byte; or (b) use ED or EDMK. UNPK needs two lengths, one for the zoned decimal field and one for the packed decimal field. ED and EDMK each need one length.
To convert from zoned decimal format to packed decimal format, use PACK. This will need two lengths, one for the zoned decimal field and one for the packed decimal field.
There is no way to convert between the binary and zoned decimal formats in one step.
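To make the two-step route concrete, here is a rough Python model of the data movement done by CVD, UNPK, OI and PACK. The function names are invented for this sketch and it models only the byte shuffling, not condition codes or exceptions. It again assumes Python's cp037 codec as a stand-in for EBCDIC.

```python
# Sketch: binary -> packed -> zoned (CVD, UNPK, OI) and zoned -> packed (PACK).

def nibbles_to_bytes(nibs):
    """Join an even-length list of hex digits into bytes."""
    return bytes((nibs[i] << 4) | nibs[i + 1] for i in range(0, len(nibs), 2))

def cvd(value: int) -> bytes:
    """Binary to 8-byte packed decimal: 15 digits plus sign C or D."""
    digits = [int(d) for d in str(abs(value)).rjust(15, "0")]
    return nibbles_to_bytes(digits + [0xC if value >= 0 else 0xD])

def unpk(packed: bytes, length: int) -> bytes:
    """Packed to zoned of the given length; the sign lands in the last zone."""
    nibs = [n for byte in packed for n in (byte >> 4, byte & 0x0F)]
    sign = nibs.pop()
    digits = ([0] * length + nibs)[-length:]     # pad or truncate on the left
    zoned = [0xF0 | d for d in digits]
    zoned[-1] = (sign << 4) | digits[-1]
    return bytes(zoned)

def oi_f0(zoned: bytes) -> bytes:
    """Like OI with X'F0' on the last byte: force a printable zone."""
    return zoned[:-1] + bytes([zoned[-1] | 0xF0])

def pack(zoned: bytes, length: int) -> bytes:
    """Zoned to packed of the given length."""
    nibs = [byte & 0x0F for byte in zoned] + [zoned[-1] >> 4]
    return nibbles_to_bytes(([0] * (2 * length) + nibs)[-2 * length:])

packed = cvd(321)
print(packed.hex(" ").upper())          # 00 00 00 00 00 00 32 1C
zoned = unpk(packed, 3)
print(zoned.decode("cp037"))            # 32A  (last byte is C1, an 'A')
print(oi_f0(zoned).decode("cp037"))     # 321
print(pack(oi_f0(zoned), 2).hex(" ").upper())   # 32 1F
```

The "32A" line shows why the OI step is usually needed before printing a number produced by UNPK.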
A note about arithmetic
There are families of instructions for doing arithmetic with numbers in binary and packed decimal formats. There is no way to do arithmetic with zoned decimal numbers without first converting them to packed decimal format.
Floating Point formats
There are also two formats for storing numbers in what is essentially scientific notation, using a sign, a mantissa and an exponent. You can ignore these for the purposes of CSCI 465.
Standard linkage
An assembly language program can call subprograms. A subprogram can be either internal or external.
Internal Subprograms
An internal subprogram is contained in the same CSECT as the code that calls it. It begins at a label and ends with a branch back to the caller.
To call an internal subprogram, we branch to the label, like this:
         BAL   14,labelHere
BAL causes the address of the instruction immediately following the BAL to be put into register 14. At the end of the subprogram, we branch back to the address in register 14:
         BR    14
It is also possible and common to pass parameters to an internal subprogram, as in calling an external subprogram, and to have a return code in register 15 when the subprogram ends, as well.
As an internal subprogram is in the same CSECT, it has access to the same storage as the rest of the program (providing it is addressable).
External Subprograms
An external subprogram is a separate CSECT. It may be part of the same assembly, or it may be assembled separately. Calling an external subprogram is a bit more complicated:
         LA    1,PARMLIST
         L     15,=V(SUB)
         BALR  14,15
Here (by a well-established convention) register 1 points to the parameter list (a list of the addresses of the parameters being passed to SUB). We use a V-con to load the address of the subprogram's entry point into register 15; this must be a V-con as the address will be known at run time but is not yet known at assembly time. BALR causes the address of the instruction following the BALR to be put in register 14.
For an external subprogram, we need the usual entry linkage:
SUB      CSECT
         STM   14,12,12(13)
         LR    12,15
         USING SUB,12
         LA    14,SUBSAVE
         ST    14,8(0,13)
         ST    13,4(0,14)
         LR    13,14
and exit linkage:
         L     13,4(0,13)
         LM    14,12,12(13)
         LA    15,0
         BR    14
Both of these should look familiar: we need to save registers, establish a base register, cross-link save areas, etc., and later, restore the registers. Here SUBSAVE is an 18-full word save area for SUB. At the end of the exit linkage, we branch to the address in register 14 to return to the calling program.
There may be minor variations in the linkage code; for instance, we might use a base register other than 12, or we might want to provide a return code in register 15. It is possible to call an entry point in another program other than the CSECT name itself.
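The displacements used in the linkage code (4, 8 and 12) follow from the layout of the 18-word save area. The following Python sketch works out where each register lands when STM 14,12,12(13) is executed; the function and constant names are made up for the illustration.

```python
# Sketch: where STM 14,12,12(13) puts each register in an 18-word save area.

SAVE_AREA_WORDS = 18
BACK_POINTER = 4      # ST 13,4(0,14): address of the caller's save area
FORWARD_POINTER = 8   # ST 14,8(0,13): address of the called program's save area

def stm_order(first: int, last: int) -> list:
    """Registers stored by STM first,last, wrapping from 15 around to 0."""
    regs = [first]
    while regs[-1] != last:
        regs.append((regs[-1] + 1) % 16)
    return regs

def save_area_offsets() -> dict:
    """Offset in the save area of each register stored by STM 14,12,12(13)."""
    return {reg: 12 + 4 * i for i, reg in enumerate(stm_order(14, 12))}

offsets = save_area_offsets()
for reg, offset in offsets.items():
    print(f"R{reg:<2} at offset {offset}")

# The last register stored (12) ends exactly at the end of the save area.
print(offsets[12] + 4 == SAVE_AREA_WORDS * 4)
```

Register 13 is the only one not stored by the STM; it is saved separately in the back-pointer word of the new save area.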
Run-time errors
Here are a few of the ways in which an assembly language program can produce an ABEND. There are a number of others, including various kinds of overflow and underflow exceptions, specification exceptions, decimal-divide exceptions, execute exceptions, etc. These are all listed in the yellow card. Here 'S0Cn' is a system completion code.
S0C1 Operation exception
This is caused by an attempt to execute something not executable. This usually means we have somehow branched into our program's storage.
S0C4 Protection exception
This is caused by an attempt to write to (or, in rare cases, read from) locations in memory which belong to someone else.
S0C7 Data exception
This is caused by an attempt to do packed arithmetic or comparisons with fields not containing valid packed decimal numbers. The PACK and UNPK instructions will never cause a data exception, and there is no way to produce a data exception by doing binary arithmetic.
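When chasing a data exception, it helps to know exactly what counts as a valid packed decimal field: every hex digit except the last must be 0 through 9, and the last must be a sign (A through F). The following Python sketch applies that rule to a few byte strings of the kind often found in a dump; is_valid_packed is a made-up name.

```python
# Sketch: test whether a field would be accepted as packed decimal data.

def is_valid_packed(field: bytes) -> bool:
    """True if all hex digits but the last are 0-9 and the last is A-F."""
    nibs = [n for byte in field for n in (byte >> 4, byte & 0x0F)]
    if not nibs:
        return False
    return all(d <= 9 for d in nibs[:-1]) and nibs[-1] >= 0xA

samples = {
    "packed +69173": bytes.fromhex("69173C"),
    "EBCDIC blanks": bytes.fromhex("404040"),
    "zoned 123 (never packed)": bytes.fromhex("F1F2F3"),
    "binary zeros": bytes.fromhex("000000"),
}
for name, field in samples.items():
    print(f"{name}: {is_valid_packed(field)}")
```

Only the first sample passes. The others are typical of fields that were never initialized or never run through PACK.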
Debugging
If we have a run-time error in an assembly-language program, we need to debug it. We have various pieces of information: the completion code (S0Cn), the PSW, the values of the 16 registers, and (sometimes) a dump. All of this should be readily available in the printout. The format for the PSW is in the yellow card. We also have our own documentation for the program, including which registers are used for which purposes.
There are also macros (XDUMP, XSNAP, SNAP) which can be useful in debugging.
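As a starting point for reading the PSW, here is a hedged Python sketch that pulls out the fields most useful in debugging. It assumes the System/370 BC-mode layout (interruption code in bits 16-31, instruction length code in bits 32-33, condition code in bits 34-35, instruction address in bits 40-63); check the yellow card for the format your system actually prints. The PSW value used is hypothetical, and decode_bc_psw is a made-up name.

```python
# Sketch: extract the debugging fields from a BC-mode PSW.
# Assumption: System/370 BC-mode layout; verify against the yellow card.

def decode_bc_psw(word1: int, word2: int) -> dict:
    """Split the two PSW words into the fields used for debugging."""
    ilc = (word2 >> 30) & 0x3            # instruction length in half words
    return {
        "interruption_code": word1 & 0xFFFF,
        "ilc_bytes": ilc * 2,
        "condition_code": (word2 >> 28) & 0x3,
        "next_instruction": word2 & 0xFFFFFF,
    }

# Hypothetical PSW as it might appear in a listing: FFC50007 9000001A
psw = decode_bc_psw(0xFFC50007, 0x9000001A)
for name, value in psw.items():
    print(f"{name}: {value:X}")

# In most cases the PSW address points just past the instruction that
# failed, so subtracting the instruction length gives a first guess.
print(format(psw["next_instruction"] - psw["ilc_bytes"], "06X"))
```

Here the interruption code 0007 would correspond to a data exception (S0C7), and the first place to look would be the instruction at 000016.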
How to use macros
In CSCI 465, you are not normally asked to write macros of your own, but you do need to know how to use common library macros. In professional quality assembly language code, a great deal of use is made of existing macros. IBM provides many of these for our use, and (as at many other installations), NIU has a family of macros of its own as well.
It is important to bear in mind that standard IBM macros may and often do change the values of registers 0, 1, 14 and 15.
In some cases, macros may call external programs, and these may place requirements on the JCL for the assembler and other steps.
Where to go to find additional information
The first couple of these you should own and keep at hand. The IBM manuals can be found on line at the IBM BookManager site, and the CSCI 465 web site maintains a list of links to some of them. The N.I.U. bookstore often has some of the IBM manuals for sale, and you may want to consider buying some of them. (IBM manuals are surprisingly inexpensive.) It is also possible to download the manuals (legally and free) in any of several formats such as PDF or HTML.
You may also want to become familiar with the Utilities manual or the manual for the Binder.
Assembler Language With ASSIST and ASSIST/I, 4th edition, Overbeek and Singletary, Prentice Hall
This is the textbook for the CSCI 360 course. It contains material not usually covered in that course on topics such as QSAM and some NIU macros.
IBM System/370 Reference Summary, GX20-1850
This is the so-called "yellow card", also known as the "green card". It is a good place to find the formats of the various instructions, the EBCDIC character set, a list of carriage-control characters, a list of exceptions (S0C0, S0C1, S0C2, etc.), the layout of the PSW, and so on.
Principles of Operation, SA22-7201
Among other things, this provides IBM's explanation of how each machine instruction actually works.
High Level Assembler Language Reference, SC26-4940
This describes the format of an assembly language program, the formats of machine instructions, the various assembler instructions, and the macro language.
High Level Assembler Programmer's Guide, SC26-4941
This describes the assembler output, the assembler options (to be used on the EXEC line), the assembler's JCL, linkage and the assembler diagnostic messages.
Assembler Services Reference, GC28-1910
This is where you can find details about standard IBM macros such as TIME, ABEND, SNAP, GETMAIN, FREEMAIN and many more.
Assembler Services Guide, GC28-1762
This is where you can find details about how to use the standard IBM macros and services. It is a good place to learn about advanced topics.
Using Data Sets, SC26-4922
This discusses various access methods, record formats, and file formats, and it is a guide to the usual sequence of events in using files and access methods.
Macro Instructions for Data Sets, SC26-4913
This is where you can find details about macros used for access methods. For QSAM, these would be DCB, OPEN, CLOSE, GET, PUT, PUTX. (VSAM, another access method, uses a different assortment of macros.)