Brief Review of Assembly Language for CSCI 465

The prerequisite for CSCI 465 is CSCI 360, "Computer Programming in Assembly Language". This document is intended as a reminder of some of the more important aspects of assembly language. It is not a substitute for CSCI 360. There is more to say about nearly every topic than is mentioned here. This presentation is oriented to CSCI 465 and does not contain all the details and comments that might be needed in other courses such as CSCI 464 or 468.


Elementary background trivia

We have 16 general-purpose registers, 0 through 15. Some of these are usually reserved for specific purposes or are affected by specific machine instructions or by macros. We also have the PSW (program status word), which holds, among other things, the condition code, the address of the next instruction and the instruction length code. (The location counter, by contrast, is maintained by the assembler, not by the hardware.) There are other sets of registers as well, such as floating-point registers and access registers; do not worry about them.

A word, also known as a full word, is 4 bytes or 32 bits long. This is also the size of a register value, and it is also large enough to hold an address (24 bits or 31 bits). We normally use 24-bit addresses, corresponding to the BC mode of the PSW.

A good rule of thumb about assembly language is that the most common thing to put in a register is an address.

The general format of an instruction is:

         a label (optional, 1 to 8 characters)
         one or more blanks
         a mnemonic for an operation code
         one or more blanks
         zero or more operands separated by commas
         one or more blanks
         a comment (optional in theory, but we require line documentation)

Apart from a few rules, assembly language is not especially column-sensitive. A label, if present, must begin in column 1; if column 1 is blank, the statement has no label. The mnemonic usually starts in column 10 or later, and the operands usually start in column 16 or later. If it is necessary to continue a statement from one line to the next, there are two requirements: put a non-blank character in column 72, and continue in (very specifically) column 16 of the next line.

We use columns 1-71. Column 72 is used only to indicate continuation from one line to the next. We do not use columns 73-80 at all.

An entire line can be made into a comment by putting an asterisk ('*') in column 1.
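Here is a short sketch of these column rules. The label LOOP, the register numbers and the DCB names INDCB and OUTDCB are hypothetical. The OPEN macro is continued with an X in column 72, and its second line begins in column 16. For a macro instruction such as OPEN, the first line's operands may stop after a comma followed by a blank.

```hlasm
*  A full-line comment: asterisk in column 1
LOOP     LA    2,1(,2)            Label in column 1, LA in column 10
         BCT   3,LOOP             No label: column 1 is blank
         OPEN  (INDCB,(INPUT),                                         X
               OUTDCB,(OUTPUT))   Continued in column 16
```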

Do not leave a line blank. The assembler sees this as the end of one assembly and the beginning of the next, and it will complain.

The character set used for labels and variable names consists of: letters 'A' through 'Z' and 'a' through 'z', digits '0' through '9', national characters '@', '$' and '#', and the underscore character '_'. A name may not begin with a digit. Character literals and comments may use any characters at all. Lower-case letters 'a' through 'z' are considered identical to the matching upper-case letters 'A' through 'Z'; apart from character constants and comments, assembly language is not case-sensitive.

Mainframe assembly language uses the EBCDIC character set, not ASCII. Thus lower-case letters have lower values than upper-case letters, which have lower values than digits ('a' < 'A' < '1').
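The ordering can be seen directly in the hex values the assembler generates for character constants. The labels below are arbitrary. Since X'81' < X'C1' < X'F1', a CLC or CLI comparison puts 'a' before 'A' before '1'. The blank, X'40', is lower than all of them.

```hlasm
*  Hex value generated for each character constant
BLANK    DC    C' '               X'40'
LOWA     DC    C'a'               X'81'
UPPA     DC    C'A'               X'C1'
ONE      DC    C'1'               X'F1'
```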


Program structure

A simple structure for an assembly language program might be:

name     CSECT
         entry linkage
         (unload the parameter list if necessary)
         (code)
         exit linkage
         LTORG
         (storage)
         END   name

The structure might be much more complex. We could have more than one CSECT, we could be using in-line macros, and we could have DSECTs. The storage could be located elsewhere, or in several places, but we must be careful to branch around it to avoid trying to execute something not executable.
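To make the skeleton concrete, here is a minimal sketch of a complete program that does nothing except return to its caller with a return code of 0. The name MINPROG is hypothetical. The entry and exit linkage are the standard ones described under Standard linkage. The storage follows the exit linkage, so the program can never fall through into it.

```hlasm
MINPROG  CSECT
*  Entry linkage
         STM   14,12,12(13)       Save the caller's registers
         LR    12,15              R15 holds our entry point address
         USING MINPROG,12         R12 is the base register
         LA    14,SAVEAREA        R14 -> our save area
         ST    14,8(0,13)         Forward link in caller's save area
         ST    13,4(0,14)         Back link in our save area
         LR    13,14              R13 -> our save area
*  (code goes here)
*  Exit linkage
         L     13,4(0,13)         R13 -> caller's save area
         LM    14,12,12(13)       Restore the caller's registers
         LA    15,0               Return code 0
         BR    14                 Return to the caller
         LTORG                    Literal pool
SAVEAREA DS    18F                Our save area (18 full words)
         END   MINPROG
```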


Very common machine instructions

You should know how to use these without having to look them up. You should know a good many other instructions almost as well.

In general, an address here is something that evaluates to an address, such as a label or a D(B) address or sometimes a D(X,B) address. It is useful to remember that with the exception of store instructions and CVD, most machine instructions move data from right to left. It is also useful to remember that some instructions are fussy about boundaries: L, ST, etc.

Load                        L   register,address

     Load a full word value into a register from an address (on a full word 
     boundary).   

Load Address                LA  register,address

     Load an address into a register.

Load Multiple               LM  registerfirst,registerlast,address

     Load full word values into several consecutive registers from an 
     address (on a full word boundary).

Store                       ST  register,address

     Store the contents of a register at an address (on a full word boundary).

Store Multiple               STM registerfirst,registerlast,address

     Store the contents of several consecutive registers at an address (on a
     full word boundary).

Move Character               MVC address1(length),address2  

     Copy N = length bytes from the second address to the first address.  The
     length is at most 256.

Move Immediate               MVI address,immediate byte

     Copy the hard-coded immediate byte to an address.  (Do not confuse this 
     with MVC with a length of 1.)

Compare Logical Character    CLC address1(length),address2

     Compare N = length bytes at the two addresses, one byte at a time, left
     to right, as unsigned binary values.  The length is at most 256.

Compare Logical Immediate    CLI address,immediate byte 

     Compare the byte at the address to the hard-coded immediate byte.  (Do not 
     confuse this with CLC with a length of 1.)

Compare                      C   register,address

     Compare a register value to the full word value at an address (on a full
     word boundary), treating both as signed binary integers.

Branch on Condition          BC  mask,address

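     The mask has 4 bits, one for each possible condition code value (8 for
     CC 0, 4 for CC 1, 2 for CC 2, 1 for CC 3).  If the bit for the current
     condition code is on, branch to the address.  This is frequently used
     with one of the extended mnemonics instead of a mask:  BE, BH, BL, BZ,
     etc.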
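Here is a short hypothetical fragment that uses several of these instructions together. COUNT, LIMIT, FLAG, NAME and OUTNAME are made-up fields, defined at the end. BE DONE is the same as BC 8,DONE, and BNH is BC 13 (branch on condition code 0, 1 or 3).

```hlasm
         L     3,COUNT            R3 = COUNT (a full word)
         C     3,LIMIT            Compare R3 with LIMIT (signed)
         BNH   CHKNAME            Skip if COUNT <= LIMIT
         MVI   FLAG,C'Y'          Set the flag byte to 'Y'
CHKNAME  CLC   NAME(8),=CL8' '    Is NAME all blanks?
         BE    DONE               Yes: skip the copy
         MVC   OUTNAME(8),NAME    Copy 8 bytes from NAME to OUTNAME
DONE     CLI   FLAG,C'Y'          Was the flag set?
*        (program continues; storage is placed after exit linkage)
COUNT    DC    F'12'
LIMIT    DC    F'10'
FLAG     DC    C'N'
NAME     DC    CL8'SMITH'
OUTNAME  DC    CL8' '
```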


Common families of instructions

It can be difficult to remember the various machine instructions and choose the one you need. Many of the instructions fall into families:

packed arithmetic:      AP, SP, MP, ZAP, CP, etc.

register arithmetic:    AR, SR, MR, DR, etc.

involving half words:   LH, STH, AH, SH, etc.

comparisons:            C, CR, CP, CLI, CLC, etc.

load instructions:      L, LR, LA, LM, etc.

move instructions:      MVC, MVI, MVCL, etc.

store instructions:     ST, STM, STCM, etc.

There are other families involving bit manipulation, branching, immediate bytes, and so on. (Obviously, some of them overlap.)


More obscure machine instructions

Some machine instructions are unusual enough that almost anyone might need to read about them now and then. Different programmers will disagree about what is really obscure. Some possibilities might be EX, TRT, TR, MVN, MVO, MVCL, CLCL, BXLE, BXH, and anything having to do with floating-point numbers or vector processing.

Some of these do sometimes show up in this course and other courses. In the past, CSCI 465 assignments have sometimes required the use of EX and TRT. (The CSCI 468 course sometimes makes use of MVCL.)

There are also many "privileged" machine instructions, which are intended for use only by systems programmers and cannot be used in ordinary application programming. (For instance, some are used for programming I/O channels.)
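EX and TRT are easiest to understand from typical uses. In this sketch, TARGET, SOURCE, LINE and the branch target NOBLANK are hypothetical. EX ORs the low-order byte of its register into the second byte of the target instruction. For an MVC, that byte is the length field, which holds the length minus 1. TRT scans left to right until it finds a byte whose entry in the 256-byte table is nonzero. It then puts that byte's address in register 1 and the table entry in the low-order byte of register 2, so TRT changes both registers.

```hlasm
*  EX: copy a variable number of bytes.  On entry, register 2
*  holds the number of bytes to copy (1 to 256).
         BCTR  2,0                R2 = R2 - 1 (machine length)
         EX    2,MOVEIT           Execute MOVEIT, ORing the low
*                                 byte of R2 into its length field
*
*  TRT: find the first blank in the 80-byte field LINE.
         TRT   LINE(80),BLANKTAB  Scan LINE from left to right
         BZ    NOBLANK            CC 0: no blank anywhere in LINE
*        Otherwise register 1 holds the address of the first blank.
*
*  Storage (never reached by falling through):
MOVEIT   MVC   TARGET(0),SOURCE   Length 0; EX supplies the length
BLANKTAB DC    256X'00'           Function byte 0: keep scanning
         ORG   BLANKTAB+C' '      Back up to the entry for X'40'
         DC    X'40'              Nonzero: stop at a blank
*        Return the location counter to the end of the table:
         ORG
```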


Comment on coding lengths in instructions

Some machine instructions such as MVC and all packed instructions need to have lengths for the fields involved.

If the length is not coded and the field in question is specified by name, the assembler can usually determine the length for itself from the field's definition in storage. (This does not always work exactly as desired.) If this cannot be done--for instance, if we attempt to specify the field by giving its address in a register--the length used is seldom what we want.

For this reason, many people recommend that the lengths should always be coded, not left up to the assembler.
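In this illustration, NAME, OUTNAME, TOTAL and AMOUNT are hypothetical fields of the lengths shown. A storage operand with an explicit length is written D(L,B). A common trap is to write only one number in the parentheses. The assembler then reads that number as the length, not as a base register.

```hlasm
*  Assume register 3 holds the address of an 8-byte field.
         MVC   0(8,3),NAME        D(L,B): length 8, base register 3
*        MVC   0(3),NAME          Trap: 3 is taken as the LENGTH,
*                                 with no base register at all
         MVC   OUTNAME(8),NAME    Length coded, though the assembler
*                                 could take it from OUTNAME
         AP    TOTAL(5),AMOUNT(3) Packed: both lengths coded
```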


Common assembler instructions

Assembler instructions are different from machine instructions in that they are not translated into machine instructions. Instead, assembler instructions tell the assembler how to do its job and what to include in the listing.

CSECT                This is the beginning of a control section.

DSECT                This is the beginning of a dummy control section, which
                     contains a description of a portion of storage (DS
                     statements) rather than executable code.  It must be 
                     attached to a register with a USING statement before it 
                     can be used.                    

START                This is like CSECT, but causes the addresses to begin at a
                     specified value.

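LTORG                This is optional; it causes the literal pool to be
                     assembled at this point.  If it is omitted, the 
                     literal pool is placed at the end of the first control
                     section.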

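USING                This is used to establish a base register for a CSECT or
                     to attach a DSECT to a register (in fact the same 
                     process).  USING does not load the register; the 
                     program must put the right address into it (for 
                     example with LR or LA).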

DROP                 This undoes the attachment done by a USING. 
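Here is a sketch of DSECT, USING and DROP working together. The record layout and the names BUFFER, OUTNAME and TOTAL are hypothetical. The LA is needed because USING only tells the assembler which register to use; it does not put anything into the register.

```hlasm
*  In the code (register 12 is already the CSECT's base register):
         LA    5,BUFFER           R5 -> record (USING loads nothing)
         USING RECORD,5           RECORD's fields now use base R5
         MVC   OUTNAME(20),RECNAME  Copy the 20-byte name field
         AP    TOTAL(5),RECAMT(4)   Add the packed amount
         DROP  5                  R5 no longer maps RECORD
*
*  Near the end of the source, before END:
RECORD   DSECT                    Hypothetical record layout
RECNAME  DS    CL20               Bytes 0-19: name
RECAMT   DS    PL4                Bytes 20-23: packed amount
RECLEN   EQU   *-RECORD           Length of one record (24)
```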

END                  This marks the end of the source program.  Its use is
                     absolutely mandatory.

ORG                  This is used to alter the value of the location counter 
                     (temporarily).  (This makes it possible to redefine items 
                     in storage, etc.)

TITLE                This causes the specified title to be printed at the 
                     top of each page in the listing.  

PRINT GEN/NOGEN      This controls whether the expanded form of macros will 
                     (PRINT GEN) be included in the listing or not (PRINT 
                     NOGEN).  This can be switched on and off as needed.
                     PRINT has other options as well.

EJECT                This causes a page break in the listing.

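EQU                  This is used to assign a value to a symbol.  The 
                     assembler evaluates the operand and gives the symbol 
                     that value; the symbol can then be used wherever the
                     value could be written, such as a register number or
                     a length.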
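EQU is commonly used for register names and for lengths that appear in several places. In this sketch, R5, LINELEN and LINE are hypothetical names. The assembler uses the value of each symbol wherever the symbol appears, including inside a D(L,B) operand and a DS length modifier.

```hlasm
R5       EQU   5                  A symbolic name for register 5
LINELEN  EQU   80                 A length used in several places
         LA    R5,LINE            Same as LA 5,LINE
         MVC   0(LINELEN,R5),=CL80' '  Blank the 80-byte line
*  Storage:
LINE     DS    CL(LINELEN)        An 80-byte field
```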

ENTRY                This identifies symbols (usually labels) in a module 
                     so they can be referred to in another module.  This 
                     makes them entry points.  CSECT names are 
                     automatically entry points.

EXTRN                This identifies symbols which are referred to in a 
                     module but which are defined in another module (where 
                     they may be listed in ENTRY statements).
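Here is a sketch of ENTRY and EXTRN in two separately assembled modules. PROGA, PROGB and TABLE are hypothetical names. Module A exports TABLE with ENTRY. Module B declares it with EXTRN, and the linkage editor or binder fills in its address.

```hlasm
*  Module A (assembled separately):
PROGA    CSECT
         ENTRY TABLE              Make TABLE known to other modules
*        (entry linkage, code, exit linkage)
TABLE    DC    10F'0'             Ten full words
         END   PROGA
*
*  Module B (assembled separately):
PROGB    CSECT
         EXTRN TABLE              TABLE is defined in another module
*        (entry linkage)
         L     2,=A(TABLE)        R2 -> TABLE (filled in by binder)
         L     3,0(,2)            R3 = the first full word of TABLE
*        (exit linkage)
         LTORG
         END   PROGB
```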


Fairly unusual assembler instructions

It is unlikely in CSCI 465 that you will need OPSYN, ISEQ, CNOP, REPRO, COM, etc. Some of them are interesting, and you may want to read about them.


How numbers are stored in assembly language

Numbers are stored in several different formats, and there is no way to avoid having to know about them.

Zoned Decimal format

A number is stored as a sequence of bytes which are interpreted as EBCDIC values.

Example: The value 321 would be stored as F3 F2 F1, where F3 is the EBCDIC value for '3', etc.

The first hex digit of each byte is the zone digit, and the second contains a decimal digit. The zone digits are usually all 'F' except for the zone digit in the rightmost byte, which is interpreted as the sign: values 'B' and 'D' indicate a negative number, and values 'A', 'C', 'E' and 'F' indicate a positive number.

Among other things, this means that the zoned-decimal representation of a number is not unique.

Example: Both F1 F2 and F1 E2 represent the value 12. If we compare them a byte at a time using CLC, they are different.

Example: F3 F2 C1 is interpreted as positive, as the rightmost zone digit is 'C'. F3 F2 D1 is interpreted as negative. If we compare these using CLC, we would find F3 F2 D1 is larger than F3 F2 C1, which is not correct.
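The assembler's zoned (Z-type) constants show the sign handling. The labels are arbitrary. A signed zoned constant gets zone C (positive) or D (negative) in its rightmost byte, while a character constant has F zones throughout. A CLC of MINUS against PLUS would therefore find MINUS higher.

```hlasm
PLUS     DC    Z'321'             F3 F2 C1
MINUS    DC    Z'-321'            F3 F2 D1
TEXT     DC    C'321'             F3 F2 F1 (printable '321')
```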

Packed Decimal format

A number is stored as a sequence of decimal digits, two per byte, followed by a sign digit in the rightmost half-byte. The sign digit follows the same rules as for the rightmost zone digit in a zoned decimal number. Again, the representation is not unique.

Example: The value +69173 would be stored as 3 bytes: as 69 17 3F or as 69 17 3C (or two other ways). The value -2045893 would be stored as 4 bytes: as 20 45 89 3D or as 20 45 89 3B.

Because the sign occupies one half-byte and we have a whole number of bytes, we always have an odd number of digits in a packed decimal number.

Again, because of the sign digits, we must be careful how we compare packed decimal numbers. If we use CLC to compare 69 17 3C and 69 17 3D, which respectively represent +69173 and -69173, we will get a wrong answer. Packed decimal numbers should be compared only with CP.
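Here is a sketch of the difference, using the values from the text. PPOS, PNEG and the branch target NEGLOW are hypothetical names. CP compares the fields as signed decimal numbers, so the branch on low is taken. CLC would compare the bytes, find 3D > 3C, and report PNEG as the higher value.

```hlasm
         CP    PNEG,PPOS          Compare -69173 with +69173
         BL    NEGLOW             Taken: CP finds PNEG lower
*        CLC   PNEG,PPOS          Would find PNEG HIGHER (3D > 3C)
*  Storage:
PPOS     DC    P'69173'           Assembles as 69 17 3C
PNEG     DC    P'-69173'          Assembles as 69 17 3D
```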

Binary format

A number is stored as a signed 16-bit number (in a half word) or 32-bit number (in a full word) in base two (binary), with negative values in two's complement form. When we write out the value, we often convert it to base 16 (hex).

Example: The value +240 would be stored as 2 bytes: 00 F0. If we stored it as 4 bytes, we would have 00 00 00 F0. The value -2 would be stored as 2 bytes: FF FE. If we stored it as 4 bytes, we would have FF FF FF FE.
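The same values can be written as binary constants. The labels are arbitrary. The comments show the bytes generated. When a half word is loaded with LH, its sign is extended to fill the 32-bit register.

```hlasm
         LH    3,HNEG             R3 = FF FF FF FE (sign extended)
         L     4,FPOS             R4 = 00 00 00 F0
*  Storage:
HPOS     DC    H'240'             00 F0
FPOS     DC    F'240'             00 00 00 F0
HNEG     DC    H'-2'              FF FE
FNEG     DC    F'-2'              FF FF FF FE
```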

Instructions for Converting between formats

To convert from binary format to packed decimal format, use CVD. This converts a binary value stored in a register to packed decimal format and stores it in an 8-byte packed decimal value (on a double word boundary).

To convert from packed decimal format to binary format, use CVB. This converts an 8-byte packed decimal value (on a double word boundary) to binary format and stores it in a register.

To convert from packed decimal format to zoned decimal format, there are two choices: (a) use UNPK and then (if necessary) use OI to fix the sign digit in the last byte; or (b) use ED or EDMK. UNPK needs two lengths, one for the zoned decimal field and one for the packed decimal field. ED and EDMK each need one length.

To convert from zoned decimal format to packed decimal format, use PACK. This will need two lengths, one for the zoned decimal field and one for the packed decimal field.

There is no way to convert between the binary and zoned decimal formats in one step.
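Here is a sketch of a round trip through all three formats. The fields DWORD, ZIN and ZOUT and the starting value 42 are hypothetical. Zoned data goes to binary through packed, and binary comes back to zoned through packed. The final OI changes the sign zone C to F so the result prints as digits.

```hlasm
         PACK  DWORD(8),ZIN(5)    DWORD = X'000000000000042C'
         CVB   4,DWORD            Packed to binary: R4 = 42
         A     4,=F'1'            Binary arithmetic: R4 = 43
         CVD   4,DWORD            DWORD = X'000000000000043C'
         UNPK  ZOUT(5),DWORD(8)   ZOUT = F0 F0 F0 F4 C3
         OI    ZOUT+4,X'F0'       ZOUT = F0 F0 F0 F4 F3 ('00043')
*  Storage:
DWORD    DS    D                  Double word aligned work area
ZIN      DC    Z'00042'           F0 F0 F0 F4 C2
ZOUT     DS    ZL5
```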

A note about arithmetic

There are families of instructions for doing arithmetic with numbers in binary and packed decimal formats. There is no way to do arithmetic with zoned decimal numbers without first converting them to packed decimal format.
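For example, to add a zoned amount to a packed total, the amount must be packed first. TOTAL, WORK and ZAMT are hypothetical fields. Used directly in AP, the zoned field has F in its digit positions and 0 in its sign position, so it is not valid packed data.

```hlasm
         PACK  WORK(3),ZAMT(5)    Zoned to packed: 01 25 0C
         AP    TOTAL(5),WORK(3)   Packed arithmetic on packed data
*        AP    TOTAL(5),ZAMT(5)   Wrong: zoned data, would cause S0C7
*  Storage:
TOTAL    DC    PL5'0'             00 00 00 00 0C
WORK     DS    PL3
ZAMT     DC    Z'01250'           F0 F1 F2 F5 C0
```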

Floating Point formats

There are also two formats for storing numbers in what is essentially scientific notation, using a sign, a mantissa and an exponent. You can ignore these for the purposes of CSCI 465.


Standard linkage

An assembly language program can call subprograms. A subprogram can be either internal or external.

Internal Subprograms

An internal subprogram is contained in the same CSECT as the code that calls it, and looks something like this:

label    (code for the subprogram)
         BR    14

To call an internal subprogram, we branch to the label, like this:

                 BAL   14,label

Here BAL causes the address of the instruction immediately following the BAL to be put into register 14. At the end of the subprogram, we branch back to the address in register 14:

                 BR    14

It is also possible, and common, to pass parameters to an internal subprogram in the same way as to an external subprogram (with register 1 pointing to a parameter list), and to have the subprogram leave a return code in register 15.

Because an internal subprogram is in the same CSECT, it has access to the same storage as the rest of the program, provided that storage is addressable.
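Here is a hypothetical example in which BUMP, BUMPR14 and COUNT are made-up names. Saving register 14 inside the subprogram is a common precaution. If the subprogram itself calls another subprogram or a macro, register 14 is overwritten, and its BR 14 would return to the wrong place.

```hlasm
*  Main line: call the internal subprogram BUMP twice.
         BAL   14,BUMP            R14 = return address; go to BUMP
         BAL   14,BUMP            ... and again
*        (rest of the main line, ending with the exit linkage)
*
*  BUMP: add 1 to the full word COUNT.  Changes register 3.
BUMP     ST    14,BUMPR14         Save the return address, in case
*                                 BUMP itself uses register 14
         L     3,COUNT            R3 = COUNT
         A     3,=F'1'            R3 = R3 + 1
         ST    3,COUNT            COUNT = R3
         L     14,BUMPR14         Restore the return address
         BR    14                 Return to the caller
*
*  Storage:
BUMPR14  DS    F                  Saved return address for BUMP
COUNT    DC    F'0'
```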

External Subprograms

An external subprogram is a separate CSECT. It may be part of the same assembly, or it may be assembled separately. Calling an external subprogram is a bit more complicated:

         LA     1,PARMLIST
         L      15,=V(SUB)       
         BALR   14,15

Here (by a well-established convention) register 1 points to the parameter list (a list of the addresses of the parameters being passed to SUB). We use a V-con to load the address of the subprogram's entry point into register 15. A V-con is used because the address of SUB is not known at assembly time (SUB may even be assembled separately); the V-con marks SUB as an external reference, and the linkage editor or binder fills in its address. BALR puts the address of the instruction following the BALR in register 14 and then branches to the address in register 15.
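Here is a sketch of what PARMLIST might look like, and how SUB might unload it. NAME and COUNT are hypothetical parameters. The standard entry linkage does not change register 1, so SUB can still use register 1 after the entry linkage.

```hlasm
*  In the caller's storage: the parameter list for SUB.
PARMLIST DC    A(NAME)            Address of the first parameter
         DC    A(COUNT)           Address of the second parameter
NAME     DS    CL20               The parameters themselves
COUNT    DS    F
*
*  In SUB, just after the entry linkage:
         LM    2,3,0(1)           R2 -> NAME, R3 -> COUNT
         L     4,0(,3)            R4 = the value of COUNT
         MVC   0(20,2),=CL20' '   Blank out the caller's NAME
```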

For an external subprogram, we need the usual entry linkage:

SUB      CSECT 
         STM    14,12,12(13) 
         LR     12,15 
         USING  SUB,12 
         LA     14,SUBSAVE 
         ST     14,8(0,13) 
         ST     13,4(0,14)
         LR     13,14

and exit linkage:

         L      13,4(0,13)
         LM     14,12,12(13)
         LA     15,0 
         BR     14

Both of these should look familiar: we need to save registers, establish a base register, cross-link save areas, etc., and later, restore the registers. Here SUBSAVE is a save area of 18 full words (72 bytes) for SUB. At the end of the exit linkage, we branch to the address in register 14 to return to the calling program.

There may be minor variations in the linkage code; for instance, we might use a base register other than 12, or we might want to provide a return code in register 15 other than 0. It is also possible to call an entry point other than the CSECT name itself (a name identified by an ENTRY statement in the called module).
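In the exit linkage above, LA 15,0 works after the LM because it does not refer to storage. A return code kept in storage is harder. Once LM has restored the caller's register 12, SUB's storage is no longer addressable, and the LM also overwrites register 15. One way around this, sketched here with a hypothetical field RETCODE, is to load the code first and store it in the caller's save area at offset 16, where register 15 was saved. The LM then restores it into register 15.

```hlasm
*  Exit linkage for SUB when the return code is in the full word
*  RETCODE.  RETCODE is loaded while R12 is still SUB's base.
         L     15,RETCODE         R15 = return code
         L     13,4(0,13)         R13 -> caller's save area
         ST    15,16(0,13)        Overwrite the saved R15 with it
         LM    14,12,12(13)       Restore 14-12; R15 gets the code
         BR    14                 Return to the caller
*
         LTORG
SUBSAVE  DS    18F                SUB's save area (72 bytes)
RETCODE  DC    F'0'               Set by SUB's code before exit
```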


Run-time errors

Here are a few of the ways in which an assembly language program can produce an ABEND. There are a number of others, including various kinds of overflow and underflow exceptions, specification exceptions, decimal-divide exceptions, execute exceptions, etc. These are all listed in the yellow card. Here 'S0Cn' is a system completion code; its last hex digit identifies the kind of program interruption.

S0C1                        Operation exception
     This is caused by an attempt to execute something not executable.  This 
     usually means we have somehow branched into our program's storage.

S0C4                        Protection exception
     This is caused by an attempt to write to (or, in rare cases, read from) 
     locations in memory which belong to someone else.

S0C7                        Data exception
     This is caused by an attempt to do packed arithmetic or comparisons with 
     fields not containing valid packed decimal numbers.  CVB also causes a
     data exception if its operand is not a valid packed decimal number.  The
     PACK and UNPK instructions will never cause a data exception, and there
     is no way to produce a data exception by doing binary arithmetic.
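A common source of S0C7 is a packed field defined with DS and never initialized. DS reserves storage without giving it a value, so it may hold bytes such as X'00' that have no valid sign digit. In this sketch, TOTAL is a hypothetical field. ZAP checks only its second operand, so it can safely give TOTAL a valid starting value. Defining TOTAL with DC PL4'0' instead would also work.

```hlasm
*        AP    TOTAL(4),=P'1'     Likely S0C7: TOTAL never set
*  Safer: give TOTAL a valid value first.  ZAP does not check its
*  first operand, so it is safe on an uninitialized field.
         ZAP   TOTAL(4),=P'0'     TOTAL = 00 00 00 0C
         AP    TOTAL(4),=P'1'     TOTAL = 00 00 00 1C
*  Storage:
TOTAL    DS    PL4                No initial value
```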


Debugging

If we have a run-time error in an assembly-language program, we need to debug it. We have various pieces of information: the completion code (S0Cn), the PSW, the values of the 16 registers, and (sometimes) a dump. All of this should be readily available in the printout. The format of the PSW is in the yellow card. We also have our own documentation for the program, including which registers are used for which purposes.

There are also macros (XDUMP, XSNAP, SNAP) which can be useful in debugging.


How to use macros

In CSCI 465, you are not normally asked to write macros of your own, but you do need to know how to use common library macros. In professional-quality assembly language code, a great deal of use is made of existing macros. IBM provides many of these for our use, and (as at many other installations) NIU has a family of macros of its own as well.

It is important to bear in mind that standard IBM macros may and often do change the values of registers 0, 1, 14 and 15.
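Here is a sketch of the usual precaution. OUTDCB and LINE are hypothetical. Any value that must survive a macro call is kept in a register other than 0, 1, 14 or 15. In this sketch, the parameter list address arrives in register 1 and is copied to register 5 before a QSAM PUT.

```hlasm
*  Register 1 points to our parameter list, but the PUT macro
*  may change registers 0, 1, 14 and 15, so copy it first.
         LR    5,1                R5 = parameter list address
         PUT   OUTDCB,LINE        Write LINE (may change 0,1,14,15)
         L     2,0(,5)            R5 is still valid after the PUT
```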

In some cases, macros may call external programs, and these may place requirements on the JCL for the assembler and other steps.


Where to go to find additional information

The first couple of these you should own and keep at hand. The IBM manuals can be found on line at the IBM BookManager site, and the CSCI 465 web site maintains a list of links to some of them. The N.I.U. bookstore often has some of the IBM manuals for sale, and you may want to consider buying some of them. (IBM manuals are surprisingly inexpensive.) It is also possible to download the manuals (legally and free) in any of several formats such as PDF or HTML.

You may also want to become familiar with the Utilities manual or the manual for the Binder.

Assembler Language With ASSIST and ASSIST/I, 4th edition
         Overbeek and Singletary, Prentice Hall
     
     This is the textbook for the CSCI 360 course.  It contains material not 
usually covered in that course on topics such as QSAM and some NIU macros. 

IBM System/370 Reference Summary, GX20-1850

     This is the so-called "yellow card", also known as the "green card". It is 
a good place to find the formats of the various instructions, the EBCDIC 
character set, a list of carriage-control characters, a list of exceptions 
(S0C0, S0C1, S0C2, etc.), the layout of the PSW, and so on. 

Principles of Operation, SA22-7201 

     Among other things, this provides IBM's explanation of how each machine 
instruction actually works.

High Level Assembler Language Reference, SC26-4940 

     This describes the format of an assembly language program, the formats of 
machine instructions, the various assembler instructions, and the macro 
language. 

High Level Assembler Programmer's Guide, SC26-4941
 
     This describes the assembler output, the assembler options (to be used on 
the EXEC line), the assembler's JCL, linkage and the assembler diagnostic 
messages.

Assembler Services Reference, GC28-1910

     This is where you can find details about standard IBM macros such as TIME, 
ABEND, SNAP, GETMAIN, FREEMAIN and many more.

Assembler Services Guide, GC28-1762 

     This is where you can find details about how to use the standard IBM macros
and services.  It is a good place to learn about advanced topics.

Using Data Sets, SC26-4922

     This discusses various access methods, record formats, and file formats, 
and it is a guide to the usual sequence of events in using files and access 
methods.

Macro Instructions for Data Sets, SC26-4913 

     This is where you can find details about macros used for access methods.  
For QSAM, these would be DCB, OPEN, CLOSE, GET, PUT, PUTX.   (VSAM, another 
access method, uses a different assortment of macros.)