Reverse engineering PDP-11 BASIC: Part 19

daveor
Mar 3, 2021
8 min read

Updated: Mar 4, 2021

In this post I'll describe user-defined functions and the BASIC DEF command, starting with some TRAPs that support its operation (TRAP 140, TRAP 126 and TRAP 110) and then describing the DEF command itself.

For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.

Function and variable names

BASIC, like pretty much any other programming language, allows the use of user-defined functions and variables. However, in the case of PDP-11 BASIC, the functionality is restricted in a number of ways that will be quite surprising to a modern-day programmer.

Firstly, user-defined function names in PDP-11 BASIC can only be three characters long and they have to be of the form "FN[A-Z]".

Similarly, there are restrictions on variable names. A variable name is either one or two characters long. The first character must be a letter and the second character, if present, must be a digit.

Not exactly structural requirements that are going to lead to self-commenting code!
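To make the naming rules concrete, here is a minimal sketch in Python (not part of the original interpreter). The helper names is_function_name and is_variable_name are my own, chosen purely for illustration.

```python
import re

# Naming rules as described above: functions are "FN" plus one letter,
# variables are one letter optionally followed by one digit.
FUNCTION_NAME = re.compile(r"FN[A-Z]")
VARIABLE_NAME = re.compile(r"[A-Z][0-9]?")


def is_function_name(name):
    """True if name has the form FN[A-Z]."""
    return FUNCTION_NAME.fullmatch(name) is not None


def is_variable_name(name):
    """True if name is a letter, optionally followed by a single digit."""
    return VARIABLE_NAME.fullmatch(name) is not None
```

So FNA, X and X1 are accepted, while FNAB, XY and X12 are not.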

Defining functions

User-defined functions are created using the DEF command. They can take a single parameter and can consist of any valid arithmetic expression.

Here are a few examples:

DEF FNA(X) = X * 2
DEF FNB(Y) = SIN(Y)
DEF FNC(Z) = COS(10 * (Z+2))

If a function with the same name is defined more than once, the first definition is used and subsequent definitions are ignored.

Representing user-defined functions in running state storage

See Part 11 if you need more background on the running state storage. User-defined functions are parsed and stored in the running state storage as three consecutive words:

Function identifier (1 word): 60000 OR'd with the ASCII code of the letter identifier of the function name.
Variable identifier (1 word): A word-encoded representation of the variable name.
Expression location (1 word): The memory address of the function expression.
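As a rough illustration of the first of these words, the function identifier can be computed like this in Python. The names FUNCTION_TYPE and function_identifier are hypothetical; only the constant 60000 (octal) and the OR operation come from the description above.

```python
FUNCTION_TYPE = 0o60000  # octal constant OR'd into every function identifier


def function_identifier(letter):
    """Return the identifier word for the function FN<letter>."""
    return FUNCTION_TYPE | ord(letter)


print(oct(function_identifier("A")))  # 0o60101, since "A" is 101 in octal
```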

TRAP 140

When evaluating expressions involving functions, instead of comparing the first character "F" and then having a separate test for the second character "N" to see whether we might be dealing with a function, TRAP 140 is used to convert two ASCII characters into a single value for comparison and testing. This TRAP is used, for example, in the BASIC DEF command handler, described below.

TRAP 140 converts two ASCII characters, pointed to by R1, into a single word representation and returns the resulting value in R4.

Here's the code:

002400 104472 TRAP 72
002402 010204 MOV R2, R4
002404 000304 SWAB R4
002406 104472 TRAP 72
002410 050204 BIS R2, R4
002412 000207 RTS PC

TRAP 72 is used to get the next non-whitespace character pointed to by R1 and store it in R2. This value is then moved into R4. The bytes in R4 are swapped, so the high order byte now contains the first ASCII character.

Then, TRAP 72 is used to get the next non-whitespace character, which is the second character of the pair. This is then OR'd with the value in R4, so now R4 contains both characters - the high byte of R4 contains the first ASCII character and the low byte contains the second ASCII character.

Here's an example of its use:

004474 104540 TRAP 140
004476 020427 CMP R4, #43116

This gets the next two characters in the BASIC command, pointed to by R1, and converts them into a single value stored in R4. That value is then compared to the constant 43116. This constant is the result you would get by performing the TRAP 140 calculation on "FN". Therefore, this comparison tests whether the two characters just read from the BASIC command were "FN".
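The constant can be checked with a short Python sketch of the TRAP 140 calculation. The function name pack_pair is made up for this example, and it takes the two characters directly rather than fetching them through TRAP 72.

```python
def pack_pair(first, second):
    """Pack two ASCII characters into one 16-bit word, first character in the high byte."""
    r4 = ord(first)                        # MOV R2, R4
    r4 = ((r4 << 8) | (r4 >> 8)) & 0xFFFF  # SWAB R4
    r4 |= ord(second)                      # BIS R2, R4
    return r4


print(oct(pack_pair("F", "N")))  # 0o43116
```

"F" is 106 and "N" is 116 in octal, and placing them in the high and low bytes respectively gives 43116.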

TRAP 126

TRAP 126 is used to parse a variable name from the command sequence. Here's the code:

001724 104472 TRAP 72
001726 104470 TRAP 70
001730 001416 BEQ 1766
001732 102415 BVS 1766
001734 042702 BIC #177700, R2
001740 010204 MOV R2, R4
001742 000304 SWAB R4
001744 006204 ASR R4
001746 006204 ASR R4
001750 104472 TRAP 72
001752 104470 TRAP 70
001754 001002 BNE 1762
001756 050204 BIS R2, R4
001760 104472 TRAP 72
001762 000257 CCC
001764 000207 RTS PC
001766 000262 SEV
001770 000207 RTS PC

Let's see how this works.

001724 104472 TRAP 72

First, the next non-whitespace character pointed to by R1 is extracted and stored in R2.

001726 104470 TRAP 70
001730 001416 BEQ 1766
001732 102415 BVS 1766

The first character of a variable name must be a letter. Therefore, the value in R2 is tested using TRAP 70. If it is numeric (i.e. the zero flag is set) this is invalid, so control jumps to 1766 where the overflow flag is set and then control is returned. Similarly, if the value in R2 is invalid (i.e. the overflow flag is set) control also jumps to 1766.

Otherwise, we have identified a valid first character for a variable name:

001734 042702 BIC #177700, R2
001740 010204 MOV R2, R4
001742 000304 SWAB R4
001744 006204 ASR R4
001746 006204 ASR R4

In this case, all bits of R2 are cleared except the lowest six (where an ASCII code would be stored). The value is then moved from R2 to R4.

Next, the bytes of R4 are swapped, which is the same as multiplying the value in R4 by 256. Then the value is shifted right twice, which is the same as dividing by 4.
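The combined effect of the byte swap and the two shifts is a left shift by six bits. Here is a small Python sketch to check that, using my own helper names swab and asr to model the two instructions on a 16-bit word.

```python
def swab(word):
    """Model of SWAB: exchange the high and low bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & 0xFFFF


def asr(word):
    """Model of ASR: shift right one bit, preserving the sign bit (bit 15)."""
    return (word >> 1) | (word & 0x8000)


def swab_then_shift(value):
    """SWAB followed by two ASRs, as in the listing above."""
    return asr(asr(swab(value)))


# For any value that fits in seven bits, this equals value * 64.
assert all(swab_then_shift(v) == v << 6 for v in range(128))
```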

What this calculation does, if you consider the range of ASCII values of the upper-case letters, is set up a situation where R4 will always have a value of the following form (in binary): 0001 xxxx xxxx xxxx. Since variables will, in due course, be stored in the runtime state, this means that variables have a "type" of 10000 (octal). See Part 11 for more detail on the runtime state storage.

001750 104472 TRAP 72
001752 104470 TRAP 70
001754 001002 BNE 1762

After identifying a valid first character for the variable name, the next non-whitespace character is obtained using TRAP 72 to see if it is part of the variable name. The value stored in R2 is tested using TRAP 70 to see if it is numeric, in which case the zero flag will be set. If the zero flag is not set, then the next character was not a number, meaning that it is not part of the variable name, so control branches to 1762 to return the single-character variable name that has been identified.

Otherwise, a numeric value has been identified:

001756 050204 BIS R2, R4
001760 104472 TRAP 72

In this case the ASCII code of the digit, now stored in R2, is OR'd with the value in R4 (which holds the value derived from the first character of the variable name, shifted left by six bits).

Then, the next non-whitespace character is obtained before returning.

001762 000257 CCC
001764 000207 RTS PC
001766 000262 SEV
001770 000207 RTS PC

Finally, we set the flags and return. If a valid variable name has been identified all flags are cleared before returning. If an error was encountered, or no valid variable name could be identified, the overflow flag is set before returning. The variable name identifier is returned in R4.
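The control flow of TRAP 126 can be sketched in Python as follows. This is an approximation under a few assumptions of mine: TRAP 72 is modelled as skipping spaces only, the end of the text is treated as a linefeed, and the result is returned as a string rather than as the packed word in R4. The names next_char and parse_variable are hypothetical.

```python
def next_char(text, pos):
    """Stand-in for TRAP 72: skip spaces, return (char, position after it)."""
    while pos < len(text) and text[pos] == " ":
        pos += 1
    if pos >= len(text):
        return "\n", pos  # assumption: running off the end looks like a linefeed
    return text[pos], pos + 1


def parse_variable(text, pos=0):
    """Return (name, lookahead char, position), or None where the overflow flag would be set."""
    ch, pos = next_char(text, pos)
    if not "A" <= ch <= "Z":
        return None  # first character is a digit or invalid
    name = ch
    ch, pos = next_char(text, pos)
    if "0" <= ch <= "9":
        name += ch  # the optional digit is part of the name
        ch, pos = next_char(text, pos)
    return name, ch, pos
```

For example, parse_variable("A1 =") gives ("A1", "=", 4). The lookahead character corresponds to the value left in R2 when the TRAP returns.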

TRAP 110

TRAP 110 is used to skip ahead to the end of the current command. The end of the current command is identified by either a ":", indicating another command on the same line, or a linefeed, indicating the end of the line.

Here's the code:

001236 121127 CMPB (R1), #72
001242 001404 BEQ 1254
001244 122127 CMPB (R1)+, #12
001250 001372 BNE 1236
001252 005301 DEC R1
001254 000207 RTS PC

The value pointed to by R1 is compared to 72 (the ASCII code for ":"). If it is equal, then we are already at the end of the command so branch to 1254 and return. Otherwise, the value at R1 is compared to linefeed, and then R1 is incremented. If the value at R1 is not equal to linefeed then control branches back to the beginning of the TRAP to test the next character.

If we identify a linefeed, then R1 is decremented and then control returns.

So, at the end of TRAP 110, R1 will point at the terminating character of the current command, which will be either ":" or linefeed.
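The same loop looks like this in Python, with an index standing in for R1. The function name skip_to_end_of_command is my own, and the sketch assumes the text always contains a ":" or a linefeed at or after the starting position.

```python
def skip_to_end_of_command(text, pos):
    """Return the index of the ':' or linefeed that terminates the current command."""
    while text[pos] != ":":  # CMPB (R1), #72 / BEQ 1254
        ch = text[pos]
        pos += 1             # the (R1)+ post-increment
        if ch == "\n":       # CMPB (R1)+, #12 / BNE 1236
            pos -= 1         # DEC R1: step back onto the linefeed
            break
    return pos


print(skip_to_end_of_command("X * 2:PRINT X\n", 0))  # 5, the index of ":"
```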

BASIC DEF command

The BASIC DEF command is used to create a user-defined function. With the help of the TRAPs just analysed, we can now understand how this works.

Here's the code:

004474 104540 TRAP 140
004476 020427 CMP R4, #43116
004502 001033 BNE 4572
004504 104472 TRAP 72
004506 104470 TRAP 70
004510 001430 BEQ 4572
004512 102427 BVS 4572
004514 052702 BIS #60000, R2
004520 010200 MOV R2, R0
004522 104512 TRAP 112
004524 104472 TRAP 72
004526 020227 CMP R2, #50
004532 001017 BNE 4572
004534 104526 TRAP 126
004536 102415 BVS 4572
004540 010400 MOV R4, R0
004542 104512 TRAP 112
004544 020227 CMP R2, #51
004550 001010 BNE 4572
004552 104472 TRAP 72
004554 020227 CMP R2, #75
004560 001004 BNE 4572
004562 010100 MOV R1, R0
004564 104512 TRAP 112
004566 104510 TRAP 110
004570 000735 BR 4464
004572 104437 TRAP 37

Let's see how this works.

004474 104540 TRAP 140
004476 020427 CMP R4, #43116
004502 001033 BNE 4572

First, TRAP 140 is used to get two ASCII characters and assemble them into a word. The resulting value is then compared to 43116, the encoded form of the character pair "FN". If the two values are not equal, in other words the first two characters are not "FN", control branches to 4572 to generate an error.

004504 104472 TRAP 72
004506 104470 TRAP 70
004510 001430 BEQ 4572
004512 102427 BVS 4572

If the first two characters were "FN", then the next character is read from the input string into R2. The value is tested to see whether it is numeric. If it is numeric (result of TRAP 70 is equal to zero) or invalid (overflow flag set by TRAP 70) then control branches to 4572 to generate an error.

004514 052702 BIS #60000, R2
004520 010200 MOV R2, R0
004522 104512 TRAP 112

The value of the character in R2 is OR'd with 60000 to create a function identifier and the resulting value is moved to R0. TRAP 112 is then used to push the value of R0 into the running state storage.

004524 104472 TRAP 72
004526 020227 CMP R2, #50
004532 001017 BNE 4572

TRAP 72 is used to get the next non-whitespace character, which is compared to "(" (ASCII 50). If the comparison does not return a zero result (meaning the next character was not an open bracket) control jumps to 4572 to generate an error.

004534 104526 TRAP 126
004536 102415 BVS 4572
004540 010400 MOV R4, R0
004542 104512 TRAP 112

TRAP 126 is used to get the variable name, encoded as a word value, which it returns in R4. If the overflow flag is set, there was an error parsing the variable name, in which case control jumps to 4572 to generate an error. Otherwise, the variable identifier is moved from R4 to R0 and then TRAP 112 is used to store the value in the runtime state storage.

004544 020227 CMP R2, #51
004550 001010 BNE 4572

TRAP 126 will return the next character after the variable name in R2. So, the value in R2 is compared to ")" (ASCII 51). If not equal, control jumps to 4572 to generate an error and return.

004552 104472 TRAP 72
004554 020227 CMP R2, #75
004560 001004 BNE 4572

The next non-whitespace character is placed in R2 using TRAP 72. The value is then compared to "=" (ASCII 75). If not equal, control jumps to 4572 to generate an error and return.

004562 010100 MOV R1, R0
004564 104512 TRAP 112

The location of the expression, now pointed to by R1, is moved into R0 and then stored in the runtime state storage using TRAP 112.

004566 104510 TRAP 110
004570 000735 BR 4464

TRAP 110 is used to skip to the end of the command, which will be either a linefeed or a ":". At this point, control branches to 4464, which jumps to execute the next command.

004572 104437 TRAP 37

All error conditions are generated using this TRAP 37.
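Putting the steps together, the whole handler can be approximated in Python. This is only a sketch under my own assumptions: whitespace skipping is limited to spaces, ValueError stands in for TRAP 37, the variable is kept as a string instead of its packed word, and the three values are returned as a tuple rather than pushed one at a time with TRAP 112. The names next_char and parse_def are hypothetical.

```python
def next_char(text, pos):
    """Stand-in for TRAP 72: skip spaces, return (char, position after it)."""
    while pos < len(text) and text[pos] == " ":
        pos += 1
    if pos >= len(text):
        return "\n", pos  # assumption: end of text behaves like a linefeed
    return text[pos], pos + 1


def parse_def(text):
    """Parse the text after DEF; return (function id, variable name, expression index)."""
    first, pos = next_char(text, 0)
    second, pos = next_char(text, pos)
    if first + second != "FN":
        raise ValueError("expected FN")
    letter, pos = next_char(text, pos)
    if not "A" <= letter <= "Z":
        raise ValueError("expected a letter after FN")
    function_id = 0o60000 | ord(letter)

    ch, pos = next_char(text, pos)
    if ch != "(":
        raise ValueError("expected (")

    variable, pos = next_char(text, pos)
    if not "A" <= variable <= "Z":
        raise ValueError("expected a variable name")
    ch, pos = next_char(text, pos)
    if "0" <= ch <= "9":
        variable += ch
        ch, pos = next_char(text, pos)
    if ch != ")":
        raise ValueError("expected )")

    ch, pos = next_char(text, pos)
    if ch != "=":
        raise ValueError("expected =")
    return function_id, variable, pos  # pos now indexes the start of the expression
```

For the input "FNA(X) = X * 2\n" this returns (0o60101, "X", 8), where index 8 is the first character after the "=".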
