Search
  • daveor

Reverse engineering PDP-11 BASIC: Part 21

In this post we'll look at the expression evaluation code. Expressions are used all over the place and the main entry point to parsing them is TRAP 136.


I normally aim to make my posts about a ten minute read, but this post is much longer than usual because I couldn't think of a sensible way to break this analysis into smaller pieces without making it more difficult to follow.


For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.


Before tackling the code itself I need to provide some background information. Expressions can be used all over BASIC programs. For example in the LET and FOR commands:

LET X = <expression>

FOR X = <expression> TO <expression> STEP <expression>

Expressions can range in complexity from very simple to arbitrarily complicated. Here, for example, the "expression" is the constant value 10:

LET X = 10

In the second example below, the value of X is the result of the series of mathematical operations shown, added to the result of the user-defined function FNA() called with the sub-expression "Y" (not shown).

LET X = (((2 * 5) ^ 3) - (6 * 2)) + FNA(Y)

Expression syntax

The syntax of expressions is described in detail in the PDP-11 BASIC programming manual (sections 2.5.1 to 2.5.5 and chapter 4 being the most relevant sections). I'll provide a brief summary here before beginning the analysis.


Expressions can consist of a series of:

  1. Brackets

  2. Constants

  3. Variables

  4. Functions

  5. Operators

Brackets (i.e. "(" and ")") are used to set the precedence of operations. Portions of expressions in brackets are evaluated first, beginning with the innermost groupings and working outwards. Numeric constant values can be used in expressions, specified as either integers, decimal values or numbers in exponential format.


A described in Part 19, variables can be used to represent the result of evaluating expressions. Variable names must be a minimum of one character in length and can be a maximum of two characters in length. The first character must be a letter and the second character, if present, must be a number.


A series of built-in functions can be used in expressions. They are (SIN, COS, ATN, EXP, LOG, ABS, SQR, INT, RND, SGN and EXF). Also, as described in Part 19, user-defined function can be created whose names in PDP-11 BASIC can only be three characters long and they have to be of the form "FN[A-Z]".


Finally, the valid operators, in order of precedence, are "+", "-", "*", "/" and "^".


TRAP 136

This is the TRAP used for expression evaluation. The value in R1 is the memory location of the beginning of the expression to be evaluated, and the result (floating point) is returned in R2/R3/R4.


In the case of a BASIC instruction like "LET X=10", TRAP 136 is invoked with the value in R1 being the memory location of the beginning of the expression, which in this case will be the memory location of the start of the string "10".


TRAP 136 is very large so in the remaining sections of this post I will break down the TRAP code into several sections to make describing how it works a bit easier.


Initial setup and parsing

The first part of the TRAP is the initial setup. This code also does some initial expression parsing:

  1. Expressions may have a "+" at the beginning, which will be ignored.

  2. Expressions may have a preceding "-", which will mean that the entire value needs to be negated once it is evaluated.

  3. Expressions may be enclosed in brackets, so opening brackets are parsed in this code.

004574 005000 CLR R0
004576 104512 TRAP 112
004600 012746 MOV #177777, -(SP)
004604 104504 TRAP 104
004606 103561 BCS 5152
004610 104472 TRAP 72
004612 020227 CMP R2, #53
004616 001410 BEQ 4640
004620 020227 CMP R2, #55
004624 001006 BNE 4642
004626 010200 MOV R2, R0
004630 005002 CLR R2
004632 005003 CLR R3
004634 005004 CLR R4
004636 000410 BR 4660
004640 104472 TRAP 72
004642 020227 CMP R2, #50
004646 001007 BNE 4666
004650 005046 CLR -(SP)
004652 005265 INC -2(R5)
004656 000752 BR 4604
004660 104542 TRAP 142
004662 010046 MOV R0, -(SP)
004664 000765 BR 4640
004666 005301 DEC R1
004670 014546 MOV -(R5), -(SP)
004672 004767 JSR PC, 5200

Let's see how this works.

004574 005000 CLR R0
004576 104512 TRAP 112
004600 012746 MOV #177777, -(SP)

Firstly, the value in R0 is cleared and then TRAP 112 is used to push the value in R0 into the running state storage. Next, the value -1 is pushed onto the stack.

004604 104504 TRAP 104
004606 103561 BCS 5152

TRAP 104 is used to check if there is enough free memory to execute the command. If there is not enough space, the carry flag will be set, in which case control jumps to 5152 to cause an error to be returned.

004610 104472 TRAP 72

Next, TRAP 72 is used to get the next non-whitespace character from R1 and stored in R2.

004612 020227 CMP R2, #53
004616 001410 BEQ 4640
004620 020227 CMP R2, #55
004624 001006 BNE 4642
004626 010200 MOV R2, R0
004630 005002 CLR R2
004632 005003 CLR R3
004634 005004 CLR R4
004636 000410 BR 4660

The first character of the expression is compared to "+" (ASCII 53) and if character matches, control jumps to location 4640. If the character does not match, the character is compared to "-" (ASCII 55) and if the character is anything other than minus, control jumps to 4642. Finally, in the case that the first character is a minus, the minus character (currently stored in R2) is moved to R0. The values of R2, R3 and R4 are cleared and then control jumps to 4660.

004640 104472 TRAP 72

In the case that the first character was a "+" TRAP 72 is used to get the next non-whitespace character.

004642 020227 CMP R2, #50
004646 001007 BNE 4666
004650 005046 CLR -(SP)
004652 005265 INC -2(R5)
004656 000752 BR 4604

In the case that the character was anything other than a minus, the value is compared to "(" (ASCII 50) and if it is anything other than an opening bracket control jumps to 4666. Otherwise, when a bracket is identified, a zero word is pushed onto the stack and the value at R5-2 (i.e. the zero word pushed into the running state storage at the beginning of the TRAP) is incremented. Control then jumps to 4604 to parse the next character.

004660 104542 TRAP 142
004662 010046 MOV R0, -(SP)
004664 000765 BR 4640

If a minus character is found at the beginning of the expression, TRAP 142 is used to push R2, R3 and R4 onto the stack. R0 is also pushed onto the stack and then control jumps back up to 4640 to parse the next character.

004666 005301 DEC R1
004670 014546 MOV -(R5), -(SP)
004672 004767 JSR PC, 5200
004676 012625 MOV (SP)+, (R5)+

For all characters other than initial brackets, plusses and minuses the R1 pointer is decremented to re-add the character that was extracted from the string by TRAP 72.

Next the value from the top of the running state storage is taken from the running state storage and pushed onto the stack. This value is the number of opening brackets at the beginning of the expression.


Once all preceding brackes, plusses and minuses have been parsed, a subroutine is invoked to parse floating point values, built-in functions and user-defined functions, which will be described in the next five sections;

  1. Numeric value parsing

  2. Built-in function parsing

  3. Built-in function evaluation

  4. User-defined function parsing

  5. User-defined function evaluation

The subroutine returns a floating point value in R2, R3 and R4. When the subroutine returns, the top of the running state storage is restored from the stack.


Numeric value parsing

The first part of the subroutine involves the parsing of numeric values. Here's the code:

005200 010146 MOV R1, -(SP)
005202 104472 TRAP 72
005204 104470 TRAP 70
005206 102417 BVS 5246
005210 001022 BNE 5256
005212 012601 MOV (SP)+, R1
005214 162706 SUB #6, SP
005220 010600 MOV SP, R0
005222 104406 TRAP 6
005224 102764 BVS 5176
005226 012602 MOV (SP)+, R2
005230 012603 MOV (SP)+, R3
005232 012604 MOV (SP)+, R4
005234 000207 RTS PC
005236 012002 MOV (R0)+, R2
005240 012003 MOV (R0)+, R3
005242 012004 MOV (R0)+, R4
005244 000207 RTS PC
005246 020227 CMP R2, #56
005252 001757 BEQ 5212
005254 000471 BR 5440

Let's take a look at how this works.

005200 010146 MOV R1, -(SP)
005202 104472 TRAP 72
005204 104470 TRAP 70
005206 102417 BVS 5246
005210 001022 BNE 5256
005212 012601 MOV (SP)+, R1

Firstly, R1 is pushed onto the stack and then TRAP 72 is used to get the next non-whitespace character from the string at R1. The resulting value is returned in R2. TRAP 70 is then used to check the nature of the character. If the character is non-alphanumeric, the overflow flag is set and control jumps to 5246:

005246 020227 CMP R2, #56
005252 001757 BEQ 5212
005254 000471 BR 5440n

This code deals with one edge-case where a number of the form ".1" is specified. The value in R2 is compared to the decimal point (ASCII 56). If equal, control jumps to 5212 because the number may be valid and otherwise the character is invalid so control jumps (indirectly) to the "tidy up and return" code.


Returning to the main piece of code being examined, in the case that the character being examined is numeric the zero flag will be set. If the zero flag is not set, the remainder of the number parsing code is skipped by control jumping to 5256.


Finally, in the case that the zero flag is set, meaning that a number has been identified, R1 is popped from the stack and execution continues:

005214 162706 SUB #6, SP
005220 010600 MOV SP, R0
005222 104406 TRAP 6
005224 102764 BVS 5176
005226 012602 MOV (SP)+, R2
005230 012603 MOV (SP)+, R3
005232 012604 MOV (SP)+, R4
005234 000207 RTS PC

Six is subtracted from the stack pointer, which allocates three words on the stack. The stack pointer is then copied to R0. TRAP 6 is used to convert characters from the ASCII string pointed to by R1 into a floating point number at the memory locations pointed by R0. TRAP 6 is described in detail in Part 18.


If the overflow flag is set, that means that an error parsing the floating point number has occurred, in which case control jumps to 5176 to return an error:

005176 104467 TRAP 67

Otherwise, the returned floating point number is moved from the stack into registers R2, R3 and R4 and then control returns from the subroutine.


The last few instructions contained in the block of code above are actually nothing to do with parsing a float. They are actually the last few instructions of the main exit point from the subroutine:

005236 012002 MOV (R0)+, R2
005240 012003 MOV (R0)+, R3
005242 012004 MOV (R0)+, R4
005244 000207 RTS PC

Three words, representing a floating point value, are moved from the memory location contained in R0 to registers R2, R3 and R4 and then control returns from the subroutine.


Built-in function parsing

In the case where the first character examined above is non-numeric, control jumps to the second section in the parsing subroutine, to check for the use of a built-in function. Built-in function entries are stored in a lookup table. The table consists of two parts, firstly, there are a series of unique values, one for each built-in function. This is followed, in the same order, by the memory location of the code that implements each specific function.


The unique values are generated as follows, using "SIN" as an example and remembering that all numbers are in octal:

  1. Take the lowest six bits of the ASCII code for "S", which is the value 23, and multiply it by 44. This gives the value 1254.

  2. Take the lowest six bits of the ASCII code for "I", which is the value 11, and add this to the 1254 calculated in the previous step. Multiply the result by 44. This gives the value 60564.

  3. Take the lowest six bits of the ASCII code for "N", which is the value 16 (octal), and add this to the 60564 calculated in the previous step. This gives the value 60602.

Here's the lookup table (annotated) and if you note that the first entry has the value 60602, which means this is the entry for the "SIN" function.

005734 060602 ; This is the encoding of "SIN"
005736 010537 ; This is the encoding of "COS"
005740 003756 ; This is the encoding of "ATN"
005742 016300 ; This is the encoding of "EXP"
005744 037343 ; This is the encoding of "LOG"
005746 002553 ; This is the encoding of "ABS"
005750 061246 ; This is the encoding of "SQR"
005752 027634 ; This is the encoding of "INT"
005754 056434 ; This is the encoding of "RND"
005756 060472 ; This is the encoding of "SGN"
005760 016266 ; This is the encoding of "EXF"
005762 015020 ; this is the pointer to the code for SIN
005764 015162 ; this is the pointer to the code for COS
005766 015260 ; this is the pointer to the code for ATN
005770 014260 ; this is the pointer to the code for EXP
005772 013764 ; this is the pointer to the code for LOG
005774 012470 ; this is the pointer to the code for ABS
005776 015642 ; this is the pointer to the code for SQR
006000 013372 ; this is the pointer to the code for INT
006002 010004 ; this is the pointer to the code for RND
006004 012522 ; this is the pointer to the code for SGN
006006 017204 ; this is the pointer to the code for EXF

Notice that the memory location of each function is found at the location of the matching identifier plus 20 bytes (24 in octal), so the location of the SIN code is memory location 15020. I will describe the operation of all of these functions in future posts.


As a quick aside, if you refer back to Part 5, where I was describing the operation of the BASIC setup code, you might remember that some of the startup options involve deleting certain built-in functions so that user programs can have more memory. The deletion is achieved by clearing the "unique values" for the relevant functions in this table, which will mean they will never be located and the corresponding memory areas can be reused.


The built-in function parsing code calculates a unique value from the input string and compares it against the list of unique values for the known built-in functions:

005256 020227 CMP R2, #106
005262 001473 BEQ 5452
005264 012746 MOV #177700, -(SP)
005270 041602 BIC (SP), R2
005272 010200 MOV R2, R0
005274 104530 TRAP 130
005276 104530 TRAP 130
005300 104472 TRAP 72
005302 104470 TRAP 70
005304 102454 BVS 5436
005306 001453 BEQ 5436
005310 041602 BIC (SP), R2
005312 060200 ADD R2, R0
005314 104530 TRAP 130
005316 104530 TRAP 130
005320 104472 TRAP 72
005322 104470 TRAP 70
005324 102444 BVS 5436
005326 001443 BEQ 5436
005330 042602 BIC (SP)+, R2
005332 060200 ADD R2, R0
005334 012703 MOV #5734, R3
005340 022300 CMP (R3)+, R0
005342 001404 BEQ 5354
005344 020327 CMP R3, #5762
005350 103773 BCS 5340
005352 000552 BR 5700

Let's see how this code works.

005256 020227 CMP R2, #106
005262 001473 BEQ 5452

Firstly, the current character is compared to the character "F". None of the built-in functions start with an "F", and user-defined functions must start with an "F", so this test distinguishes between possible built-in functions and possible user-defined functions. If the first character is "F" then control jumps to 5452 which is described below in the user-defined function parsing section.


Otherwise, execution continues to see whether a built-in function has been used.

005264 012746 MOV #177700, -(SP)

Firstly, the value 177700 is pushed onto the stack. In binary this value is "1 111 111 111 000 000". This is the mask that is used to mask off the lowest six bits of the ASCII codes for incorporation into the unique value calculation described earlier.

005270 041602 BIC (SP), R2
005272 010200 MOV R2, R0
005274 104530 TRAP 130
005276 104530 TRAP 130

The current character is masked using the mask just pushed onto the stack. This is achieved using the BIC (bit clear) instruction. This operates by clearing all bits in the second operand that are matched by a one in the first operand. In other words, this will mask all except the lowest six bits of R2. The resulting value is then moved to R0. TRAP 130 is then used to multiply the value in R0 by six, and then again to multiply by another six. Remember that in octal, 6 x 6 = 44 (which is decimal 36).

005300 104472 TRAP 72
005302 104470 TRAP 70
005304 102454 BVS 5436
005306 001453 BEQ 5436

Next, TRAP 72 is used to get the next non-whitespace character and TRAP 70 is used to test the nature of the character. Noting that all of the built-in functions have names that are three letters in length, if we encounter a number or a non-alphanumeric character that means, by definition, we are not looking at a built-in function. So, if a non-alphanumeric character is identified (as represented by the overflow flag being set by TRAP 70) or a numeric character is identified (as represented by the zero flag being set by TRAP 70) we branch to 5436 which will (indirectly) jump to the "tidy up and return" code by (a) popping the mask value off the stack and discarding it and (b) branching to 5700, which is the "tidy up and return" code, as shown here:

005436 005726 TST (SP)+
005440 000517 BR 5700

Continuing with the main code:

005310 041602 BIC (SP), R2
005312 060200 ADD R2, R0
005314 104530 TRAP 130
005316 104530 TRAP 130

Otherwise we have encountered another letter, in which case we once again mask off all but the lowest six bits and add it to the value in R0 (contains the running total of the calculation of the unique value). Then the result is multiplied by six twice using TRAP 130.

005320 104472 TRAP 72
005322 104470 TRAP 70
005324 102444 BVS 5436
005326 001443 BEQ 5436

We repeat the exact same process again, of getting a new character and returning from the subroutine if it is numeric or non-alphanumeric.

005330 042602 BIC (SP)+, R2
005332 060200 ADD R2, R0

Otherwise, if we have encountered a third letter, all but the lowest six bits are masked and the value is added to R0. We now have our unique value, stored in R0, to compare to the list of built-in functions.

005334 012703 MOV #5734, R3
005340 022300 CMP (R3)+, R0
005342 001404 BEQ 5354
005344 020327 CMP R3, #5762
005350 103773 BCS 5340
005352 000552 BR 5700

The value 5734 is moved into R3. This is the memory location of the beginning of the built-in function lookup table. The value pointed to by R3 is compared to the unique value we have just calculated and post-incremented so it will be ready for the next time around the loop if needed. If the values are equal we branch out of this loop to the built-in function evaluation code, described in the next section.


Otherwise the value in R3 is compared to 5762 to test whether we have reached the end of the built-in function lookup table. If not, branch to 5340 to test the next entry in the lookup table.


If no matching value was found in the lookup table we branch to 5700 to tidy up and return.


Built-in function evaluation

If we identify a matching built-in function, we now need to evaluate that function and return the result. That is the purpose of the next piece of code:

005354 104472 TRAP 72
005356 020227 CMP R2, #50
005362 001146 BNE 5700
005364 016346 MOV 24(R3), -(SP)
005370 104536 TRAP 136
005372 102023 BVC 5442
005374 012600 MOV (SP)+, R0
005376 104542 TRAP 142
005400 010002 MOV R0, R2
005402 010600 MOV SP, R0
005404 010146 MOV R1, -(SP)
005406 010001 MOV R0, R1
005410 162706 SUB #6, SP
005414 010600 MOV SP, R0
005416 004712 JSR PC, (R2)
005420 012602 MOV (SP)+, R2
005422 012603 MOV (SP)+, R3
005424 012604 MOV (SP)+, R4
005426 012601 MOV (SP)+, R1
005430 062706 ADD #10, SP
005434 000207 RTS PC

So let's take a look at how this works.

005354 104472 TRAP 72
005356 020227 CMP R2, #50
005362 001146 BNE 5700

First, TRAP 72 is used to get the next non-whitespace character. This character is then compared to "(" (ASCII 50) because a valid function name will be followed by an argument in brackets, so the next non-whitespace character after the function name will be an opening bracket. If the next character is not a bracket, control jumps to 5700 to tidy up and return.

005364 016346 MOV 24(R3), -(SP)

Next, the location of the function code for the identified function (which is found in the lookup table at R3+24) is pushed onto the stack.

005370 104536 TRAP 136
005372 102023 BVC 5442

The argument to the function may, itself, be any expression so TRAP 136 is used to evaluate the expression that is the argument to the builtin function. The result will be a floating point value stored in R2/R3/R4.


If TRAP 136 encounters a closing bracket as the last character in the expression being evaluated, the overflow flag will be set. If this does not occur, the overflow flag will be clear and control branches to 5442. This is relevant in the case of the EXF() function, which may have multiple arguments separated by commas. Only the first argument is parsed by TRAP 136, with the remainder of the arguments being processed directly by the assembly language code invoked by EXF().


At location 5442 we find the following:

005442 021627 CMP (SP), #17204
005446 001036 BNE 5544
005450 000751 BR 5374

We compare the value at the top of the stack (the location of the built-in function being invoked) to 17204 (the location of EXF()). If not equal (i.e. we are not evaluating EXF()), and we encounter anything other than a closing bracket at the end of the expression, we jump to 5544 to generate an error. Otherwise we branch back up to 5374 to continue executing.

005374 012600 MOV (SP)+, R0
005376 104542 TRAP 142
005400 010002 MOV R0, R2
005402 010600 MOV SP, R0
005404 010146 MOV R1, -(SP)
005406 010001 MOV R0, R1
005410 162706 SUB #6, SP
005414 010600 MOV SP, R0
005416 004712 JSR PC, (R2)

Now that we have evaluated the argument to the built-in function, we begin preparations to invoke the function. Firstly, the value from the top of the stack, which is the memory location of the built-in function, is popped and stored in R0. TRAP 142 is then used to store the value of the function argument from registers R2/R3/R4 onto the stack.


The location of the function is then copied from R0 into R2 and the location of the argument on the stack (i.e. the current value of the stack pointer) is copied to R0.


Next R1 is pushed onto the stack and then R0 is copied into R1. Six is subtracted from the stack pointer, allocating three words on the stack, and the location of the three words (i.e. the current value of the stack pointer) is copied to R0.


The code at the memory location stored in R2, which is the built-in function, is now invoked. The value of the argument is pointed to by the value in register R1 and the result will be returned in the memory locations pointed to by R0.

005420 012602 MOV (SP)+, R2
005422 012603 MOV (SP)+, R3
005424 012604 MOV (SP)+, R4
005426 012601 MOV (SP)+, R1
005430 062706 ADD #10, SP
005434 000207 RTS PC

Once the built-in function has been evaluated, the result is copied from the stack to R2/R3/R4 and R1 is also restored from the stack. Ten is then added to the stack pointer, which will discard four more words from the stack (the three words of the argument value plus the word containing the memory location of the built-in function). Control then returns from the subroutine.


This would conclude the execution of the subroutine in the case of a built-in function. Let's now take a look at the execution of a user-defined function.


User-defined function parsing

This code is invoked in the case where the built-in function parsing code, described above, identifies that the function name string begins with an "F". None of the built-in functions begin with "F", but user-defined function names must be of the form FN[A-Z].

005452 104472 TRAP 72
005454 020227 CMP R2, #116
005460 001107 BNE 5700
005462 104472 TRAP 72
005464 104470 TRAP 70
005466 102504 BVS 5700
005470 001503 BEQ 5700
005472 104534 TRAP 134
005474 001507 BEQ 5714
005476 005000 CLR R0
005500 052702 BIS #60000, R2
005504 010204 MOV R2, R4
005506 104514 TRAP 114
005510 001473 BEQ 5700

Let's see what this does.

005452 104472 TRAP 72
005454 020227 CMP R2, #116
005460 001107 BNE 5700

Having already identified an "F", we now get the next non-whitespace character and compare it to "N" (ASCII 116). If it is not equal, we branch to 5700 to tidy up and return.

005462 104472 TRAP 72
005464 104470 TRAP 70
005466 102504 BVS 5700
005470 001503 BEQ 5700

Having encountered "FN", we now expect that the next character will be another letter. Therefore, TRAP 72 is used to get the next non-whitespace character and TRAP 70 is used to test the nature of the character. If it is a non-alphanumeric character (indicated by the overflow flag being set) or numeric (indicated by the zero flag being set) then control jumps to 5700 to tidy up and return.


Otherwise the ASCII code of the letter that uniquely represents a specific user-defined function is now stored in R2.

005472 104534 TRAP 134
005474 001507 BEQ 5714
005476 005000 CLR R0

TRAP 134 is used to set the value in R3 to the beginning of the runtime state storage area. If the zero flag is set, that means an error has occurred, so control jumps to 5714 to return. Otherwise the value in R0 is cleared and execution continues below.


For an overview of how user-defined functions are stored in the runtime state storage, see Part 19 of this series. Briefly, user defined functions are parsed and stored as entries in the runtime state storage in the following way:

  1. Function identifier (1 word): 60000 OR'd with the ASCII code of the letter identifier of the function name.

  2. Variable identifier (1 word): A word-encoded representation of the variable name.

  3. Expression location (1 word): The memory address of the function expression.

005500 052702 BIS #60000, R2
005504 010204 MOV R2, R4
005506 104514 TRAP 114
005510 001473 BEQ 5700

The letter in R2 that uniquely identifies the current function is OR'd with the value 60000 to produce the identifier of the function in the runtime state storage. This value is then copied to R4 and TRAP 114 is used to search the runtime state for the function entry. If the entry is located, the memory location of the entry in the runtime state will be stored in R3. If no entry is found the zero flag is set which will cause the code to branch to 5700 in order to tidy up and return.


User-defined function evaluation

If the entry is found in the runtime state, we now move on to evaluation of the user-defined function. Firstly, remember when a function is defined, it is defined using syntax like this:

DEF FNA(X) = X+3

where the part after the equals sign can be any valid expression. Then, when the function is used, it would look like this for example:

LET Z = FNA(Y+2)

where the part in the brackets, the argument to the function, can also be any valid expression.


Therefore, user-defined function evaluation takes place in two stages;

  1. First evaluate the argument, which can be any expression, and store that in the runtime state as the name of the variable specified in the function definition.

  2. Then evaluate the function expression, using the variable value calculated in the previous step, and return this value.

Picking up the code where we left of in the last section, we have located the user-defined function entry in the runtime state storage and this consists of three words; the identifying value, the function variable identifier, the pointer to the string representing the function expression.

005512 005723 TST (R3)+
005514 012304 MOV (R3)+, R4
005516 012346 MOV (R3)+, -(SP)
005520 104534 TRAP 134
005522 104514 TRAP 114
005524 001027 BNE 5604

Firstly, the value at R3 is tested and incremented, effectively skipping over the identifier word in the function entry. The variable identifier is then moved into R4 and the pointer to the function expression is pushed onto the stack.


TRAP 134 is used to set R3 to the beginning of the runtime state storage and TRAP 114 is then used to search the runtime state storage for a value with the identifier stored in R4. In other words, the runtime state storage is searched for an entry representing the variable in the function definition.


If this entry is found (i.e. the zero flag is not set) then control jumps to 5604. Otherwise a value for the function argument has not yet been calculated, so we need to evaluate the argument to the function. Here is the code that evaluates the function argument:

005526 104472 TRAP 72
005530 020227 CMP R2, #50
005534 001340 BNE 5436
005536 010446 MOV R4, -(SP) 
005540 104536 TRAP 136
005542 102401 BVS 5546
005544 104417 TRAP 17
005546 012600 MOV (SP)+, R0
005550 010546 MOV R5, -(SP)
005552 104512 TRAP 112
005554 005000 CLR R0
005556 104512 TRAP 112
005560 104550 TRAP 150
005562 010146 MOV R1, -(SP)
005564 016601 MOV 4(SP), R1
005570 104536 TRAP 136
005572 102764 BVS 5544
005574 012601 MOV (SP)+, R1
005576 012605 MOV (SP)+, R5
005600 022626 CMP (SP)+, (SP)+
005602 000207 RTS PC

Let's see how this works.

005526 104472 TRAP 72
005530 020227 CMP R2, #50
005534 001340 BNE 5436

TRAP 72 is used to get the next non-whitespace character from the string and this is compared to "(" (ASCII 50), because the argument to the function should be enclosed in brackets. If the next character in the string is not an opening bracket, the comparison will be non-zero and control will jump to 5436, which is an indirect branch to the tidy up and return code.

005536 010446 MOV R4, -(SP) 
005540 104536 TRAP 136
005542 102401 BVS 5546
005544 104417 TRAP 17

Otherwise, an open bracket is encountered, so the expression after the opening bracket needs to be evaluated. R4, which contains the variable identifier, is pushed onto the stack.


TRAP 136 is then called to evaluate the expression. In this case, we expect the expression to be terminated by a closing bracket, so we expect the overflow flag to be set on return from TRAP 136. If the overflow flag is not set, TRAP 17 is invoked to return an error.

005546 012600 MOV (SP)+, R0
005550 010546 MOV R5, -(SP)
005552 104512 TRAP 112
005554 005000 CLR R0
005556 104512 TRAP 112
005560 104550 TRAP 150

In this case, TRAP 136 has returned a valid value and the overflow flag was set as expected. Therefore, the value of the function argument is contained in the registers R2/R3/R4. So, the value from the top of the stack, which is the variable identifier, is popped from the stack into R0. R5 is then pushed onto the stack.


TRAP 112 is used to push the value from R0, the variable identifier, into into the runtime storage state. R0 is then cleared and pushed into the runtime storage state. Finally, TRAP 150 is used to store the calculated value of the function argument into the runtime storage state.

005562 010146 MOV R1, -(SP)
005564 016601 MOV 4(SP), R1
005570 104536 TRAP 136
005572 102764 BVS 5544

R1 is pushed onto the stack and then SP+4 (the memory location of the function expression) is copied into R1. TRAP 136 is then invoked again to evaluate the function expression. If the overflow flag is set by TRAP 136 then control jumps to 5544 to generate an error.

005574 012601 MOV (SP)+, R1
005576 012605 MOV (SP)+, R5
005600 022626 CMP (SP)+, (SP)+
005602 000207 RTS PC

Now that the function has been evaluated the result can be found in R2/R3/R4, so all that remains is to pop R1 and R5 from the stack and discard two other words from the stack. Control then returns from the subroutine.


R1, which generally holds the pointer to the current location being parsed in the program, needs to be reset back to the beginning of the user-defined function because TRAP 136 will be called to parse the user-defined function a second time. Now that function argument has been evaluated and is now stored in the runtime state storage the other execution path, described below, will be used.


The alternative execution path for user-defined function evaluation relates to the situation where the code above has already been executed and the function argument value has already been calculated, in which case R3 will point at the variable entry in the runtime state storage:

005604 022323 CMP (R3)+, (R3)+
005606 012600 MOV (SP)+, R0
005610 012346 MOV (R3)+, -(SP)
005612 012346 MOV (R3)+, -(SP)
005614 012346 MOV (R3)+, -(SP)
005616 010346 MOV R3, -(SP)
005620 010046 MOV R0, -(SP)
005622 104472 TRAP 72
005624 020227 CMP R2, #50
005630 001036 BNE 5726
005632 104536 TRAP 136
005634 102343 BVC 5544
005636 010100 MOV R1, R0
005640 016601 MOV 2(SP), R1
005644 010441 MOV R4, -(R1)
005646 010341 MOV R3, -(R1)
005650 010241 MOV R2, -(R1)
005652 012601 MOV (SP)+, R1
005654 010046 MOV R0, -(SP)
005656 104536 TRAP 136
005660 102731 BVS 5544
005662 012601 MOV (SP)+, R1
005664 012600 MOV (SP)+, R0
005666 012640 MOV (SP)+, -(R0)
005670 012640 MOV (SP)+, -(R0)
005672 012640 MOV (SP)+, -(R0)
005674 005726 TST (SP)+
005676 000207 RTS PC

Let's see how this works.

005604 022323 CMP (R3)+, (R3)+
005606 012600 MOV (SP)+, R0
005610 012346 MOV (R3)+, -(SP)
005612 012346 MOV (R3)+, -(SP)
005614 012346 MOV (R3)+, -(SP)
005616 010346 MOV R3, -(SP)
005620 010046 MOV R0, -(SP)

Two words of the variable representation (the variable identifier and the zero word) are skipped. The location of the function expression, currently stored on the top of the stack, is copied into R0.


The three words of the floating-point value of the function argument are moved onto the stack. The value of the registers R3 and R0 are pushed onto the stack.

005622 104472 TRAP 72
005624 020227 CMP R2, #50
005630 001036 BNE 5726

The next non-whitespace character is read from the string and compared to the "(" character (ASCII 50). If the character is not an open bracket then control jumps to 5726 which indirectly jumps to the tidy up and return code.

005632 104536 TRAP 136
005634 102343 BVC 5544

TRAP 136 is used to evaluate the expression between the brackets in the function (i.e. the function argument). If the overflow flag is clear (meaning that the last character parsed was not a bracket) control branches to 5544 to generate an error.

005636 010100 MOV R1, R0
005640 016601 MOV 2(SP), R1
005644 010441 MOV R4, -(R1)
005646 010341 MOV R3, -(R1)
005650 010241 MOV R2, -(R1)
005652 012601 MOV (SP)+, R1
005654 010046 MOV R0, -(SP)

The location of the string pointer, which will now point at the character after the end of the user-defined function, is copied from R1 into R0. The value from SP+2 (which is the memory location after the function argument in the runtime state storage) is moved into R1. The result of executing TRAP 136 is then copied from registers R4, R3 and R2 into the variable entry in the runtime state storage.


Why is this done, when the function argument was evaluated by the code described earlier in this section? The only scenario I can think of is when a function is used multiple times the argument will need to be evaluated each time.


The value from the top of the stack, which was the original value of Ro containing the location of the function expression, is popped into R1. R0, containing the string location of the end of the user-defined function, is pushed onto the stack.

005656 104536 TRAP 136
005660 102731 BVS 5544

TRAP 136 is then invoked to evaluate the function expression. In this case, we are expecting that the expression will not have ended with a closing bracket, and therefore the overflow flag should not be set. If the overflow flag is set, control jumps to 5544 to generate an error.

005662 012601 MOV (SP)+, R1
005664 012600 MOV (SP)+, R0
005666 012640 MOV (SP)+, -(R0)
005670 012640 MOV (SP)+, -(R0)
005672 012640 MOV (SP)+, -(R0)
005674 005726 TST (SP)+
005676 000207 RTS PC

Finally, the string location of the end of the user-defined function is popped off the stack and copied into R1.


The value of the argument variable that was stored on the stack at the beginning of this code (see instructions starting at 5610) is restored. The location of the argument variable is moved into R0 and then three words are popped from the stack into the next three locations.


One more value, the location of the function expression, is discarded from the stack and then control returns.

The result of the evaluation of the function will be stored in R2/R3/R4 as usual after execution of TRAP 136.


Tidy up and return

In the cases above when the subroutine passes through the "tidy up and return" code, the following instructions are executed:

005700 012601 MOV (SP)+, R1
005702 104544 TRAP 144
005704 102403 BVS 5714
005706 001402 BEQ 5714
005710 000167 JMP 5236
005714 104767 TRAP 367
005716 005002 CLR R2
005720 005003 CLR R3
005722 005004 CLR R4
005724 000207 RTS PC

R1 is popped from the stack and then TRAP 144 is invoked to get the value of the variable identifier stored in R4. The result is returned as a floating point value at the memory address returned in R0. In case of error, or when the variable is not found, control jumps to 5714 which clears the result and returns. Otherwise control jumps to 5236 to move the result from the memory location pointed to by R0 into R2/R3/R4 and then return.


Arithmetic evaluation

When the subroutine to evaluate floats and functions returns, execution of the main TRAP 136 code continues on to the evaluation of arithmetic operators:

004700 010246 MOV R2, -(SP)
004702 104472 TRAP 72
004704 012700 MOV #5163, R0
004710 124002 CMPB -(R0), R2
004712 001407 BEQ 4732
004714 020027 CMP R0, #5155
004720 101373 BHI 4710
004722 005000 CLR R0
004724 005301 DEC R1
004726 012602 MOV (SP)+, R2
004730 000402 BR 4736
004732 010200 MOV R2, R0
004734 000774 BR 4726
004736 005716 TST (SP)
004740 003457 BLE 5100
004742 010146 MOV R1, -(SP)
004744 012701 MOV #5163, R1
004750 124100 CMPB -(R1), R0
004752 001376 BNE 4750
004754 006201 ASR R1
004756 010125 MOV R1, (R5)+
004760 012701 MOV #5163, R1
004764 124166 CMPB -(R1), 2(SP)
004770 001375 BNE 4764
004772 006201 ASR R1
004774 010125 MOV R1, (R5)+
004776 012601 MOV (SP)+, R1
005000 024545 CMP -(R5), -(R5)
005002 002726 BLT 4660
005004 010025 MOV R0, (R5)+
005006 012700 MOV #5163, R0
005012 124016 CMPB -(R0), (SP)
005014 001376 BNE 5012
005016 162700 SUB #5156, R0
005022 006300 ASL R0
005024 062700 ADD #5164, R0
005030 010025 MOV R0, (R5)+
005032 005726 TST (SP)+
005034 010600 MOV SP, R0
005036 104542 TRAP 142
005040 010146 MOV R1, -(SP)
005042 010601 MOV SP, R1
005044 005721 TST (R1)+
005046 014502 MOV -(R5), R2
005050 014546 MOV -(R5), -(SP)
005052 004772 JSR PC, @(R2)
005056 012600 MOV (SP)+, R0
005060 012601 MOV (SP)+, R1
005062 062706 ADD #6, SP
005066 012602 MOV (SP)+, R2
005070 012603 MOV (SP)+, R3
005072 012604 MOV (SP)+, R4
005074 005716 TST (SP)
005076 003321 BGT 4742

Let's see how this bit works.

004700 010246 MOV R2, -(SP)
004702 104472 TRAP 72

First R2, one of the words of the result from the evaluation of the floats and functions, is pushed onto the stack. Then TRAP 72 is used to get the next non-whitespace character and store it in R2.

004704 012700 MOV #5163, R0
004710 124002 CMPB -(R0), R2
004712 001407 BEQ 4732
004714 020027 CMP R0, #5155
004720 101373 BHI 4710

The value 5163 is moved into R0. The value is pre-decremented, and the byte value at the resulting memory location is compared to the value in R2. So what can be found at location 5162? Here are the relevant words:

005154 024400
005156 026453
005160 027452
005162 000136

These bytes are the ASCII string

)+-*/^

With a zero byte before and after. So, the first comparison is to the ASCII code for "^". If the value in R2 matches the current character from the string, control jumps to 4732. Otherwise, the value in R0 is compared to 5155 to check whether all characters in the string have been tested. If not, control jumps back up to 4710 to test the next character.

004722 005000 CLR R0
004724 005301 DEC R1
004726 012602 MOV (SP)+, R2
004730 000402 BR 4736

If all characters are tested and no match is found, R0 is cleared and the string pointer (R1) is decremented. R2 (the word from the result of the floats and functions subroutine) is popped from the stack and control branches to 4736.

004732 010200 MOV R2, R0
004734 000774 BR 4726

Otherwise, in the case a match was found, R2 (the matching ASCII character) is copied into R0 and then control branches to 4726 above, which pops R2 from the stack and then jumps to 4736.

004736 005716 TST (SP)
004740 003457 BLE 5100

A non-zero value will be found at the top of the stack when an arithmetic operation character was encountered in the previous iteration through the expression parser. For example, in the expression "2+3*4", the following sequence of events takes place when TRAP 136 is involed:

  1. The "2" is parsed into a floating point number into R2/R3/R4.

  2. The ASCII code for "+" is stored in R0.

  3. When this test is encountered, control will jump to 5100, because there is not currently an operand on top of the stack, and R2/R3/R4 are pushed onto the stack and then R0 is pushed onto the stack.

  4. Afterwards, the "3" is parsed into R2/R3/R2 and the ASCII code for "*" is stored in R0.

  5. The precedence of the two operands is tested. In this case the "*" is the highest priority so R2/R3/R4 are pushed onto the stack along with R0.

  6. Afterwards, the "4" is parsed into R2/R3/R4.

  7. Next, the 3 and 4 are multiplied and then the result is added to the 2.

We'll see in detail how this works now.


So, the value at the top of the stack is tested and if it is less than or equal to zero (meaning the ASCII code of an arithmetic operation is not found at the top of the stack) control jumps to 5100.


Otherwise execution will continue below, meaning an arithmetic operation was encountered at the top of the stack:

004742 010146 MOV R1, -(SP)

The current parse location in the expression string, stored in R1, is pushed onto the stack.


Next, the priority of the arithmetic operation in R0 is calculated and stored in the runtime state storage:

004744 012701 MOV #5163, R1
004750 124100 CMPB -(R1), R0
004752 001376 BNE 4750
004754 006201 ASR R1
004756 010125 MOV R1, (R5)+

This is achieved by moving the location of the arithmetic operation string (location 5163) into R1. Then the value is pre-decremented and compared to R0 and control branches until a match is found. The string is as follows (with a zero byte before and after):

)+-*/^

Because these values are bytes, they are in words together, with priorities as follows (in ascending order of priority):

  1. zero byte and close bracket have the same (lowest) priority

  2. plus and minus have the same priority

  3. multipy and divide have the same priority

  4. exponent has the highest priority

So, if R0 contains zero, this will match with the zero before the close bracket, representing the lowest possible priority of operation. If there is a match against one of the operands, the value is shifted right (this will remove the lowest bit from the resulting memory location) but it will give the same result for plus and minus, multiply and divide, etc., meaning that this value represents the operator precedence.


This value is then pushed into the runtime state storage.


The same calculation is now performed with the arithmetic operation at the top of the stack. Actually since R1 was pushed onto the stack a few instructions ago, the arithmetic operation can now be found at SP+2:

004760 012701 MOV #5163, R1
004764 124166 CMPB -(R1), 2(SP)
004770 001375 BNE 4764
004772 006201 ASR R1
004774 010125 MOV R1, (R5)+

Once again, this is achieved by moving the location of the arithmetic operation string (location 5163) into R1. Then the value is pre-decremented and compared to SP+2 and control branches until a match is found. When a match is found the value is shifted right to calculate a value representing the operator precedence and then the result is pushed into the runtime state storage.

004776 012601 MOV (SP)+, R1
005000 024545 CMP -(R5), -(R5)
005002 002726 BLT 4660

When the precedence of both arithmetic operators has been stored into the runtime state storage, R1 is restored from the stack and then the two precedence values are compared. If the operator in R0 has a higher precedence, control jumps to 4660 (described above), where the current value in R2/R3/R4 and the operator in R0 are pushed onto the stack and the expression after the operator is parsed.


Otherwise, the operation on the stack is the one that should be carried out, so execution continues:

005004 010025 MOV R0, (R5)+

We begin by storing the other operation (or zero if no other operation has been found) in the runtime state storage.


Then, we calculate the location of the operator function:

005006 012700 MOV #5163, R0
005012 124016 CMPB -(R0), (SP)
005014 001376 BNE 5012
005016 162700 SUB #5156, R0
005022 006300 ASL R0
005024 062700 ADD #5164, R0
005030 010025 MOV R0, (R5)+
005032 005726 TST (SP)+

The location of the arithmetic operation string (location 5163) is moved into R0. Then the value is pre-decremented and compared to the arithmetic operator at the top of the stack, and control branches until a match is found.


When a matching entry is found, 5156 is subtracted, the value is shifted left and 5164 is added. This will give a result that is one of a series of memory locations containing addresses that point to the code for executing each of the arithmetic operations. Here are the values (annotated):

005164 012216 ; this is the memory location of ADDF
005166 012442 ; this is the memory location of SUBF
005170 012776 ; this is the memory location of MULF
005172 012550 ; this is the memory location of DIVF
005174 013714 ; this is the memory location of EXP

Note that the functions are in the same order as in the arithmetic operations string:

)+-*/^

The memory address containing the location of the arithmetic operation code is then stored in the runtime state storage. Now that the arithmetic operation at the top of the stack has been parsed, the actual value on the stack is no longer required, so it is popped off and discarded.


Next we set up the arguments to the arithmetic operation:

005034 010600 MOV SP, R0
005036 104542 TRAP 142
005040 010146 MOV R1, -(SP)
005042 010601 MOV SP, R1
005044 005721 TST (R1)+

One of the floating point nubmer arguments is stored at the top of the stack, so the stack pointer is moved into R0. This is the location of the first argument.


The other argument is stored in R2/R3/R4 so TRAP 142 is used to push this floating point value onto the stack. The value in R1 is pushed onto the stack and then the stack pointer is moved into R1. The value in R1 is tested, but more importantly, it is post-incremented. Therefore, the value in R1 now points at the second argument.

005046 014502 MOV -(R5), R2
005050 014546 MOV -(R5), -(SP)
005052 004772 JSR PC, @(R2)

We're now ready to invoke the arithmetic operation, so the memory location that contains the function location is taken from the runtime state storage and saved in R2. Also, the next value from the runtime state storage, which is the next operand (or zero if no operand was identified), is removed and pushed onto the stack.


Finally, the code at the location specified in the value found at the memory location stored in register R2 (i.e. the relevant arithmetic function) is executed. The result will be stored in the location pointed to by R0.

005056 012600 MOV (SP)+, R0
005060 012601 MOV (SP)+, R1
005062 062706 ADD #6, SP
005066 012602 MOV (SP)+, R2
005070 012603 MOV (SP)+, R3
005072 012604 MOV (SP)+, R4

These instructions are used to tidy and restore registers before continuing. The values of registers R0 and R1 are restored from the stack. Three more values are popped and discarded from the stack - these are the floating point value of the second argument (not the result of the calculation). Finally the floating point value, formerly pointed to by R0, is popped from the stack and stored in registers R2/R3/R4.

005074 005716 TST (SP)
005076 003321 BGT 4742

Finally the value at the top of the stack is tested. This will be the next operand, if one was located, and zero otherwise. If the value is greater than zero then control jumps back to 4742 to handle the next arithmetic operation.


Otherwise, we continue execution in the closing bracket handling code:


Handling closing brackets

Nearly there!


The last section before exiting is to handle closing brackets. Here's the code:

005100 020027 CMP R0, #51
005104 001410 BEQ 5126
005106 005700 TST R0
005110 003263 BGT 4660
005112 005745 TST -(R5)
005114 001003 BNE 5124
005116 005726 TST (SP)+
005120 000257 CCC
005122 000207 RTS PC
005124 104417 TRAP 17
005126 005745 TST -(R5)
005130 001003 BNE 5140
005132 005726 TST (SP)+
005134 000262 SEV
005136 000207 RTS PC

; there are still open brackets
005140 005716 TST (SP)
   ; test the value at the top of the stack
005142 002773 BLT 5132
   ; branch if less than zero to 5132
   ; i.e. we reached the -1 at the bottom of the stack
   ; i.e. no more zeros, meaning no more brackets to be closed
   ; otherwise:
005144 005726 TST (SP)+
   ; pop a zero off the top of the stack and discard
005146 005325 DEC (R5)+
   ; decrement the number of remaining open brackets
005150 000653 BR 4700
   ; branch to 4700 (tidy up and exit)

Let's see how this works.

005100 020027 CMP R0, #51
005104 001410 BEQ 5126

First we check whether the next character, currently in R0, is a closing bracket (ASCII 51). If yes, we branch to 5126. Otherwise:

005106 005700 TST R0
005110 003263 BGT 4660

The next check is to see whether the value in R0 is non-zero. If it is, that means there is an arithmetic operator in R0 so we branch back to 4660 to save the current value of the expression (in R2/R3/R4) and the operator (in R0) into the stack and parse the next component of the expression. Otherwise:

005112 005745 TST -(R5)
005114 001003 BNE 5124
005116 005726 TST (SP)+
005120 000257 CCC
005122 000207 RTS PC
005124 104417 TRAP 17

We check the value at the top of the runtime state storage, which contains the number of open brackets in the current part of the expression. If this is not equal to zero control jumps to 5124 to return an error. Otherwise we pop the last value off the top of the stack (which will be the -1 pushed on at the beginning of TRAP 136), clear all flags, and return.


Otherwise we encountered a closing bracket at the beginning of this block of code, in which case:

005126 005745 TST -(R5)
005130 001003 BNE 5140
005132 005726 TST (SP)+
005134 000262 SEV
005136 000207 RTS PC

We check the value at the top of the runtime state storage, which contains the number of open brackets in the current part of the expression. If this is not equal to zero control jumps to 5140, meaning there are remaining open brackets expected in the current part of the expression being parsed.


We then pop and discard the top value from the stack, which will be the -1 pushed on at the beginning of the TRAP, set the overflow flag to indicate that the expression parsing ended with a closing bracket and return.


Otherwise there are remaining open brackets:

005140 005716 TST (SP)
005142 002773 BLT 5132
005144 005726 TST (SP)+
005146 005325 DEC (R5)+
005150 000653 BR 4700

We test the value at the top of the stack and if it is less than zero that means we have encountered the -1 pushed onto the stack at the beginning of the TRAP, which means there are no more closing brackets to be found. In this case we branch back up to 5132 to set the overflow flag before returning, which indicates that the last character parsed was a closing bracket.


Otherwise there are some closing brackets remaining, so a zero is popped off the stack (matching an opening bracket), the number of currently open brackets as recorded in the runtime state storage is decremented, and control branches back up to 4700 to check for more arithmetic operators, expressions or brackets.


Summary

...And that's it! We've reached the end of this mammoth analysis of TRAP 136. This code is used extensively throughout the remainder of the BASIC code so it will have been worth it!


If you've managed to make it this far, I'm very impressed! Thanks for reading!




16 views0 comments

Recent Posts

See All

Reverse engineering PDP-11 BASIC: Part 26

This short post describes TRAP 116 and the BASIC RUN, STOP, END and RANDOMIZE commands. For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page. TRAP

Reverse engineering PDP-11 BASIC: Part 25

This post describes the BASIC FOR and NEXT loop commands. For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page. Introduction Loops are implemented

Reverse engineering PDP-11 BASIC: Part 24

This post describes the BASIC READ, INPUT and RESTORE commands. For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page. Introduction The INPUT comma