daveor
- Feb 2, 2021

Reverse engineering PDP-11 BASIC: Part 7

Updated: Feb 9, 2021

The purpose of this post is to lay the groundwork required to complete the analysis of the main syntax parsing loop. There are a few additional TRAP calls used that haven't yet been explained.

NOTE: For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.

Refresher of key points

Program code is stored in the memory area defined by the memory address contained in memory location 13662 (the lower limit of the memory area) and the value contained in R5 (the memory location that represents the upper extent of the currently stored code).
The very first byte of the program memory area will contain a linefeed. Then, each subsequent line of the program is terminated by a linefeed character.
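
As a rough illustration of that layout, here is a small Python sketch that models the program code area as a byte string. The names program_area and list_lines, and the sample BASIC lines, are my own assumptions for illustration; they don't come from the interpreter.

```python
LINEFEED = 0o12  # ASCII linefeed, octal 12

# Hypothetical contents of the program code area: a leading linefeed,
# then each stored line terminated by its own linefeed.
program_area = b"\n10 LET A=1\n20 PRINT A\n"


def list_lines(area):
    """Split the area into stored lines, skipping the leading linefeed."""
    assert area[0] == LINEFEED, "area must start with a linefeed"
    body = area[1:]
    # split() leaves an empty trailing entry after the final linefeed
    return [line for line in body.split(b"\n") if line]


lines = list_lines(program_area)
```

In this model, the address held at 13662 corresponds to index 0 and R5 corresponds to len(program_area).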

TRAP 102

This TRAP moves R1 forward to point at the character after the next linefeed. In effect, TRAP 102 skips R1 forward to the next command in the program code area.

Here's the code:

001646 122127 CMPB (R1)+, #12
001652 001375 BNE 1646
001654 000207 RTS PC

It's very simple really: the byte that R1 points to is compared to linefeed (ASCII 12 in octal) and R1 is then incremented. If the byte was not a linefeed, the code loops back to compare the next byte and increment R1 again. Otherwise, control returns from the subroutine.
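
To make the post-increment behaviour concrete, here is a minimal Python model of the loop. It is only a sketch: memory is a bytes object, R1 is an index into it, and the function name trap_102 and the sample program text are assumptions of mine.

```python
LINEFEED = 0o12  # ASCII linefeed, octal 12


def trap_102(mem, r1):
    """Model of CMPB (R1)+, #12 / BNE: return the index just past the next linefeed."""
    while True:
        byte = mem[r1]
        r1 += 1  # the post-increment happens whether or not the byte matched
        if byte == LINEFEED:
            return r1


mem = b"\n10 LET A=1\n20 PRINT A\n"  # hypothetical program area
first = trap_102(mem, 0)       # skips the leading linefeed
second = trap_102(mem, first)  # skips the whole of line 10
```

Because R1 is incremented even on the matching byte, it ends up pointing one past the linefeed, which is the start of the next line.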

TRAP 74

This TRAP is used to check whether the line number contained in R0 is present in the program code. At the end of this TRAP:

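If the program contains no lines, or the search reaches the end of the program without meeting a line number equal to or greater than R0, the overflow flag is set.
If a line with a greater line number is reached first, the line number is not in the program and no flags are set.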
If the line number is found in the code, R1 will point at the beginning of the command, R2 will contain the line number and the zero flag is set.

Here's the code:

001656 016701 MOV 13662, R1
001662 104502 TRAP 102
001664 020105 CMP R1, R5
001666 103013 BCC 1716
001670 010046 MOV R0, -(SP)
001672 010146 MOV R1, -(SP)
001674 104410 TRAP 10
001676 012601 MOV (SP)+, R1
001700 010002 MOV R0, R2
001702 012600 MOV (SP)+, R0
001704 020002 CMP R0, R2
001706 001402 BEQ 1714
001710 003364 BGT 1662
001712 000257 CCC
001714 000207 RTS PC
001716 000257 CCC
001720 000262 SEV
001722 000207 RTS PC

Let's work through it line-by-line.

001656 016701 MOV 13662, R1

Firstly, the memory location of the beginning of the program code area (which is stored at memory address 13662) is moved into R1.

001662 104502 TRAP 102

TRAP 102 is then used to move to the character after the next linefeed. Since the program code area begins with a linefeed, this will move R1 to point at the location of the first command in the program code area.

001664 020105 CMP R1, R5
001666 103013 BCC 1716

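The value in R1 is compared to the value in R5, which stores the end of the program. BCC branches when the comparison leaves the carry flag clear, that is, when R1 is greater than or equal to R5. On the first pass this means that the program has no lines in it; on later passes it means the search has run past the last line. Either way, control jumps to address 1716 to set the overflow flag and return.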

001670 010046 MOV R0, -(SP)
001672 010146 MOV R1, -(SP)

R0 and R1 are pushed onto the stack.

001674 104410 TRAP 10

TRAP 10 converts the digits in the string pointed to by R1 into a numeric value, which will be stored in R0.

001676 012601 MOV (SP)+, R1
001700 010002 MOV R0, R2
001702 012600 MOV (SP)+, R0

R1 is popped off the stack. The numeric value calculated by TRAP 10, returned in R0, is moved to R2 and then the original value of R0 is popped from the stack.

001704 020002 CMP R0, R2
001706 001402 BEQ 1714

The value in R0 (the line number we are looking for) is compared to the value in R2 (the line number of the current line). If they are equal control jumps to address 1714 to return from the subroutine.

001710 003364 BGT 1662
001712 000257 CCC
001714 000207 RTS PC

If R0 (the line number we are looking for) is greater than the current line number then jump back to address 1662, to move forward to the next line in the program code area and continue checking. Otherwise R0 is less than R2 and we have therefore passed by the line number we are looking for, in which case the line number we are looking for doesn't exist, so all flags are cleared and control returns from the subroutine.

If the line was identified (i.e. R0 equals R2), control will have jumped directly to 1714, so control will return with the zero flag set.

001716 000257 CCC
001720 000262 SEV
001722 000207 RTS PC

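The last case is when there are no lines of code, or the search has run past the last line without meeting a line number equal to or greater than the one in R0. In either case control will have jumped to 1716. All flags are cleared, the overflow flag is set, and then control returns from the subroutine.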
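
The whole search can be sketched in Python as follows. This is a model under assumptions, not the interpreter's code: next_line and parse_number are simplified stand-ins for TRAP 102 and TRAP 10, the flags are returned as a plain string, and all function names and the sample program are hypothetical.

```python
LINEFEED = 0o12  # ASCII linefeed, octal 12


def next_line(mem, r1):
    """Stand-in for TRAP 102: index just past the next linefeed."""
    while mem[r1] != LINEFEED:
        r1 += 1
    return r1 + 1


def parse_number(mem, r1):
    """Stand-in for TRAP 10: value of the decimal digits starting at r1."""
    value = 0
    while r1 < len(mem) and ord("0") <= mem[r1] <= ord("9"):
        value = value * 10 + (mem[r1] - ord("0"))
        r1 += 1
    return value


def trap_74(mem, start, r5, r0):
    """Return (r1, r2, flags); flags is 'Z' (found), 'V' (ran off the end) or ''."""
    r1, r2 = start, None
    while True:
        r1 = next_line(mem, r1)          # TRAP 102
        if r1 >= r5:                     # CMP R1, R5 / BCC 1716
            return r1, r2, "V"
        r2 = parse_number(mem, r1)       # TRAP 10, result copied to R2
        if r0 == r2:                     # BEQ 1714
            return r1, r2, "Z"
        if r0 < r2:                      # BGT not taken: we have passed it
            return r1, r2, ""


mem = b"\n10 LET A=1\n20 PRINT A\n"      # hypothetical program area
found = trap_74(mem, 0, len(mem), 20)
missing = trap_74(mem, 0, len(mem), 15)
past_end = trap_74(mem, 0, len(mem), 30)
empty = trap_74(b"\n", 0, 1, 10)
```

Note that, in this reading of the listing, searching for a line number larger than any in the program ends at 1716 just like the empty-program case, because the loop only exits through BCC once R1 reaches R5.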

TRAP 76

This TRAP is used to delete the line currently pointed to by R1 from the program code area.

Here's the code:

; TRAP 76 handler
001620 104516 TRAP 116
001622 010103 MOV R1, R3
001624 010102 MOV R1, R2
001626 104502 TRAP 102
001630 020105 CMP R1, R5
001632 103002 BCC 1640
001634 112123 MOVB (R1)+, (R3)+
001636 000774 BR 1630
001640 010305 MOV R3, R5
001642 010201 MOV R2, R1
001644 000207 RTS PC

Let's see how it works.

001620 104516 TRAP 116

The mysterious TRAP 116...I'm still not sure what this does yet.

001622 010103 MOV R1, R3
001624 010102 MOV R1, R2

The memory address in R1 points at the command we want to delete from the program. Firstly, two copies of this memory location are stored in R2 and R3.

001626 104502 TRAP 102

This TRAP is used to move R1 forward to the next command in the program code area.

001630 020105 CMP R1, R5
001632 103002 BCC 1640
001634 112123 MOVB (R1)+, (R3)+
001636 000774 BR 1630

R1 is then compared to R5 to test whether we have reached the end of the program. If R1 is at or past R5 there are no more bytes to copy, so control jumps ahead to 1640.

Otherwise, we need to overwrite the current command with the remaining bytes of the program. R1 points at the remaining bytes after the deleted line and R3 points at the location of the line to be overwritten. A byte is moved from the address in R1 to the address in R3 and then both pointers are incremented. The code then loops around to address 1630 and bytes are copied until R1 reaches the end of the current program code area.

001640 010305 MOV R3, R5

R3 now contains the address of the end of the program code. This value is now moved to R5, where the address of the end of the program code is normally kept.

001642 010201 MOV R2, R1
001644 000207 RTS PC

R2, which contains a backup of the original value of R1, is now moved back to R1 and then control returns from the subroutine.
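
Here is a small Python model of the deletion, as a sketch only. Memory is a bytearray, registers are plain integers, the call to TRAP 116 is left out because its purpose is still unknown, and the function name trap_76 and the sample program are my own.

```python
LINEFEED = 0o12  # ASCII linefeed, octal 12


def trap_76(mem, r1, r5):
    """Delete the line starting at r1; return (r1, new_r5)."""
    r2 = r3 = r1                  # MOV R1, R3 / MOV R1, R2
    while mem[r1] != LINEFEED:    # TRAP 102: skip to the next line
        r1 += 1
    r1 += 1
    while r1 < r5:                # CMP R1, R5 / BCC 1640
        mem[r3] = mem[r1]         # MOVB (R1)+, (R3)+
        r1 += 1
        r3 += 1
    return r2, r3                 # MOV R2, R1 / MOV R3, R5


mem = bytearray(b"\n10 LET A=1\n20 PRINT A\n30 END\n")  # hypothetical program
r1, r5 = trap_76(mem, 12, len(mem))  # index 12 is the start of line 20
remaining = bytes(mem[:r5])
```

The bytes beyond the new R5 are not cleared; they are simply no longer part of the program.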

TRAP 104

The final TRAP for this post, TRAP 104, checks whether there is enough space left in the program code memory area to store another command. The number of bytes required to store the command is passed in R0. The result is returned in the status flags set by the final CMP instruction: if there is enough space for the command in the program code area the Carry flag will be clear, otherwise the Carry flag will be set.

Here's the code:

002126 010504 MOV R5, R4
002130 060004 ADD R0, R4
002132 010603 MOV SP, R3
002134 162703 SUB #70, R3
002140 020304 CMP R3, R4
002142 000207 RTS PC

Another nice short routine, so let's see what happens.

002126 010504 MOV R5, R4

Firstly R5 is copied to R4. Remember that R5 contains the largest memory address currently in use by the program code.

002130 060004 ADD R0, R4

R0 is then added to R4. R0 contains the amount of space required to hold the new command. Therefore, R4 will contain the memory address that would be the largest memory address of the program code, were the new command added to the program code area.

002132 010603 MOV SP, R3

The stack pointer is moved into R3. The stack grows downwards in memory, towards the top of the program code area, so the current value of the stack pointer is a good estimate for the top of the space available to the program code.

002134 162703 SUB #70, R3

70 (octal, i.e. 56 decimal) is subtracted from R3. This is presumably to allow some buffer space so that the stack can continue to grow without overwriting the program code.

002140 020304 CMP R3, R4
002142 000207 RTS PC

Finally, the value in R3, which is the maximum allowable memory address of the program code area, is compared to the value in R4, which is the memory address to which the program code will extend if the new command is added. The result of this comparison represents the output from this subroutine, so then control returns from the subroutine.

The CMP instruction works by subtracting the destination operand from the source operand, in this case R3-R4. Since R3 and R4 are both positive, if R4 is greater than R3 the result of the subtraction will be negative, and hence the N and C flags will be set: the N flag because the result is negative, and the C flag because the subtraction needs a borrow (treated as unsigned values, R3 is smaller than R4). This indicates that more memory is required to store the command than is available in the program code area, so the command cannot be stored.

On the other hand, if R3 is greater than or equal to R4, meaning that there is enough space in the program code area, the result of the subtraction will be positive or zero and the N and C flags will not be set.

The flags can still be tested on return from the subroutine.
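
The arithmetic can be checked with a short Python model. This is a sketch under assumptions: registers are treated as 16-bit unsigned values, only the Carry flag is modelled, and the function name trap_104 and the example addresses are made up for illustration.

```python
MASK = 0o177777  # registers are 16 bits wide


def trap_104(r5, r0, sp):
    """Return the Carry flag left by CMP R3, R4: True means not enough space."""
    r4 = (r5 + r0) & MASK    # MOV R5, R4 / ADD R0, R4
    r3 = (sp - 0o70) & MASK  # MOV SP, R3 / SUB #70, R3
    # CMP R3, R4 computes R3 - R4; Carry is set when that needs a borrow,
    # i.e. when R3 is smaller than R4 as an unsigned value.
    return r3 < r4


# Hypothetical values: program ends at 14000 octal, stack pointer at 17000 octal.
fits = trap_104(0o14000, 0o20, 0o17000)
too_big = trap_104(0o14000, 0o3000, 0o17000)
```

A caller would typically follow the TRAP with a branch on the Carry flag (BCS or BCC) to choose between storing the command and reporting an error.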

Conclusion

With the understanding of these four TRAPs we can now complete the analysis of the syntax parsing code. I'll get back to that in the next post.
