Reverse engineering PDP-11 BASIC: Part 1
Hopefully you're here for some bare metal PDP-11 stuff, but just in case:
If you want to load and use PDP-11 BASIC, look here.
If you want to understand how PDP-11 BASIC is loaded, look here.
If you want more information about programming in PDP-11 BASIC, look here.
But if you want to watch me muddling through as I try to figure out how PDP-11 BASIC works, read on!
This will be the first in a series of posts as I document my progress in reverse-engineering this code.
I didn't think that a brute-force approach, starting with the entry point instruction and working my way line-by-line through the code, was going to be the most efficient way to understand the code. It might be feasible when I have a better understanding of some of the key components, but not at this early stage. Therefore, I decided to start by examining some of the other ways in which code gets executed, namely with the interrupt handling routines.
There were two lines of enquiry in particular that I thought would be worth pursuing:
BASIC is often used in interactive mode, so if the TTY is generating interrupts the TTY interrupt handler might have some useful indications of what happens when a character key is pressed.
Even a brief skim through the code reveals that there are a lot of TRAP instructions, which are being used in a way analogous to system calls. If I could figure out what some of the TRAP instruction calls do, that would be helpful for building up an insight into how the code works.
The TTY interrupt handling routines
The DL-11 line interface (TTY) device has two interrupt vectors: 060, which is used when there is data ready to be received, and 064, which is used on completion of a transmit. Here are the interrupt vectors:
000060 001136 ; This is the interrupt vector for TTY receive
000062 000000 ;
000064 000066 ; This is the interrupt vector for TTY send
000066 000000 ; - note a TTY send interrupt will cause a HALT
Each vector consists of two words; firstly the value to be loaded into the PC when the interrupt is triggered and secondly the value to be loaded into the PSW.
Note that the transmit interrupt vector, at address 000064, loads the value 000066 into the PC. This points at the next word, which has the value 000000 (i.e. HALT). Therefore a TTY send interrupt, if one is ever generated, will cause a HALT.
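The vector lookup described above can be sketched in a few lines of Python. This is only an illustration: the `MEMORY` dictionary and the `read_vector` helper are names I have made up, and the word values are copied from the vector listing.

```python
# Octal word values copied from the interrupt vector listing.
MEMORY = {0o60: 0o001136, 0o62: 0o000000, 0o64: 0o000066, 0o66: 0o000000}

def read_vector(memory, vector_address):
    """Return the (new PC, new PSW) pair stored at an interrupt vector."""
    new_pc = memory[vector_address]
    new_psw = memory[vector_address + 2]
    return new_pc, new_psw

rx_pc, rx_psw = read_vector(MEMORY, 0o60)
tx_pc, tx_psw = read_vector(MEMORY, 0o64)

# The transmit vector's PC points at its own PSW word, which holds 000000 (HALT).
tx_first_instruction = MEMORY[tx_pc]
print(oct(rx_pc), oct(tx_pc), oct(tx_first_instruction))
```

The point of the sketch is that the word at 000066 plays two roles: it is the PSW half of the transmit vector and also the first (and only) instruction executed if that interrupt ever fires.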
The TTY receive vector, on the other hand, will execute the code starting at address 001136:
001136 010246 MOV R2, -(SP)
001140 116702 MOVB 177562, R2
001144 042702 BIC #177600, R2
001150 120227 CMPB R2, #20
001154 001402 BEQ 1162
001156 012602 MOV (SP)+, R2
001160 000002 RTI
001162 016746 MOV 13700, -(SP)
001166 012767 MOV #1, 13700
001174 012702 MOV #136, R2
001200 104400 TRAP 0
001202 012702 MOV #120, R2
001206 104400 TRAP 0
001210 012667 MOV (SP)+, 13700
001214 012767 MOV #1, 13702
001222 000755 BR 1156
Let's talk through this code line-by-line and see what it does.
001136 010246 MOV R2, -(SP)
This first line pushes R2 onto the stack.
001140 116702 MOVB 177562, R2
Next, a byte is read from address 177562, which is the address of the receive buffer of the TTY device, and the value is stored in R2.
001144 042702 BIC #177600, R2
Next, all bits in R2 are cleared except the lowest 7 bits using a BIC (bit clear) instruction, which will clear all bits in the second argument (in this case R2) corresponding to bits that are set in the first argument. The first argument is 177600 (in binary 1 111 111 110 000 000) so only the lowest seven bits will be left untouched and all other bits will be cleared.
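To make the masking concrete, here is a small Python sketch of what BIC does. The `bic` helper is my own name, and the sample input values are hypothetical (a "P" arriving with bit 7 set, before and after the sign extension that MOVB performs when its destination is a register).

```python
def bic(mask, value):
    """PDP-11 BIC: clear in `value` every bit that is set in `mask` (16-bit)."""
    return value & ~mask & 0o177777

# Hypothetical received byte: 'P' (octal 120) with bit 7 set.
raw_byte = 0o320
# MOVB into a register sign-extends the byte, so R2 would hold 177720.
sign_extended = 0o177720

stripped_raw = bic(0o177600, raw_byte)
stripped_extended = bic(0o177600, sign_extended)
print(oct(stripped_raw), oct(stripped_extended))
```

Either way only the low seven bits survive, which is why the comparison that follows can use a plain 7-bit ASCII code.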
001150 120227 CMPB R2, #20
The value in R2 is compared to the value 20 (octal). In ASCII, octal 20 is the code for DLE (Data Link Escape), which is generated by the key combination Ctrl-P. If you take a look at Section 5.5 of the PDP-11 BASIC manual, it says that Ctrl-P is used to stop execution at the end of the current statement. So, this code checks whether Ctrl-P has been pressed.
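As a quick sanity check on the Ctrl-P value: a terminal conventionally produces a control code by keeping only the low five bits of the letter's ASCII code. A minimal sketch (the `control_code` helper is a name I have invented):

```python
def control_code(letter):
    """ASCII control code sent by Ctrl-<letter>: keep the low five bits."""
    return ord(letter.upper()) & 0o37

ctrl_p = control_code("P")
print(oct(ctrl_p))  # 0o20
```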
001154 001402 BEQ 1162
If Ctrl-P was pressed, branch to 1162.
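The branch targets in these listings can be checked against the instruction words. PDP-11 branch instructions keep a signed 8-bit word offset in the low byte, relative to the address of the following instruction. The helper below is my own sketch of that calculation, not anything taken from the BASIC code itself.

```python
def branch_target(address, word):
    """Return the target of a PDP-11 branch instruction located at `address`."""
    offset = word & 0o377      # low byte holds the signed word offset
    if offset & 0o200:         # sign-extend the 8-bit offset
        offset -= 0o400
    # The PC has already advanced past the branch (address + 2) when it is applied.
    return (address + 2 + 2 * offset) & 0o177777

print(oct(branch_target(0o1154, 0o001402)))  # 0o1162 (the BEQ above)
print(oct(branch_target(0o1222, 0o000755)))  # 0o1156 (the BR at the end of the handler)
```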
001156 012602 MOV (SP)+, R2
001160 000002 RTI
This code is executed in all other cases. So, if any key other than Ctrl-P has been pressed, R2 is popped from the stack and the interrupt handler returns.
001162 016746 MOV 13700, -(SP)
001166 012767 MOV #1, 13700
This is the beginning of the code that is executed when Ctrl-P has been pressed. First, the value at memory address 13700 is pushed onto the stack and then the value 1 is moved into address 13700. As far as I can tell so far, address 13700 is used as a flag to prevent code simultaneously reading and writing from I/O devices. The value 1 means that transmitting to the TTY is allowed.
001174 012702 MOV #136, R2
001200 104400 TRAP 0
001202 012702 MOV #120, R2
001206 104400 TRAP 0
Next, the value 136 is moved into R2. This represents the ASCII character "^". The TRAP 0 routine is used to display the character contained in the R2 register (see below). Then the value 120, which in ASCII is "P", is moved into R2. This is then displayed to the screen using another TRAP 0. In summary, these lines echo "^P" to the terminal, when Ctrl-P has been pressed.
001210 012667 MOV (SP)+, 13700
After the TRAP 0 calls have been completed, the original value in 13700 is popped from the stack.
001214 012767 MOV #1, 13702
The value 1 is moved into address 13702. I'm not sure what this is for yet, but I'm guessing it is used elsewhere in the code to test whether Ctrl-P has been pressed and interrupt execution.
001222 000755 BR 1156
Lastly, the code branches up to 1156, which, as described above, pops R2 from the stack and returns from the interrupt.
It seems that this routine only handles the Ctrl-P keypress. All other keypresses must therefore be polled or handled elsewhere in the code.
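Putting the whole receive handler together, here is a rough Python model of its behaviour. It is a sketch under several assumptions: the `State` class and its field names are invented, `trap0` is a simplified stand-in that just records characters when the 13700 flag is non-zero, and the meaning I give to 13702 is only my guess from above.

```python
CTRL_P = 0o20

class State:
    def __init__(self):
        self.output_enabled = 0    # models address 13700
        self.break_requested = 0   # models address 13702 (meaning is a guess)
        self.printed = []          # characters "sent" to the terminal

def trap0(state, char_code):
    """Simplified stand-in for TRAP 0: output only when the flag is set."""
    if state.output_enabled:
        state.printed.append(chr(char_code))

def tty_receive_interrupt(state, receive_buffer):
    char = receive_buffer & 0o177          # BIC #177600, R2
    if char != CTRL_P:                     # CMPB R2, #20 / BEQ 1162
        return                             # restore R2 and RTI
    saved = state.output_enabled           # MOV 13700, -(SP)
    state.output_enabled = 1               # MOV #1, 13700
    trap0(state, 0o136)                    # "^"
    trap0(state, 0o120)                    # "P"
    state.output_enabled = saved           # MOV (SP)+, 13700
    state.break_requested = 1              # MOV #1, 13702

state = State()
tty_receive_interrupt(state, ord("A"))     # ignored by this handler
tty_receive_interrupt(state, CTRL_P)       # echoes ^P and sets the flag
print("".join(state.printed), state.break_requested)
```

Saving and restoring 13700 around the echo means "^P" is shown even if output was disabled at the moment of the interrupt, without disturbing whatever the interrupted code expected the flag to be.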
The TRAP handling routine
I have already written a post explaining the architecture of the TRAP handling routine. Briefly, the TRAP "parameter" is extracted from the instruction and used (indirectly) to calculate the address of the subroutine used for handling the specific TRAP that has been invoked. Rather than going through all of that again, I will take as a starting point that there are separate routines handling each of the various even-numbered TRAPs.
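For reference, the "parameter" is simply the low byte of the instruction word: TRAP instructions occupy the range 104400 to 104777 (octal). The sketch below only shows that extraction step; the `trap_number` helper is my own name, and how BASIC then turns the number into a subroutine address is not modelled here.

```python
TRAP_BASE = 0o104400

def trap_number(word):
    """Extract the 8-bit operand from a PDP-11 TRAP instruction word."""
    if word & 0o177400 != TRAP_BASE:
        raise ValueError("not a TRAP instruction: %o" % word)
    return word & 0o377

print(oct(trap_number(0o104400)))  # 0o0
print(oct(trap_number(0o104402)))  # 0o2
```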
The TRAP 0 subroutine displays a character represented by the ASCII code stored in the register R2. The code for handling TRAP 0 starts at address 000476.
000476 010146 MOV R1, -(SP)
000500 012701 MOV #177564, R1
000504 005767 TST 13676
000510 001403 BEQ 520
000512 016701 MOV 13706, R1
000516 000416 BR 554
000520 005267 INC 13672
000524 026727 CMP 13672, #110
000532 003405 BLE 546
000534 010246 MOV R2, -(SP)
000536 010046 MOV R0, -(SP)
000540 104402 TRAP 2
000542 012600 MOV (SP)+, R0
000544 012602 MOV (SP)+, R2
000546 005767 TST 13700
000552 001406 BEQ 570
000554 005267 INC 13710
000560 105711 TSTB (R1)
000562 100374 BPL 554
000564 110261 MOVB R2, 2(R1)
000570 012601 MOV (SP)+, R1
000572 000207 RTS PC
Let's walk through this code:
000476 010146 MOV R1, -(SP)
This first line pushes the value contained in R1 onto the stack.
000500 012701 MOV #177564, R1
The value 177564 is then moved into the register R1. This is the address of the TTY transmit status register.
000504 005767 TST 13676
000510 001403 BEQ 520
The content of address 13676 is tested and if the value is zero, the code branches to address 520. Address 13676 contains a flag that indicates whether an output device has been configured. If the value is equal to zero then the default output device (TTY) is used. Otherwise, the configured device will be used, as implemented by the following lines:
000512 016701 MOV 13706, R1
000516 000416 BR 554
If the value in 13676 indicates that an output device has been configured, the address of that device's transmit status register is taken from memory location 13706 and loaded into R1. This could either be the transmit status register of the TTY or it could be the status register of the paper tape punch. Once R1 has been loaded, the code branches to 554.
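The device selection can be summarised with a short Python sketch. The function name and the dictionary standing in for memory are my own inventions; 177554 is the usual address of the PC-11 punch status register, used here purely as an example of what 13706 might contain.

```python
TTY_TRANSMIT_STATUS = 0o177564
PUNCH_STATUS = 0o177554   # typical PC-11 punch status address (example value)

def select_status_register(memory):
    """Mimic the start of TRAP 0: choose which status register R1 points at."""
    r1 = TTY_TRANSMIT_STATUS          # MOV #177564, R1
    if memory[0o13676] != 0:          # TST 13676 / BEQ 520
        r1 = memory[0o13706]          # MOV 13706, R1
    return r1

default_case = select_status_register({0o13676: 0, 0o13706: PUNCH_STATUS})
configured_case = select_status_register({0o13676: 1, 0o13706: PUNCH_STATUS})
print(oct(default_case), oct(configured_case))
```

Note that in the configured case the real code jumps straight to 554, so it skips both the line-length counter and the test of 13700.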
000520 005267 INC 13672
000524 026727 CMP 13672, #110
000532 003405 BLE 546
This code is executed when the default output device is being used. The value in memory address 13672 is incremented and then compared to the value 110 (octal, which is 72 in decimal). I think this is a maximum line length test. If the value is less than or equal to 110, then the code branches to 546.
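Here is a minimal sketch of that column test, assuming my reading of 13672 as a column counter is right. The names `MAX_COLUMN` and `advance_column` are made up for the illustration.

```python
MAX_COLUMN = 0o110   # octal 110 is decimal 72

def advance_column(column):
    """Mimic INC 13672 / CMP 13672, #110 / BLE 546.

    Returns the new counter value and whether the newline code would run
    before the character is printed.
    """
    column += 1
    wrap = column > MAX_COLUMN    # BLE skips the newline when column <= 110
    return column, wrap

print(MAX_COLUMN)            # 72
print(advance_column(71))    # (72, False)
print(advance_column(72))    # (73, True)
```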
000534 010246 MOV R2, -(SP)
000536 010046 MOV R0, -(SP)
000540 104402 TRAP 2
000542 012600 MOV (SP)+, R0
000544 012602 MOV (SP)+, R2
Otherwise, R2 and R0 are pushed onto the stack and TRAP 2 is invoked. When TRAP 2 completes, R0 and R2 are popped off the stack. TRAP 2 is used to display a newline to the screen.
000546 005767 TST 13700
000552 001406 BEQ 570
Here we check whether 13700 is set. Referring to the TTY interrupt code above, this value is used as a flag to make sure that read and write code are not simultaneously executed. If 13700 is zero, then the code branches to 570, meaning that the code to display to the screen won't be executed.
000554 005267 INC 13710
000560 105711 TSTB (R1)
000562 100374 BPL 554
The value stored in 13710 is incremented. I'm not sure why that is done, but it might be something to do with handling transmit delay. Next, the byte at the address in R1 (i.e. the transmit status register of the output device) is tested. If bit 7 of the status byte is zero (so the byte is not negative), the device is not yet ready and the code loops back to 554 to wait. Because the loop returns to 554 rather than 560, 13710 is incremented on every pass. When bit 7 is set, the status byte becomes negative and the code continues.
000564 110261 MOVB R2, 2(R1)
A byte value is moved from R2 to the address R1+2 (which is the transmit buffer of the output device). This is the instruction that leads to the transmission of the ASCII code to the output device.
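The wait-then-send pattern can be modelled with a fake device. Everything here is hypothetical scaffolding: `FakeTransmitter` pretends to be busy for a fixed number of polls, and `send_when_ready` mirrors the loop at 554 to 564, including the counter that lives at 13710.

```python
READY_BIT = 0o200   # bit 7 of the transmit status register

class FakeTransmitter:
    """Hypothetical device that reports busy for a fixed number of polls."""
    def __init__(self, busy_polls):
        self.busy_polls = busy_polls
        self.sent = []

    def status(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return 0
        return READY_BIT

    def write(self, byte):
        self.sent.append(byte)

def send_when_ready(device, byte):
    polls = 0
    while True:
        polls += 1                         # INC 13710
        if device.status() & READY_BIT:    # TSTB (R1) / BPL 554
            break
    device.write(byte & 0o377)             # MOVB R2, 2(R1)
    return polls

device = FakeTransmitter(busy_polls=3)
polls = send_when_ready(device, 0o120)
print(polls, device.sent)   # 4 [80]
```

In this model the counter ends up as the number of times the status register was polled, which fits the idea that 13710 is related to transmit timing, though I have not confirmed what reads it.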
000570 012601 MOV (SP)+, R1
000572 000207 RTS PC
Finally, the value of R1 is popped from the stack and control returns from this subroutine to the calling location.
TRAP 2 and TRAP 66
TRAP 66 will display an arbitrary string to the screen. The address of the first character is held in register R0, and output continues byte-by-byte until a zero byte is found, indicating the end of the string. TRAP 2 displays a newline to the screen by using the same code with the specific memory address of 4103 in R0. The code can be found here:
000574 012767 MOV #177776, 13672
000602 012700 MOV #4103, R0
000606 112002 MOVB (R0)+, R2
000610 001770 BEQ 572
000612 104400 TRAP 0
000614 000774 BR 606
Working through this code line-by-line:
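000574 012767 MOV #177776, 13672
000602 012700 MOV #4103, R0
Firstly, the value 177776 is stored in memory location 13672. Although 177776 happens to be the address of the PSW, here it is more likely being used as the 16-bit value -2: 13672 is the counter that TRAP 0 increments for every character, so starting from -2 means it reads zero once the two newline characters have been sent. Next, the value 4103 is moved to register R0. These are the only two instructions that are specific to TRAP 2.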
Address 4103 contains 15 (CR), address 4104 contains 12 (LF), and address 4105 contains 0. Therefore, the string being printed by TRAP 2 consists of carriage return and linefeed characters only.
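One way to read the constant 177776 is as a signed 16-bit number. The sketch below works through that arithmetic; the helper name is mine, and the interpretation of 13672 as a per-line character counter is an assumption based on how TRAP 0 uses it.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's complement signed integer."""
    return word - 0o200000 if word & 0o100000 else word

NEWLINE_BYTES = [0o15, 0o12]   # CR and LF, from addresses 4103 and 4104

counter = to_signed16(0o177776)     # value TRAP 2 stores in 13672
start = counter
for _ in NEWLINE_BYTES:
    counter += 1                    # TRAP 0 does INC 13672 for each character
print(start, counter)               # -2 0
```

Under that reading, the counter is back at zero exactly when the cursor is at the start of a fresh line.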
The remainder of the code is TRAP 66:
000606 112002 MOVB (R0)+, R2
000610 001770 BEQ 572
000612 104400 TRAP 0
000614 000774 BR 606
Firstly, a byte is moved from the address held in R0 to R2 and R0 is incremented. If the value moved was equal to zero then the code branches to 572 (which is an RTS PC instruction) to return from the subroutine. Otherwise TRAP 0 is invoked, which will display the character in R2 on the output device, and the code branches back to address 606 to repeat the sequence and display the next character.
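The string loop translates almost directly into Python. In this sketch the `memory` dictionary, the `put_char` callback (standing in for TRAP 0) and the function names are all my own; the three bytes at 4103 onwards are the values described above.

```python
MEMORY = {0o4103: 0o15, 0o4104: 0o12, 0o4105: 0o0}

def trap66(memory, r0, put_char):
    """Print the zero-terminated string starting at address r0."""
    while True:
        r2 = memory[r0]      # MOVB (R0)+, R2
        r0 += 1
        if r2 == 0:          # BEQ 572
            return r0        # R0 is left pointing just past the terminator
        put_char(r2)         # TRAP 0

def trap2(memory, put_char):
    """Print a newline by pointing the string routine at address 4103."""
    return trap66(memory, 0o4103, put_char)

output = []
final_r0 = trap2(MEMORY, output.append)
print(output, oct(final_r0))   # [13, 10] 0o4106
```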
As far as I can tell, these appear to be the basic output routines; displaying a character and displaying a string. In the next in this series of posts I will walk through the basic input code and a couple of other TRAPs I have figured out.
The DL-11 line interface (i.e. TTY) manual was helpful for understanding how code interacts with the TTY, in particular the interrupt vectors and I/O registers.
For the same reason the PC-11 paper tape reader/punch manual was helpful.
The PDP-11 BASIC manual is obviously very useful for understanding how it was anticipated that a user would interact with BASIC.