Reverse engineering PDP-11 BASIC: Part 4
I had originally planned to write one post explaining what happens from the time BASIC is loaded and starts executing to the time the user is presented with "READY" and can begin entering BASIC commands. It ended up being way too long, so I had to split it into a few separate posts.
In this part I will look at the initial setup and the display and processing of the short-form options.
What happens when BASIC starts running? The PDP-11 BASIC Programming Manual (chapter 7) provides an explanation, which I will summarise here. When you load BASIC, the first thing you will see is this:
PDP-11 BASIC, VERSION 007A
*O
The "*O" is supposed to represent a request for options, and you can enter one or more of the following options as a comma separated list:
"L" : Specifying "L" tells BASIC to use the low-speed reader/punch for SAVE/OLD commands instead of the high-speed reader/punch. If there is no high-speed reader/punch, the low-speed reader/punch will be automatically selected.
"D" : Specifying "D" tells BASIC to delete the extended functions (SIN, COS, ATN, and SQR), presumably to free some space. Default is to retain these functions.
"E" : Specifying "E" tells BASIC to delete the "EXP" and "LOG" functions as well as the extended functions listed above. Default is to retain these functions.
"H" : Specifying "H" tells BASIC to halt before entering the interpreter to allow loading of the EXF function from paper tape. The EXF function allows BASIC programmers to invoke other code written in assembly language from within a BASIC program. Default is not to halt.
Any number between 4 and 28: By default BASIC will use all memory available to the processor. Specifying a number between 4 and 28 tells BASIC to only use that number of kilobytes of memory.
"?": Specifying a "?" tells BASIC to display the long-form version of the configuration options.
Just press RETURN: In this case the default values are all selected, you will get the "READY" prompt, and can start programming BASIC.
If you enter "?" you get a long-form version of the configuration options, that looks like this:
DO YOU NEED THE EXTENDED FUNCTIONS?
HIGH-SPEED READER/PUNCH?
SET UP THE EXTERNAL FUNCTION?
MEMORY?
For the first three questions you can answer "Y" or "N" and for the "MEMORY?" question you enter a value from 4 to 28.
When all options have been selected you get the "READY" prompt and can start programming BASIC.
So, that's what happens. Now, let's take a look at the code that makes it all work.
Firstly, there is some initial setup code. This code does two things:
Find out whether the high-speed paper tape reader/punch is installed.
Find the amount of available memory in the system.
Here it is:
016104 016706 MOV 13712, SP
016110 104402 TRAP 2
016112 012767 MOV #16126, 4
016120 016701 MOV 177550, R1
016124 000402 BR 16132
016126 005267 INC 17464
016132 012767 MOV #16146, 4
016140 024646 CMP -(SP), -(SP)
016142 012701 MOV #160000, R1
016146 022626 CMP (SP)+, (SP)+
016150 014111 MOV -(R1), (R1)
016152 162701 SUB #302, R1
016156 010167 MOV R1, 17462
Let's work through it line-by-line.
016104 016706 MOV 13712, SP
This is the program entry point. The first command moves the value at memory address 13712 into the Stack Pointer. The value in 13712 is 13660, so this is where the stack will be held initially.
016110 104402 TRAP 2
Next there's a TRAP 2 which displays a Carriage Return (CR) and Line Feed (LF). I described the operation of TRAP 2 in detail in Part 1 of this series.
016112 012767 MOV #16126, 4
The value 16126 is moved into memory address 4. Address 4 holds the interrupt vector used for "timeout and other errors". So, what this instruction does is say that if a timeout/other error occurs, execute the code at address 16126.
016120 016701 MOV 177550, R1
The next command moves the value from address 177550 (which is the CSW of the high-speed paper tape reader) to R1. If the high-speed paper tape reader does not exist this will lead to a "timeout" or "other error", and therefore the interrupt handling code just loaded at address 4 will be used.
016124 000402 BR 16132
Then, presuming the command to copy the paper tape reader CSW into R1 succeeded, we branch over the interrupt handling code.
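For anyone who wants to check the branch targets in these listings against the raw instruction words, here is a small sketch of how a PDP-11 branch word can be decoded. The low byte of the word is a signed offset counted in words, relative to the address of the instruction that follows the branch. The function name is my own and is only for illustration.

```python
def branch_target(address, word):
    """Decode the target of a PDP-11 branch instruction.

    address: address of the branch instruction itself
    word:    the 16-bit instruction word
    """
    offset = word & 0o377          # low byte holds the offset
    if offset >= 0o200:            # sign-extend the 8-bit value
        offset -= 0o400
    # The offset is in words and is relative to the next instruction.
    return address + 2 + 2 * offset


print(oct(branch_target(0o16124, 0o000402)))  # the BR 16132 above
print(oct(branch_target(0o16270, 0o000764)))  # a backward branch from later in the listing
```

The first call decodes the forward branch discussed above; the second decodes one of the backward branches that appear later in this post.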
016126 005267 INC 17464
Here is the code executed in case of an error reading from the paper tape reader. We increment the value in address 17464. This will then be used later in the code as a flag to indicate the presence/absence of the paper tape reader. Since it is only incremented when the read fails, a non-zero value means the reader was not found.
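The probe-by-trap idea can be sketched in Python, using an exception to stand in for the trap through the vector at address 4. Everything here (the BusTimeout class, the dictionary of responding addresses, the function names) is a hypothetical model made up for illustration, not something taken from the BASIC code.

```python
class BusTimeout(Exception):
    """Stands in for the "timeout and other errors" trap through vector 4."""


def read_word(present, address):
    """Read a word from a made-up bus where only some addresses respond."""
    if address not in present:
        raise BusTimeout(oct(address))
    return present[address]


def probe_reader(present, status_address=0o177550):
    """Return the flag value the startup code would leave at 17464."""
    flag = 0
    try:
        read_word(present, status_address)   # MOV 177550, R1
    except BusTimeout:
        flag += 1                            # INC 17464 (handler at 16126)
    return flag


print(probe_reader({0o177550: 0}))  # reader fitted: flag stays 0
print(probe_reader({}))             # no reader: flag becomes 1
```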
016132 012767 MOV #16146, 4
Now that the initial test for the accessibility of the high-speed paper tape reader has been completed, the code moves on to the second task, figuring out how much memory is available. Firstly, the "timeout and other error" interrupt vector is changed to point to address 16146.
016140 024646 CMP -(SP), -(SP)
Then, the stack pointer is decremented twice. This is done to compensate for the "fall through" case where the instruction two below is always invoked at least once.
016142 012701 MOV #160000, R1
Next, the value 160000 is loaded into R1. This, I think, represents the maximum possible amount of memory. In decimal, octal 160000 is 57344, which is exactly 56K bytes.
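As a quick sanity check on that constant, the arithmetic can be worked through in Python. One observation, offered as a guess rather than something the manual extract confirms: counted in 16-bit words the value comes out at 28K, which matches the upper limit of the memory option described earlier.

```python
top = 0o160000                     # the constant loaded into R1

size_in_bytes = top                # 57344
size_in_kbytes = top // 1024       # 56
size_in_kwords = top // 2 // 1024  # 28 (two bytes per 16-bit word)

print(size_in_bytes, size_in_kbytes, size_in_kwords)
```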
016146 022626 CMP (SP)+, (SP)+
Now, this instruction requires some explanation. It, effectively, pops two values off the stack and discards them. It will always be executed at least once, but the first time it is executed its effect will have been undone by the command two above which has pre-decremented the stack pointer twice.
This is the interrupt handling instruction location that was just loaded into address 4. So, this will be invoked whenever there is a "timeout and other error". Normally, when an interrupt is invoked, the current values of the PSW and PC are pushed onto the stack and then used by the "RTI" instruction to return to the location in code before the interrupt was handled. This time, however, by discarding the top two values from the stack, it is clear that the code never intends to return from the interrupt and was using the generation of the interrupt as a test to see whether the memory access performed by the next instruction succeeded or not.
016150 014111 MOV -(R1), (R1)
This instruction attempts to move the value in the address pointed to by the value in R1, decremented, to the address pointed to by the value in R1. As a side-effect, the value in R1 is pre-decremented. If the memory access required to perform this instruction fails (e.g. because the memory address does not exist) this will lead to the generation of a "timeout and other error" interrupt.
So, basically what happens is that the code loops, attempting to access memory addresses, starting at the word just below 160000 and moving down one word (two bytes) each time. An interrupt is generated on each turn until a valid address is encountered. In other words, the first time this instruction succeeds, R1 will contain the highest valid memory address.
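To convince myself that the stack stays balanced, here is a rough Python model of the loop. It assumes memory is contiguous from address 0 up to installed_bytes (at least 2), and it only tracks the stack pointer and R1 rather than real memory contents. The function name and parameters are hypothetical.

```python
def size_memory(installed_bytes, sp=0o13660, start=0o160000):
    """Model the sizing loop; returns (final R1, number of traps taken)."""
    initial_sp = sp
    sp -= 4                  # CMP -(SP), -(SP): compensate for the first pass
    r1 = start               # MOV #160000, R1
    traps = 0
    while True:
        sp += 4              # CMP (SP)+, (SP)+ at 16146: discard two words
        r1 -= 2              # -(R1): pre-decrement by one word
        if r1 < installed_bytes:
            # The access succeeded, so execution carries on past the MOV.
            assert sp == initial_sp   # the stack is back where it started
            return r1, traps
        sp -= 4              # the trap pushes the PSW and PC
        traps += 1           # control resumes at 16146 via the vector at 4


top, traps = size_memory(0o20000)   # a machine with 8K bytes (4K words)
print(oct(top), traps)
```

For the 8K byte example, R1 ends up at 17776 (octal), the address of the last word of memory, after 24576 traps.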
016152 162701 SUB #302, R1
302 is subtracted from the value in R1. This might represent the lowest bytes, which are not available for code, meaning that R1 contains the count of available memory bytes, but I'm not sure.
016156 010167 MOV R1, 17462
The number of bytes of available memory is then stored in memory address 17462.
That concludes the initial setup code.
Displaying the introductory prompt
Now we move on to displaying the introductory prompt and reading the short-form options.
016162 012700 MOV #13540, R0
016166 104552 TRAP 152
016170 122702 CMPB #114, R2
016174 001433 BEQ 16264
016176 122702 CMPB #104, R2
016202 001435 BEQ 16276
016204 122702 CMPB #105, R2
016210 001430 BEQ 16272
016212 122702 CMPB #110, R2
016216 001432 BEQ 16304
016220 122702 CMPB #12, R2
016224 001503 BEQ 16434
016226 104470 TRAP 70
016230 001030 BNE 16312
016232 005301 DEC R1
016234 104410 TRAP 10
016236 010067 MOV R0, 17450
016242 104472 TRAP 72
016244 122702 CMPB #12, R2
016250 001471 BEQ 16434
016252 122702 CMPB #54, R2
016256 001371 BNE 16242
016260 104472 TRAP 72
016262 000742 BR 16170
Let's see how this works.
016162 012700 MOV #13540, R0
016166 104552 TRAP 152
The memory address of the introductory string is loaded into R0. This is the address of the string "PDP-11 BASIC, VERSION 007A\r\n*O\r\n". Then TRAP 152 is invoked to display the string and read input from the user. The detailed operation of TRAP 152 is described in Part 2 of this series. The first non-whitespace character of the user input is stored in R2.
016170 122702 CMPB #114, R2
016174 001433 BEQ 16264
First we compare the value in R2 to the ASCII code for "L". If the user specifies the "L" option, this means to use the low-speed reader/punch rather than the high-speed reader/punch. If R2 contains the ASCII code for "L", then control branches to address 16264. This code jumps to a different location to handle each of the options, and I'll explain what all of them do in the next section.
016176 122702 CMPB #104, R2
016202 001435 BEQ 16276
Next, we compare the value in R2 to "D". If it equals "D" that means the user specified that we should delete the extended functions (i.e. SIN, COS, ATN and SQR). If the value in R2 is "D" control jumps to address 16276.
016204 122702 CMPB #105, R2
016210 001430 BEQ 16272
The next test compares the value in R2 to "E". If it equals "E" that means the user specified that we should delete EXP and LOG, as well as the extended functions. If that value is specified, control jumps to address 16272.
016212 122702 CMPB #110, R2
016216 001432 BEQ 16304
The next test compares the value in R2 to "H". If specified, this means that we should halt before entering the interpreter to allow the loading of the EXF function, which is stored on a separate paper tape. If "H" was specified, we branch to 16304.
016220 122702 CMPB #12, R2
016224 001503 BEQ 16434
Perhaps the user just pressed return and didn't specify any options, in which case the value in R2 will be 12 (the ASCII code for LF, or line-feed). If this test matches, control branches to 16434.
016226 104470 TRAP 70
016230 001030 BNE 16312
016232 005301 DEC R1
016234 104410 TRAP 10
016236 010067 MOV R0, 17450
This code checks for whether a number has been entered. Firstly TRAP 70 (described in Part 3) tests whether the value in R2 represents a number. If the value in R2 does represent a number, the zero flag is set. The next instruction branches, if the result of the previous instruction was not equal to zero, to address 16312. In this case, that means, if the value in R2 does not represent a number, branch to 16312. 16312 is the address that starts the display of the long-form menu, so although the manual indicates that the user needs to press "?" to get the long-form menu, in fact it seems that any non-numeric character would cause the display of the long-form menu.
Otherwise we decrement R1. The reason for this is that TRAP 72 (used within TRAP 152 to get the next non-whitespace character in the string at R1) leaves R1 pointing at the character after the one that has been stored in R2. Therefore, if the value in R2 is numeric, R1 needs to be decremented to point at that character again, in case it is one digit of a multi-digit number.
TRAP 10 (described in Part 3) converts a string of ASCII digits, pointed to by R1, into a numeric value, stored in R0. Finally, the numeric value is stored in memory address 17450.
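The reason for the DEC R1 is easier to see with a toy example. The two helpers below are only rough stand-ins for TRAP 72 and TRAP 10: I am assuming that only spaces are skipped, that the digits are decimal, and that conversion stops at the first non-digit. The names are made up for this sketch.

```python
def next_nonblank(buf, pos):
    """Rough stand-in for TRAP 72: returns (char, position AFTER that char)."""
    while buf[pos] == " ":
        pos += 1
    return buf[pos], pos + 1


def read_number(buf, pos):
    """Rough stand-in for TRAP 10: accumulate decimal digits from pos."""
    value = 0
    while buf[pos].isdigit():
        value = value * 10 + int(buf[pos])
        pos += 1
    return value, pos


line = "16\n"
ch, pos = next_nonblank(line, 0)     # ch is "1", pos is already past it
without_dec, _ = read_number(line, pos)
with_dec, _ = read_number(line, pos - 1)   # pos - 1 models DEC R1

print(without_dec, with_dec)
```

Without the decrement the leading "1" is lost and the result is 6. With it, the full value 16 is read.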
016242 104472 TRAP 72
016244 122702 CMPB #12, R2
016250 001471 BEQ 16434
016252 122702 CMPB #54, R2
016256 001371 BNE 16242
016260 104472 TRAP 72
016262 000742 BR 16170
The next thing we do is get the next non-whitespace character from R1 and compare it to linefeed (ASCII 12). If it matches, that means we're at the end of the list of options selected, in which case we branch to 16434.
Otherwise, we compare R2 to comma (ASCII 54). If we don't have a comma, we loop back to 16242 and get the next non-whitespace character and keep checking. Otherwise, if we do have a comma, that means the user might have specified another option, so read the next non-whitespace character from R1 and branch right back up to the top to process another option.
That's the end of this section of code, so we have now read and parsed all of the short-form options.
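Putting the whole loop together, the control flow can be sketched as a small Python function. This is my own paraphrase of the disassembly, not a translation of it: the dictionary keys are invented names for the values stored at 17452, 17454, 17456, 17460 and 17450, the input line is assumed to end with a linefeed, only spaces are treated as whitespace, and digits are assumed to be decimal.

```python
def parse_options(line):
    """Return the option settings, or "LONG_FORM" if the long menu is wanted."""
    opts = {"low_speed": 0, "no_ext": 0, "no_exp_log": 0, "halt": 0, "memory": None}
    pos = 0

    def next_nonblank():
        # Stand-in for TRAP 72: skip spaces, return the character, step past it.
        nonlocal pos
        while line[pos] == " ":
            pos += 1
        ch = line[pos]
        pos += 1
        return ch

    ch = next_nonblank()
    while True:
        # 16170: the chain of comparisons, in the same order as the listing
        if ch == "L":
            opts["low_speed"] += 1
        elif ch == "D":
            opts["no_ext"] += 1
        elif ch == "E":
            opts["no_exp_log"] += 1
            opts["no_ext"] += 1
        elif ch == "H":
            opts["halt"] -= 1
        elif ch == "\n":
            return opts
        elif ch.isdigit():
            pos -= 1                       # DEC R1
            start = pos
            while line[pos].isdigit():     # stand-in for TRAP 10
                pos += 1
            opts["memory"] = int(line[start:pos])
        else:
            return "LONG_FORM"             # any other character, not just "?"
        # 16242: skip ahead to the next comma or the end of the line
        while True:
            ch = next_nonblank()
            if ch == "\n":
                return opts
            if ch == ",":
                ch = next_nonblank()
                break


print(parse_options("L,D,16\n"))
print(parse_options("?\n"))
```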
Storing the short-form menu options
The next thing the code does is to store some of the short-form menu options.
016264 005267 INC 17452
016270 000764 BR 16242
This is the code for handling "L". Remember, if the user selects "L" that means that the low-speed reader/punch should be used instead of the high-speed reader/punch. When this code is executed the memory address 17452 is incremented. This is used to represent the user selecting the low-speed reader/punch. Otherwise, the memory address 17452 will have the value zero.
Once this has been done, the code branches up to the option handling loop discussed in the previous section.
016272 005267 INC 17456
016276 005267 INC 17454
016302 000757 BR 16242
This is the code for handling "E" and "D". Remember "D" means delete the extended functions, "E" means delete the extended functions and also EXP and LOG.
If the user selects "D", execution starts at 16276 and the value at address 17454 is incremented. If the user selects "E", execution starts at 16272, so the value at address 17456 is incremented and then execution falls through to increment the value at 17454 as well. Otherwise one or other, or both, of these values will be zero.
After the relevant settings have been stored control jumps back to the option parsing loop.
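The "E" and "D" handlers share a tail: "E" simply enters the same run of instructions one step earlier. A tiny sketch of that fall-through, using the addresses from the listing and made-up flag names:

```python
# Each entry is (address of the INC, name of the flag it increments).
# The trailing BR 16242 is left out because it does not change any flag.
HANDLER = [(0o16272, "flag_17456"), (0o16276, "flag_17454")]


def run_from(entry):
    """Run the handler from the given entry address and return the flags."""
    flags = {"flag_17456": 0, "flag_17454": 0}
    for address, name in HANDLER:
        if address >= entry:      # everything from the entry point onwards runs
            flags[name] += 1
    return flags


print(run_from(0o16276))  # "D": only the 17454 flag is set
print(run_from(0o16272))  # "E": both flags are set
```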
016304 005367 DEC 17460
016310 000754 BR 16242
This is the code for handling "H", which means halt before entering the interpreter. In this case, the memory address 17460 will be decremented (meaning it will have the value -1) if the "H" option has been selected.
Again, once the value has been stored, control branches back to the option parsing loop.