Reverse engineering PDP-11 BASIC: Part 9

daveor
Feb 5, 2021

In this post I'll be looking at how the BASIC LIST command works.

For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.

Introduction: The LIST command

Referring to the PDP-11 BASIC Programming Manual, Section 5.1, we can see that LIST is used to provide a listing of the stored program code. LIST also accepts a single line number as an argument, in which case only that line is listed, or two comma-separated line numbers, in which case the lines in that range are listed.

Here is sample usage:

LIST        ; lists the entire program
LIST 50     ; lists line 50 only
LIST 20,100 ; lists lines 20 to 100
LIST 20,    ; lists all lines from 20 to the end of the program
LIST ,100   ; lists lines from the start of the program to 100

Note that when using a range one value can be left out, indicating that the range is either from the beginning of the program or to the end of the program, as appropriate.

TRAP 106

TRAP 106 is used to parse the line number parameter(s) passed to the LIST command. It takes as input the command string (pointed to by the memory address stored in R1) and returns the lower bound of the list range in R3 (or zero if not specified) and the upper bound of the list range in R4 (or zero if it is not specified).

At the time this code is executed, R1 points at the next character after "LIST" in the command string.

Here's the code:

002010 104472 TRAP 72
002012 104470 TRAP 70
002014 001015 BNE 2050
002016 005301 DEC R1
002020 104410 TRAP 10
002022 010046 MOV R0, -(SP)
002024 104472 TRAP 72
002026 022702 CMP #54, R2
002032 001010 BNE 2054
002034 104410 TRAP 10
002036 005700 TST R0
002040 001405 BEQ 2054
002042 010004 MOV R0, R4
002044 012603 MOV (SP)+, R3
002046 000207 RTS PC
002050 005046 CLR -(SP)
002052 000765 BR 2026
002054 005004 CLR R4
002056 000772 BR 2044

Let's see how this works.

002010 104472 TRAP 72
002012 104470 TRAP 70
002014 001015 BNE 2050

First TRAP 72 is used to get the next non-whitespace character, then TRAP 70 is used to determine whether or not the character, now stored in R2, is the ASCII character of a digit. If the character is a digit then the zero flag will be set by TRAP 70. If the character is not numeric, control branches to address 2050. Otherwise the character is numeric, so we proceed as follows:

002016 005301 DEC R1
002020 104410 TRAP 10
002022 010046 MOV R0, -(SP)

After the TRAP 72, R1 will point at the memory location after the first digit that has been identified, therefore R1 is decremented so it points at the first digit of the number to be parsed. Then TRAP 10 is used to parse the string into a numeric value, which will be stored in R0. The value at R0 is then pushed onto the stack.

So, in the case where there is only a single numeric value specified in the LIST command, we now have that value stored on the stack. The next thing to check is whether or not there is a comma:

002024 104472 TRAP 72
002026 022702 CMP #54, R2
002032 001010 BNE 2054

The next non-whitespace character is extracted using TRAP 72 and stored in R2. This value is then compared to the ASCII code for comma, which is 54 (octal). If they are not equal, control branches to 2054.

002034 104410 TRAP 10
002036 005700 TST R0
002040 001405 BEQ 2054

If a comma was identified then another TRAP 10 is used to get the number after the comma. The value is returned in R0, which is then tested. If it is zero (which can mean an invalidly formatted number), control branches to 2054.

The remainder of the code relates to organising the results for return. The first value (starting line) is stored in R3 and the second value (ending line) is stored in R4.

002042 010004 MOV R0, R4
002044 012603 MOV (SP)+, R3
002046 000207 RTS PC

In the case where we have found both values, the ending value just identified and stored in R0 is moved into R4 and the starting value, stored on the stack, is popped into R3. Afterwards, control is returned from the subroutine.

002050 005046 CLR -(SP)
002052 000765 BR 2026
002054 005004 CLR R4
002056 000772 BR 2044

The last piece of code deals with the situation where one or other of the numbers was not specified. In the case where a starting number was not identified (before a comma, for example), control jumps to 2050, where a zero value is pushed onto the stack and then control jumps up to the code to test for a comma.

In the case where an ending number was not identified (after a comma) then the value in R4 is cleared and control jumps up to address 2044, the code above to pop the starting value into R3.
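To make the control flow easier to follow, here is a rough Python sketch of what TRAP 106 appears to do. The helper names next_char and parse_number are my own stand-ins for TRAP 72 and TRAP 10, and their exact behaviour (skipping blanks, returning zero when no digits are present) is an assumption based on the description above, not something taken from the original code.

```python
def parse_list_range(text):
    """Return (lower, upper) for the text after LIST; 0 means 'not given'."""
    pos = 0  # plays the role of R1

    def next_char():
        # Stand-in for TRAP 72: skip blanks, return the next character and
        # leave pos just past it. Returns "" at the end of the text.
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1
        if pos >= len(text):
            return ""
        pos += 1
        return text[pos - 1]

    def parse_number():
        # Stand-in for TRAP 10: read decimal digits starting at pos.
        # Assumption: leading blanks are skipped and 0 means "no number".
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1
        value = 0
        while pos < len(text) and "0" <= text[pos] <= "9":
            value = value * 10 + int(text[pos])
            pos += 1
        return value

    ch = next_char()                       # 2010: TRAP 72
    if len(ch) == 1 and "0" <= ch <= "9":  # 2012: TRAP 70
        pos -= 1                           # 2016: DEC R1
        lower = parse_number()             # 2020: TRAP 10
        ch = next_char()                   # 2024: TRAP 72
    else:
        lower = 0                          # 2050: CLR -(SP)
    if ch != ",":                          # 2026: CMP #54, R2
        return lower, 0                    # 2054: CLR R4
    upper = parse_number()                 # 2034: TRAP 10
    return lower, upper                    # a zero here matches CLR R4
```

For example, parse_list_range(" 20,100") gives (20, 100), and parse_list_range(" ,100") gives (0, 100).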

The BASIC LIST command handler

So, now that the TRAP 106 code has been explained, here is the code of the BASIC LIST command handler:

002440 104516 TRAP 116
002442 104506 TRAP 106
002444 010300 MOV R3, R0
002446 001060 BNE 2610
002450 016703 MOV 13662, R3
002454 005704 TST R4
002456 001062 BNE 2624
002460 010504 MOV R5, R4
002462 005767 TST 13702
002466 001076 BNE 2664
002470 112302 MOVB (R3)+, R2
002472 120227 CMPB R2, #140
002476 002421 BLT 2542
002500 162702 SUB #140, R2
002504 012700 MOV #3626, R0
002510 010201 MOV R2, R1
002512 005301 DEC R1
002514 002404 BLT 2526
002516 122027 CMPB (R0)+, #44
002522 001375 BNE 2516
002524 000772 BR 2512
002526 112002 MOVB (R0)+, R2
002530 120227 CMPB R2, #44
002534 001752 BEQ 2462
002536 104400 TRAP 0
002540 000772 BR 2526
002542 120227 CMPB R2, #12
002546 001402 BEQ 2554
002550 104400 TRAP 0
002552 000743 BR 2462
002554 104402 TRAP 2
002556 020304 CMP R3, R4
002560 103740 BCS 2462
002562 005767 TST 13676
002566 001406 BEQ 2604
002570 005002 CLR R2
002572 012701 MOV #100, R1
002576 104400 TRAP 0
002600 005301 DEC R1
002602 001375 BNE 2576
002604 000167 JMP 3106
002610 010446 MOV R4, -(SP)
002612 104474 TRAP 74
002614 012604 MOV (SP)+, R4
002616 020105 CMP R1, R5
002620 101313 BHI 2450
002622 010103 MOV R1, R3
002624 020400 CMP R4, R0
002626 003407 BLE 2646
002630 010400 MOV R4, R0
002632 010346 MOV R3, -(SP)
002634 104474 TRAP 74
002636 001006 BNE 2654
002640 012603 MOV (SP)+, R3
002642 020105 CMP R1, R5
002644 101305 BHI 2460
002646 104502 TRAP 102
002650 010104 MOV R1, R4
002652 000703 BR 2462
002654 012603 MOV (SP)+, R3
002656 020105 CMP R1, R5
002660 101277 BHI 2460
002662 000772 BR 2650
002664 000167 JMP 3056

There's a lot to analyse there, so let's get started.

002440 104516 TRAP 116

First we have the mystery trap, TRAP 116. I'm still not sure what this does, but I'm getting closer to understanding. As far as I can tell it's something to do with maintenance of the executing state of the current command. More particularly, I think it resets the storage area used for maintaining executing state of the current command, so that it is ready for the command that is about to be executed. More on that in a future post.

002442 104506 TRAP 106

Then we have TRAP 106, which I just described. After this subroutine returns, any starting line number (or single line number) will be stored in R3 and any ending line number will be stored in R4. Either, or both, may be zero.

002444 010300 MOV R3, R0
002446 001060 BNE 2610
002450 016703 MOV 13662, R3

The starting line number (or just single line number) is moved from R3 to R0. If it is not equal to zero, control jumps to address 2610. Otherwise a starting line number has not been specified, in which case the memory address of the beginning of the stored program code area is moved from address 13662 into R3.

Next we check if we have an end line number:

002454 005704 TST R4
002456 001062 BNE 2624
002460 010504 MOV R5, R4

The value in R4 is tested and if it is not equal to zero, control jumps to address 2624. Otherwise, the value in R5 is moved to R4. R5 contains the address of the end of the program code in the program code storage area.

So, we branch elsewhere in the code below to deal with the situations where starting and/or ending line numbers have been specified. The code continues here in the situation where no numbers have been specified, so the entire program needs to be listed.

002462 005767 TST 13702
002466 001076 BNE 2664

First we check the flag (at address 13702) to see if execution has been interrupted by the user. If the value is non-zero that means the user has pressed Ctrl-P, so control jumps to address 2664, which exits from the command handling code and returns to the syntax parsing loop.

002470 112302 MOVB (R3)+, R2

A byte of the program code is read from the program storage area and saved in R2. The pointer to the remainder of the program code is then incremented.

002472 120227 CMPB R2, #140
002476 002421 BLT 2542

The byte of program code is compared against the value 140 (octal). Any byte below this value is a printable element of the program code itself. Values of 140 or greater represent the tokenised versions of the BASIC commands. The conversion of BASIC commands into tokens was described in Part 6. If the byte is lower than 140, we can skip the code that is used to display the BASIC command by branching to address 2542.

Otherwise execution continues:

002500 162702 SUB #140, R2

140 is subtracted from the value in R2, which now contains the index of the current command in the "$" separated list of commands described in Part 6.

002504 012700 MOV #3626, R0

The memory address of the "$" separated list of commands is now moved to R0.

002510 010201 MOV R2, R1

Now a working copy of R2 is made into R1 and then we enter a loop to skip forward through the "$" separated list of commands to the correct one for the purposes of displaying the current line of the program:

002512 005301 DEC R1
002514 002404 BLT 2526
002516 122027 CMPB (R0)+, #44
002522 001375 BNE 2516
002524 000772 BR 2512

R1 is decremented and if the value is now less than zero, control branches to address 2526 (after this loop). Otherwise, the byte at R0 is compared to the ASCII code for "$", which is 44 (octal), and R0 is incremented. If the byte was not a "$" then control branches back to 2516 to compare another byte. These two instructions skip forward to the start of the next command. Then, once the current command has been skipped, control branches back up to 2512 and R1 is decremented again. The cycle repeats until R1 becomes negative, at which point control branches down to address 2526, ending the loop.

When this code completes, R0 will point at the memory location of the string representation of the current command in the "$" separated list of commands.

Having identified the current command, we now display it:

002526 112002 MOVB (R0)+, R2
002530 120227 CMPB R2, #44
002534 001752 BEQ 2462
002536 104400 TRAP 0
002540 000772 BR 2526

A byte of the command (pointed to by R0) is moved to R2 and R0 is then incremented. The value in R2 is compared to the ASCII code for "$", which is 44. If they are equal, that means we have displayed the entire command so control branches up to nearly the top of the LIST command code to get the next character and display it. Otherwise, we display the character in R2 using TRAP 0 and loop back to move another byte of the command into R2.
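The skip-and-print logic for tokens can be sketched in Python as follows. This is only an illustration: the KEYWORDS string is a made-up table, not the real contents of address 3626, and expand_token is a name I have chosen for the sketch.

```python
KEYWORDS = "LET$PRINT$GOTO$"  # hypothetical table, not the real one at 3626

def expand_token(token, table=KEYWORDS):
    """Return the keyword text for a token byte of 0o140 or above."""
    count = token - 0o140          # 2500: SUB #140, R2 / 2510: MOV R2, R1
    pos = 0                        # 2504: R0 points at the start of the table
    while True:
        count -= 1                 # 2512: DEC R1
        if count < 0:              # 2514: BLT 2526
            break
        while True:                # 2516/2522: skip one "$"-terminated entry
            ch = table[pos]
            pos += 1               # CMPB (R0)+, #44 always advances R0
            if ch == "$":
                break
    out = []
    while True:
        ch = table[pos]            # 2526: MOVB (R0)+, R2
        pos += 1
        if ch == "$":              # 2530/2534: end of this keyword
            break
        out.append(ch)             # 2536: TRAP 0 prints the character
    return "".join(out)
```

With this made-up table, token 140 (octal) expands to "LET" and token 142 (octal) expands to "GOTO".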

That concludes the code executed whenever we encounter a byte with a value of 140 (octal) or greater, which is a token representing a command. So, we now continue with the general case, which is code used for displaying all characters with ASCII codes less than 140.

002542 120227 CMPB R2, #12
002546 001402 BEQ 2554
002550 104400 TRAP 0
002552 000743 BR 2462
002554 104402 TRAP 2

The character to be displayed is compared to the ASCII code for linefeed, which is 12. If it equals 12, it means that we have reached the end of the current line, so control jumps to address 2554 and TRAP 2 is used to display a carriage return, line feed pair. Otherwise, TRAP 0 is used to display the character and control loops back to near the top of the LIST code, address 2462, to process another character.

002556 020304 CMP R3, R4
002560 103740 BCS 2462

In the case that we have finished displaying a line, we compare R3 (the current position) and R4 (the ending location) to see if we have finished displaying the requested range. If R3 is still lower than R4, control branches back to 2462 to process another character.
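Putting the pieces of the main loop together, here is a small Python sketch of the byte-by-byte listing logic. The sample program bytes and the keyword table are invented for illustration and are not taken from the real storage format. The sketch also uses a simple split on "$" rather than the byte scan described earlier, and it leaves out the Ctrl-P check.

```python
def list_bytes(program, start, end, keywords="LET$PRINT$GOTO$"):
    """Render program[start:end] as text, stopping only at a line boundary."""
    names = keywords.split("$")      # hypothetical keyword table
    out = []
    pos = start                      # plays the role of R3
    while True:
        byte = program[pos]          # 2470: MOVB (R3)+, R2
        pos += 1
        if byte >= 0o140:            # 2472: token, expand to its keyword
            out.append(names[byte - 0o140])
        elif byte == 0o12:           # 2542: linefeed ends the line
            out.append("\r\n")       # 2554: TRAP 2 prints CR, LF
            if pos >= end:           # 2556: CMP R3, R4 / BCS 2462
                break
        else:
            out.append(chr(byte))    # 2550: TRAP 0 prints the character
    return "".join(out)
```

Note that the end check only happens after a linefeed, so a listing always finishes on a complete line.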

002562 005767 TST 13676
002566 001406 BEQ 2604

If the LIST command is being used to write the program to paper tape (because the SAVE command uses the LIST command), a set of trailing characters is added to the end of the output, and that is the purpose of this next bit of code.

Firstly, the value at address 13676 is tested. This memory location is used as a flag to indicate writing to a specifically selected output device (as opposed to the default TTY output). If the value is non-zero that means an output device has been selected. If it equals zero, no output device has been selected so control jumps to address 2604 to return from the LIST command handler.

Otherwise the following code is executed:

002570 005002 CLR R2
002572 012701 MOV #100, R1
002576 104400 TRAP 0
002600 005301 DEC R1
002602 001375 BNE 2576

R2 is cleared and the value 100 (octal) is loaded into R1. TRAP 0 is used to output the zero byte in R2, which in this case goes to the non-default output device (e.g. paper tape writer). R1 is decremented and, if it is not yet zero, control loops back and another zero byte is written to the output device. This continues until 100 (octal) zero bytes have been written to the paper tape.
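As a quick check on the octal arithmetic, the trailer loop can be sketched in Python. The function name tape_trailer and its boolean parameter are my own. The parameter stands in for the flag at address 13676.

```python
def tape_trailer(device_selected):
    """Return the bytes written after the listing when a device is selected."""
    out = bytearray()
    if not device_selected:        # 2562/2566: TST 13676, BEQ 2604
        return bytes(out)
    count = 0o100                  # 2572: MOV #100, R1
    while count != 0:              # 2602: BNE 2576
        out.append(0)              # 2576: TRAP 0 with R2 cleared
        count -= 1                 # 2600: DEC R1
    return bytes(out)
```

So 100 (octal) works out to 64 zero bytes of trailer.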

002604 000167 JMP 3106

After that, control jumps to 3106, which is back to the main syntax parsing loop.

So, that's the end of the main body of the LIST command. However, there are a couple of bits of code we haven't yet discussed.

Firstly, there's the code that is used when a starting line number has been specified to seek forwards to that line. Let's look at that first. Remember that R0 contains the starting line number (see instruction at memory address 2444).

002610 010446 MOV R4, -(SP)
002612 104474 TRAP 74
002614 012604 MOV (SP)+, R4

R4 is pushed onto the stack, then TRAP 74 is used to seek forward to that line number. After TRAP 74, R1 will contain the memory location of the line number specified in R0. R4 is then popped off the stack.

002616 020105 CMP R1, R5
002620 101313 BHI 2450

R1 is now compared to R5 to see whether the search ran all the way past the end of the program, which would indicate that the starting line number specified was not found. In that case, control jumps to address 2450 and LIST starts displaying from the beginning of the program code.

002622 010103 MOV R1, R3

Otherwise R1 (the location of the command specified by the starting line number) is moved into R3.

Next the assessment of the ending line number is carried out.

002624 020400 CMP R4, R0
002626 003407 BLE 2646

Firstly, the value in R4 (ending line number, or zero if ending line number is not specified) is compared to the starting line number contained in R0. If R4 is less than or equal to R0, control branches to 2646. Otherwise:

002630 010400 MOV R4, R0
002632 010346 MOV R3, -(SP)
002634 104474 TRAP 74
002636 001006 BNE 2654
002640 012603 MOV (SP)+, R3

R4 is moved into R0. Then R3 is pushed onto the stack and TRAP 74 is then used to seek forward to the line number specified in R0. After TRAP 74 the zero flag will be set if the line number is found. Therefore if the zero flag is not set control jumps to 2654. Finally, R3 is restored from the stack. After the TRAP 74, R1 will contain the memory location of the line number specified in R0.

002642 020105 CMP R1, R5
002644 101305 BHI 2460

R1, the memory address of the line number specified in R0, is compared to R5, which is the ending memory address of the program code. If R1 is bigger than R5, then control branches to 2460, and R5 is used as the ending location for the LIST command.

002646 104502 TRAP 102
002650 010104 MOV R1, R4
002652 000703 BR 2462

Otherwise, TRAP 102 is used to move to the end of the string pointed to by R1. After this trap, R1 will point at the memory address of the command after the command with the ending line number specified in the LIST command. R1 is then moved into R4 and control branches to address 2462.

002654 012603 MOV (SP)+, R3
002656 020105 CMP R1, R5
002660 101277 BHI 2460
002662 000772 BR 2650
002664 000167 JMP 3056

Now there is some final tidying up. Firstly, R3 is popped from the stack. Then R1 is compared to R5. If R1 is greater than R5, control branches to 2460 and R5 is used as the ending location for the LIST command. Otherwise the value in R1 is used by branching to 2650 and loading the value from R1 into R4.

The last instruction, at address 2664, is the target of the Ctrl-P check at 2466. It jumps to 3056, which is back to the syntax parsing loop, ready to receive the next command from the user.
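The range handling is the fiddliest part of the handler, so here is one reading of it as a Python sketch. It works with indexes into a sorted list of line numbers instead of memory addresses, and the seek helper is my stand-in for TRAP 74. The assumption that a failed seek lands on the next higher-numbered line (or past the end) is mine and has not been confirmed from the TRAP 74 code. The case where the starting line is past the end of the program and the ending line is smaller is not modelled faithfully.

```python
from bisect import bisect_left

def resolve_range(numbers, lower, upper):
    """Return (start, stop) so that numbers[start:stop] are the listed lines."""
    end = len(numbers)                    # plays the role of R5

    def seek(target):
        # Stand-in for TRAP 74. Assumption: a miss lands on the next
        # higher-numbered line, or on `end` when there is none.
        pos = bisect_left(numbers, target)
        return pos, pos < end and numbers[pos] == target

    start, pos = 0, end                   # 2450: default to start of program
    if lower != 0:                        # 2446: BNE 2610
        pos, _ = seek(lower)              # 2612: TRAP 74
        if pos < end:
            start = pos                   # 2622: MOV R1, R3
    if pos >= end and upper == 0:         # 2454: TST R4 falls through
        return 0, end                     # 2460: MOV R5, R4
    if upper <= lower:                    # 2624: CMP R4, R0 / BLE 2646
        return start, min(pos + 1, end)   # 2646: TRAP 102, end of that line
    pos, found = seek(upper)              # 2634: TRAP 74
    if pos >= end:                        # 2642 and 2656: BHI 2460
        return start, end
    return start, (pos + 1 if found else pos)  # 2646 if found, else 2650
```

For a program with lines 10, 20, 30 and 40, resolve_range([10, 20, 30, 40], 20, 30) gives (1, 3), i.e. lines 20 and 30.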
