daveor
- Mar 27, 2021

Reverse engineering PDP-11 BASIC: Part 24

Updated: Mar 28, 2021

This post describes the BASIC READ, INPUT and RESTORE commands. For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.

Introduction

The INPUT command is used to read values specified by the user into variables. Here is an example usage:

INPUT A, B, C

When this command is executed, BASIC will display a question mark to the user, after which the user must enter three comma-separated values that will be parsed and assigned to the three variables specified in the INPUT command. If too few or too many values are entered, the user will receive an error.

The READ command is also used to assign values to variables but instead of reading values from the user, the values are obtained from DATA commands specified elsewhere in the program. Here is an example:

10 DATA 1, 2, 3
20 READ A, B, C

In this example, the values from the DATA command will be assigned to the respective variables specified in the READ command. There does not need to be a one-to-one mapping between DATA command values and READ values. The following sequence of commands would work just as well:

10 DATA 1
20 DATA 2, 3
30 READ A, B, C

The DATA command values are sequentially "consumed" and assigned to variables specified in the READ command, so these three commands are functionally equivalent to the previous example.
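To make the "consumed sequentially" idea concrete, here is a minimal Python sketch. It is only a model of the behaviour described above, not of the actual PDP-11 code: the function name `run_reads` and the list-of-lists representation of DATA lines are assumptions made for illustration.

```python
# Hypothetical model: all DATA lines are treated as one stream of values,
# and each READ takes the next values from that stream in order.
def run_reads(data_lines, read_counts):
    stream = [value for line in data_lines for value in line]
    pos = 0  # how far into the DATA stream we have consumed
    results = []
    for count in read_counts:
        if pos + count > len(stream):
            raise RuntimeError("out of data")
        results.append(stream[pos:pos + count])
        pos += count
    return results

# One DATA line versus the same values split over two DATA lines.
one_line = run_reads([[1, 2, 3]], [3])
two_lines = run_reads([[1], [2, 3]], [3])
print(one_line == two_lines)
```

Both layouts give A, B and C the same values, which is why the two programs above are equivalent.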

As you might expect, there is some overlap in the functionality between the READ and INPUT commands and this is reflected in the code through the use of a number of shared subroutines, which will be analysed separately below.

Subroutine to locate or allocate space for variables

In both the READ and INPUT commands, the series of variable names specified in the command needs to be parsed and either created or located in the runtime state storage. This subroutine pushes onto the stack a series of memory locations representing the locations of the values of each of the variables in runtime state storage.

Here is the code:

007066 104544 TRAP 144
007070 102424 BVS 7142
007072 001002 BNE 7100
007074 010400 MOV R4, R0
007076 104546 TRAP 146
007100 012602 MOV (SP)+, R2
007102 010046 MOV R0, -(SP)
007104 012700 MOV #4, R0
007110 104504 TRAP 104
007112 103415 BCS 7146
007114 010246 MOV R2, -(SP)
007116 104472 TRAP 72
007120 120227 CMPB R2, #54
007124 001760 BEQ 7066
007126 120227 CMPB R2, #72
007132 001404 BEQ 7144
007134 120227 CMPB R2, #12
007140 001401 BEQ 7144
007142 000262 SEV
007144 000207 RTS PC
007146 104401 TRAP 1

Let's see how this works.

007066 104544 TRAP 144
007070 102424 BVS 7142
007072 001002 BNE 7100

TRAP 144 is used to (a) parse a variable name from the command string and return that value in R4 and (b) attempt to locate that variable in the runtime state storage. If there is an error, such as an invalid variable name in the command string, the overflow flag will be set and control branches to 7142 to return an error. If the variable is not located in the runtime state storage the zero flag will be set. Otherwise, if the zero flag is not set, that means the variable was located in runtime state storage and a pointer to the variable value is returned in R0.

If the variable is not located in runtime state storage, then space is allocated for the variable:

007074 010400 MOV R4, R0
007076 104546 TRAP 146

The variable identifier is copied from R4 to R0 and then TRAP 146 is used to allocate space for the variable in the runtime state storage. After TRAP 146, R0 will contain a pointer to the variable value location.

007100 012602 MOV (SP)+, R2
007102 010046 MOV R0, -(SP)
007104 012700 MOV #4, R0
007110 104504 TRAP 104
007112 103415 BCS 7146
007114 010246 MOV R2, -(SP)

In all cases, there is now a variable entry in the runtime state storage for the variable identifier, and R0 points at the location of the variable value.

So, the return address (remember, this is a subroutine) is popped off the stack into R2 and the location of the variable value for the current variable is then pushed onto the stack.

The constant value 4 is moved into R0 and TRAP 104 is used to check whether there are that number of words available in the runtime state storage. If the carry flag is set, that means there is not enough space available, so control branches to 7146 to return an error.

Otherwise the return address is pushed back onto the stack from R2.

007116 104472 TRAP 72

Then, the next non-whitespace character is obtained from the command string. Having parsed a variable name, there are two main possibilities; either there is a comma which will be followed by another variable name or there is a command-ending character (linefeed or colon) which will end the subroutine. So:

007120 120227 CMPB R2, #54
007124 001760 BEQ 7066
007126 120227 CMPB R2, #72
007132 001404 BEQ 7144
007134 120227 CMPB R2, #12
007140 001401 BEQ 7144
007142 000262 SEV

The next character, now stored in R2, is compared to "," (ASCII 54) and if equal control branches back up to 7066 to parse the next variable name. Otherwise the character is compared to ":" (ASCII 72) and if equal we branch to 7144 to return. Finally, the character is compared to linefeed (ASCII 12) and if equal we branch to 7144 to return. If any other character is encountered, the overflow flag is set before returning.
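The character codes in these listings are octal. As a quick sanity check, here is a small Python sketch of the same three-way test; the function name `classify` and the result strings are made up for illustration.

```python
# Octal character codes used by the CMPB instructions.
COMMA, COLON, LINEFEED = 0o54, 0o72, 0o12

def classify(ch):
    """Mimic the comma / colon / linefeed test on the next character."""
    code = ord(ch)
    if code == COMMA:
        return "next variable"      # BEQ 7066
    if code in (COLON, LINEFEED):
        return "end of command"     # BEQ 7144
    return "error"                  # SEV at 7142

print(chr(COMMA), chr(COLON), repr(chr(LINEFEED)))
```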

The SEV instruction at 7142 is also reached from the BVS at 7070, which is taken when TRAP 144 sets the overflow flag, meaning an invalid variable name was identified.

007144 000207 RTS PC
007146 104401 TRAP 1

Finally control returns from the subroutine. The TRAP 1 at 7146 is only reached from the BCS at 7112, i.e. when TRAP 104 reports that there is not enough space available.
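The overall shape of this subroutine can be sketched in Python. This is a simplified model under several assumptions: a dictionary stands in for the runtime state storage, variable names are taken to be a letter followed by letters or digits, the free-space check (TRAP 104) is left out, and the name `locate_variables` is hypothetical.

```python
def locate_variables(text, storage):
    """Return (slots, ok): the variables named in text, in order, and
    whether the list was terminated by a colon or linefeed."""
    slots = []
    pos = 0
    while True:
        while pos < len(text) and text[pos] == " ":
            pos += 1
        start = pos
        while pos < len(text) and text[pos].isalnum():
            pos += 1
        name = text[start:pos]
        if not name or not name[0].isalpha():
            return slots, False           # invalid name: "overflow" case
        storage.setdefault(name, 0.0)     # allocate only if not found
        slots.append(name)                # stands in for pushing the location
        while pos < len(text) and text[pos] == " ":
            pos += 1
        ch = text[pos] if pos < len(text) else "\n"
        if ch == ",":
            pos += 1                      # another variable name follows
            continue
        return slots, ch in (":", "\n")   # anything else is an error

storage = {"A": 1.0}
slots, ok = locate_variables("A, B, C\n", storage)
print(slots, ok)
```

Existing variables keep their values (A stays 1.0 here), while B and C are created on demand, mirroring the locate-or-allocate behaviour described above.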

Subroutine to assign values to variables

The other requirement that is common to both READ and INPUT is the need to assign a series of values to a series of variables, so there is a subroutine used by both for doing that.

The subroutine expects that the string containing the values to be assigned to the variables will be passed in R1 and that the memory locations of the various variable values will be arranged on the stack by the subroutine above. The subroutine also expects two additional words on the stack before the list of variable locations - these are the return address from the subroutine and the value of R1 (i.e. the saved location in the command string).

Here's the code:

006764 010604 MOV SP, R4
006766 022424 CMP (R4)+, (R4)+
006770 005724 TST (R4)+
006772 005724 TST (R4)+
006774 001376 BNE 6772
006776 005744 TST -(R4)
007000 014400 MOV -(R4), R0
007002 001425 BEQ 7056
007004 010446 MOV R4, -(SP)
007006 104406 TRAP 6
007010 102424 BVS 7062
007012 012604 MOV (SP)+, R4
007014 121127 CMPB (R1), #54
007020 001410 BEQ 7042
007022 121127 CMPB (R1), #72
007026 001407 BEQ 7046
007030 121127 CMPB (R1), #12
007034 001404 BEQ 7046
007036 000262 SEV
007040 000207 RTS PC
007042 005201 INC R1
007044 000755 BR 7000
007046 014400 MOV -(R4), R0
007050 001773 BEQ 7040
007052 000270 SEN
007054 000207 RTS PC
007056 000257 CCC
007060 000207 RTS PC
007062 005726 TST (SP)+
007064 000764 BR 7036

Let's see how this works.

006764 010604 MOV SP, R4

Firstly, the stack pointer is copied into R4. Throughout this subroutine, R4 is used to move around the values on the stack without needing to change the stack pointer.

006766 022424 CMP (R4)+, (R4)+

R4 is moved down the stack by two words to skip over the subroutine return address and the copy of R1 on the stack. R4 should now be pointing at the zero word above the list of variable locations.

006770 005724 TST (R4)+

R4 is moved down by another word, past the zero word above the list of variable locations.

006772 005724 TST (R4)+
006774 001376 BNE 6772
006776 005744 TST -(R4)

We now skip down through the list of non-zero values by testing the word that R4 points to, until the zero value before the list of variable locations is identified. When a zero value is identified, R4 will have been post-incremented, so it is decremented so that after these instructions R4 will be pointing at the zero value before the list of variable locations.
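Here is a small Python sketch of this pointer walk. The stack is modelled as a list with the top of the stack at index 0, R4 is modelled as a list index, and the octal addresses in the example are invented placeholders.

```python
def find_zero_before_list(stack):
    """Return the index of the zero word that was pushed before the
    variable locations. stack[0] is the top of the stack."""
    r4 = 0
    r4 += 2              # CMP (R4)+, (R4)+ : skip return address and saved R1
    r4 += 1              # TST (R4)+        : skip the zero word above the list
    while True:          # TST (R4)+ / BNE 6772
        word = stack[r4]
        r4 += 1
        if word == 0:
            break
    r4 -= 1              # TST -(R4)        : step back onto the zero word
    return r4

# return address, saved R1, zero, locations of C, B and A, zero
stack = [0o6712, 0o1000, 0, 0o14030, 0o14020, 0o14010, 0]
index = find_zero_before_list(stack)
print(index, oct(stack[index - 1]))
```

The entry just before the returned index is the location of the first variable named in the command, which is the one the next instruction picks up.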

007000 014400 MOV -(R4), R0

The bottom variable value location from the stack (the one belonging to the first variable named in the command) is copied into R0 by pre-decrementing R4 and copying the value into R0.

007002 001425 BEQ 7056

If this value is zero, R4 has reached the zero word above the list, meaning there are no more variable locations left to fill, so we branch to 7056 to return.

007004 010446 MOV R4, -(SP)
007006 104406 TRAP 6
007010 102424 BVS 7062
007012 012604 MOV (SP)+, R4

Otherwise R4 is pushed onto the stack and TRAP 6 is used to parse a floating point number from the string pointed to by R1 into the memory location(s) pointed to by R0. In other words, a value will be parsed from the string and the value assigned to the relevant variable. If the overflow flag is set by TRAP 6, indicating a parse error, control branches to 7062 to return. Otherwise, R4 is restored from the stack before continuing.

Having parsed a value from the string we now check (a) whether there are any more values on the string, as indicated by the presence of a comma following the value just parsed or (b) whether we have reached the end of the command.

007014 121127 CMPB (R1), #54
007020 001410 BEQ 7042
007022 121127 CMPB (R1), #72
007026 001407 BEQ 7046
007030 121127 CMPB (R1), #12
007034 001404 BEQ 7046
007036 000262 SEV
007040 000207 RTS PC

The next character on the line is compared to comma (ASCII 54) and if it is equal control branches to 7042. If not, the value is compared to ":" (ASCII 72), which would represent the end of the current command and if it is equal control branches to 7046. If not, the value is compared to the line feed character (ASCII 12), which would represent the end of the current line and if it is equal control also branches to 7046. For all other characters, the syntax is invalid, so the overflow flag is set and control returns from the subroutine.

007042 005201 INC R1
007044 000755 BR 7000

When a comma is identified, R1 is incremented to move past the comma character, and then control branches to 7000 to parse the next value.

007046 014400 MOV -(R4), R0
007050 001773 BEQ 7040
007052 000270 SEN
007054 000207 RTS PC

When an end of statement character (colon or line feed) is identified, R4 is pre-decremented and the value it points to is copied into R0. If this value is zero, control branches to 7040 to return immediately. Otherwise, when this value is non-zero, that means that there are more values required than were present in the input string currently being parsed. The value in R0 is the first variable from the list of variable storage locations for which a value has not yet been assigned. The "N" flag (negative) is set before returning to indicate this condition.

007056 000257 CCC
007060 000207 RTS PC

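This is the return path taken from 7002 when no variable locations remain to be filled. In this case, all flags are cleared before returning, which distinguishes it from the "N" flag return described above.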

007062 005726 TST (SP)+
007064 000764 BR 7036

Finally, in the case where TRAP 6 returns a parse error (i.e. the overflow flag was set by TRAP 6), the temporarily stored R4 value is popped off the stack and discarded before branching to 7036 to set the overflow flag and return.
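Putting the pieces together, the subroutine has four possible outcomes. The following Python sketch models them under some loud assumptions: `float()` stands in for TRAP 6, a dictionary stands in for the variable storage, and the status strings and the name `assign_values` are invented for illustration.

```python
def assign_values(text, pos, slots, storage):
    """Return (status, pos). status is one of:
    'done'      - terminator reached, every variable assigned (zero flag)
    'need_more' - terminator reached, variables remain ("N" flag)
    'more_data' - a comma was seen but no variables remain (flags clear)
    'error'     - a value could not be parsed (overflow flag)"""
    i = 0
    while True:
        if i == len(slots):
            return "more_data", pos            # CCC at 7056
        end = pos
        while end < len(text) and text[end] not in ",:\n":
            end += 1
        try:
            storage[slots[i]] = float(text[pos:end])   # stand-in for TRAP 6
        except ValueError:
            return "error", pos                # SEV at 7036
        i += 1
        pos = end
        if pos < len(text) and text[pos] == ",":
            pos += 1                           # INC R1, then parse next value
            continue
        if i == len(slots):
            return "done", pos                 # BEQ 7040
        return "need_more", pos                # SEN at 7052

values = {}
status, _ = assign_values("1,2,3\n", 0, ["A", "B", "C"], values)
print(status, values)
```

A terminator directly after the last needed value gives `done`, too few values gives `need_more`, and a trailing comma with nothing left to fill gives `more_data`.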

With these two subroutines complete, we can now take a look at the INPUT and READ commands themselves.

BASIC INPUT command

The INPUT command reads user input into variable values. Here's the code:

006660 005046 CLR -(SP)
006662 004767 JSR PC, 7066
006666 102001 BVC 6672
006670 104445 TRAP 45
006672 005046 CLR -(SP)
006674 010146 MOV R1, -(SP)
006676 012702 MOV #77, R2
006702 104400 TRAP 0
006704 104500 TRAP 100
006706 004767 JSR PC, 6764
006712 102422 BVS 6760
006714 003015 BGT 6750
006716 002416 BLT 6754
006720 012601 MOV (SP)+, R1
006722 005726 TST (SP)+
006724 005726 TST (SP)+
006726 001376 BNE 6724
006730 005301 DEC R1
006732 005767 TST 13664
006736 001002 BNE 6744
006740 112711 MOVB #12, (R1)
006744 000167 JMP 2762
006750 104765 TRAP 365
006752 000751 BR 6676
006754 104763 TRAP 363
006756 000747 BR 6676
006760 104761 TRAP 361
006762 000745 BR 6676

Let's see how this works.

006660 005046 CLR -(SP)
006662 004767 JSR PC, 7066
006666 102001 BVC 6672
006670 104445 TRAP 45
006672 005046 CLR -(SP)

Firstly, a zero word is pushed onto the stack. Then, the subroutine described above for creating or locating variables is used to push the memory locations of the values of all of the variables onto the stack. If this subroutine sets the overflow flag that means there was a parse error.

If the overflow flag is clear control branches to 6672 otherwise an error is returned by TRAP 45. Another zero word is then pushed onto the stack.

In summary, at this point the stack contains a zero word, followed by all of the memory locations of the variables' values, followed by another zero word.

006674 010146 MOV R1, -(SP)

R1, the current location in the command string, is then pushed onto the stack.

006676 012702 MOV #77, R2
006702 104400 TRAP 0
006704 104500 TRAP 100

The ASCII code for "?" (ASCII 77) is moved into R2 and then TRAP 0 is used to display that character. TRAP 100 is then used to read a line into the string input buffer, with the location of the resulting string being returned in R1.

006706 004767 JSR PC, 6764
006712 102422 BVS 6760
006714 003015 BGT 6750
006716 002416 BLT 6754

We now jump to the subroutine, described above, for assigning values to variables. If there was a parse error in the subroutine the overflow flag will be set and control will branch to 6760. If the subroutine returned with all flags clear, the BGT branch to 6750 is taken; this is the return path where a comma was found but no variables remained, so more values were entered than there are variables. If there are more variables that have not yet been assigned values, the "N" (negative) flag will be set and the BLT branch to 6754 is taken.

If you look at the code executed for all of the various options:

006750 104765 TRAP 365
006752 000751 BR 6676
006754 104763 TRAP 363
006756 000747 BR 6676
006760 104761 TRAP 361
006762 000745 BR 6676

they involve invoking various non-fatal error traps (or perhaps traps associated with the operation of the floating point unit) and then all paths lead to address 6676.
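The three conditional branches after the subroutine call can be checked against the PDP-11 branch conditions: BVS is taken when V is set, BGT when Z is clear and N equals V, and BLT when N differs from V. The following Python sketch applies those rules in the same order as the code; the function name `input_dispatch` is hypothetical and the flag combinations are the ones produced by the subroutine's return paths.

```python
def input_dispatch(n, z, v):
    """Return the (octal) address reached after the call at 6706."""
    if v:
        return 0o6760        # BVS: parse error
    if not z and n == v:
        return 0o6750        # BGT: Z clear and N xor V = 0
    if n != v:
        return 0o6754        # BLT: N xor V = 1
    return 0o6720            # fall through: zero flag set

print(oct(input_dispatch(n=False, z=False, v=False)))  # after CCC
print(oct(input_dispatch(n=True, z=False, v=False)))   # after SEN
print(oct(input_dispatch(n=False, z=True, v=False)))   # after BEQ 7040
print(oct(input_dispatch(n=False, z=False, v=True)))   # after SEV
```

So each of the subroutine's four return paths ends up at a different address in the INPUT code.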

Otherwise, in the case the zero flag is set, meaning all variables have been matched and assigned their respective values, execution continues:

006720 012601 MOV (SP)+, R1

R1 is restored from the stack. The remaining values on the stack are all of the variable value locations, with a zero word before and after.

006722 005726 TST (SP)+
006724 005726 TST (SP)+
006726 001376 BNE 6724

The zero word is popped off the stack. Words are then popped off the stack until a zero word is popped. After these instructions, all stack entries added by the INPUT code will have been removed.

006730 005301 DEC R1

R1 is decremented so that it points at the character immediately preceding the list of values in the command string.

006732 005767 TST 13664
006736 001002 BNE 6744
006740 112711 MOVB #12, (R1)
006744 000167 JMP 2762

The value in memory location 13664 is tested. This value is used as a flag to indicate whether or not the program is running - a non-zero value means that the program is running.

If the value is non-zero control jumps to 6744, otherwise the value at the memory location at R1 is replaced with a linefeed character. This effectively truncates the INPUT command.

Control then jumps back to the main syntax parsing loop.

BASIC READ command

The READ command assigns values from DATA commands to the variable values specified in the READ command. Here's the code:

007150 012746 MOV #1, -(SP)
007154 005046 CLR -(SP)
007156 004767 JSR PC, 7066
007162 102001 BVC 7166
007164 104447 TRAP 47
007166 005046 CLR -(SP)
007170 010146 MOV R1, -(SP)
007172 016701 MOV 13670, R1
007176 001003 BNE 7206
007200 016701 MOV 13662, R1
007204 000426 BR 7262
007206 121127 CMPB (R1), #12
007212 001423 BEQ 7262
007214 004767 JSR PC, 6764
007220 102427 BVS 7300
007222 002413 BLT 7252
007224 010167 MOV R1, 13670
007230 012601 MOV (SP)+, R1
007232 005726 TST (SP)+
007234 005726 TST (SP)+
007236 001376 BNE 7234
007240 005726 TST (SP)+
007242 001776 BEQ 7240
007244 005301 DEC R1
007246 000167 JMP 2762
007252 005724 TST (R4)+
007254 005024 CLR (R4)+
007256 005714 TST (R4)
007260 001375 BNE 7254
007262 104534 TRAP 134
007264 122721 CMPB #147, (R1)+
007270 001746 BEQ 7206
007272 020103 CMP R1, R3
007274 103773 BCS 7264
007276 104451 TRAP 51
007300 104453 TRAP 53

Let's see how this works.

007150 012746 MOV #1, -(SP)
007154 005046 CLR -(SP)
007156 004767 JSR PC, 7066
007162 102001 BVC 7166
007164 104447 TRAP 47
007166 005046 CLR -(SP)

First a word with the value 1 is pushed onto the stack, then a zero word is pushed onto the stack. The subroutine described above for creating or locating variables is then used to push the memory locations of the values of all of the variables onto the stack. If this subroutine sets the overflow flag that means there was a parse error.

If the overflow flag is clear control branches to 7166 otherwise an error is returned by TRAP 47. Another zero word is then pushed onto the stack.

In summary, at this point the stack contains a word with the value 1, followed by a zero word, followed by all of the memory locations of the variables' values, followed by another zero word.

Next, the location from which to start looking for DATA values is determined:

007170 010146 MOV R1, -(SP)
007172 016701 MOV 13670, R1
007176 001003 BNE 7206
007200 016701 MOV 13662, R1
007204 000426 BR 7262

Firstly, R1 is pushed onto the stack. Then, the value from memory address 13670 is copied into R1. This memory address is used to store the location in the command stream up to which DATA commands have already been consumed. If this value is zero, no DATA commands have been consumed yet. If the value is non-zero control branches to address 7206.

Otherwise, if the value in 13670 is zero, the value from memory address 13662 is copied into R1. This memory address contains the lowest memory address of the program code. Control then branches to 7262 to seek to the next DATA command.

007206 121127 CMPB (R1), #12
007212 001423 BEQ 7262

Once a DATA line has been identified, the next character is compared to linefeed. If it is equal then control branches to 7262 to locate the next DATA line.

007214 004767 JSR PC, 6764
007220 102427 BVS 7300
007222 002413 BLT 7252

The subroutine to load values into variables is then invoked. If the subroutine sets the overflow flag on return that means an error was encountered, so control branches to 7300 to return an error. If the subroutine sets the "N" (negative) flag on return, that means that there are remaining variables that need values (i.e. more DATA is needed). In this case, control branches to 7252.

Otherwise, all variables have been assigned values:

007224 010167 MOV R1, 13670
007230 012601 MOV (SP)+, R1

The current value of R1 is moved to address 13670. This memory location is used to store the current location up to which DATA values have been consumed. After this has been done, R1 is restored from the stack.

007232 005726 TST (SP)+
007234 005726 TST (SP)+
007236 001376 BNE 7234
007240 005726 TST (SP)+
007242 001776 BEQ 7240

A word is popped from the stack and discarded. Then, another value is popped from the stack and if it is non-zero control branches back to 7234 to pop another value. These two instructions pop all non-zero values off the stack. Next, another value is popped off the stack and if it is zero control branches back to 7240 to pop another value. These next two instructions pop all zero values off the stack.

So in summary these lines pop a single value off the stack, then all non-zero values that follow it until a zero value is encountered, then all zero values that follow until a non-zero value is encountered. That final non-zero value appears to be the word with the value 1 that was pushed at the start of the READ code, which acts as an end marker for the run of zeros.
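This three-stage cleanup is easy to get wrong in your head, so here is a Python sketch of it. The stack is modelled as a list with the top of the stack at index 0, and the octal values in the example are invented placeholders; the function name `read_cleanup` is hypothetical.

```python
def read_cleanup(stack):
    """Apply the three pop stages and return what is left on the stack."""
    s = list(stack)
    s.pop(0)                  # TST (SP)+        : discard one word
    while s.pop(0) != 0:      # TST (SP)+ / BNE  : pop until a zero is popped
        pass
    while s.pop(0) == 0:      # TST (SP)+ / BEQ  : pop until a non-zero is popped
        pass
    return s

# zero word, one variable location, three zero words, the 1 word,
# then a word belonging to the caller
before = [0, 0o14030, 0, 0, 0, 1, 0o2766]
after = read_cleanup(before)
print([oct(w) for w in after])
```

Whether or not there is a run of several zero words in the middle, everything up to and including the word with the value 1 is removed.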

007244 005301 DEC R1
007246 000167 JMP 2762

Afterwards, R1 is decremented and then control branches back to the syntax parsing loop.

007252 005724 TST (R4)+
007254 005024 CLR (R4)+
007256 005714 TST (R4)
007260 001375 BNE 7254

After the variable assignment subroutine, in cases where not all variables have yet been assigned a value, R4 will point at the next variable entry that has not yet been assigned a value.

In this case R4 is post-incremented, so after the first instruction above, R4 will point at the last variable that was assigned a value. This value is then cleared and post-incremented. The next value is tested and if it is non-zero control branches to 7254 to repeat the process.

In summary, this code zeros all entries on the stack for variables that have been assigned values by the variable assignment subroutine.
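The effect of these four instructions can be sketched in Python. Here the stack is a list whose index grows towards higher addresses, R4 is a list index, and the octal values are invented placeholders; `zero_assigned` is a hypothetical name.

```python
def zero_assigned(words, r4):
    """Zero the entries for variables that already have values.
    r4 is the index of the next entry that has NOT been assigned."""
    words = list(words)
    r4 += 1                     # TST (R4)+ : step onto the last assigned entry
    while True:
        words[r4] = 0           # CLR (R4)+
        r4 += 1
        if words[r4] == 0:      # TST (R4) / BNE 7254
            break
    return words

# saved R1, zero, locations of C, B and A, zero, 1
# A and B have been assigned; R4 points at the entry for C (index 2).
stack = [0o1000, 0, 0o14030, 0o14020, 0o14010, 0, 1]
result = zero_assigned(stack, 2)
print([oct(w) for w in result])
```

Once the assigned entries are zeroed, the next call to the assignment subroutine stops its scan at the first of these zero words and therefore carries on with the entry for C.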

007262 104534 TRAP 134
007264 122721 CMPB #147, (R1)+
007270 001746 BEQ 7206
007272 020103 CMP R1, R3
007274 103773 BCS 7264
007276 104451 TRAP 51
007300 104453 TRAP 53

Finally, this is the code that locates the next DATA command in the BASIC program. TRAP 134 is used to position R3 at the beginning of the runtime state storage (which is also the end of the BASIC program). The value pointed to by R1 is compared to the character "g" which is the token for the DATA command. See Part 6 for more detail of the tokenisation of BASIC commands. If the character "g" is identified, that means a DATA command has been identified, so control branches to 7206 to parse the DATA command.

Otherwise, the value in R1 is compared to the value in R3 to test if we have reached the end of the BASIC program. If we have not reached the end of the program then control branches to 7264. Otherwise TRAP 51 is used to generate an error.

The TRAP 53 at 7300 is reached from the BVS at 7220 and generates an error when the variable assignment subroutine reports a parse error.
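The search loop at 7264 can be sketched in Python as follows. The program is modelled as a `bytes` object, the length of the program stands in for R3, and the example bytes are invented purely for illustration (they are not a real tokenised program); `find_data` is a hypothetical name.

```python
DATA_TOKEN = 0o147   # the character "g"

def find_data(program, start):
    """Return the index just past the next DATA token at or after start,
    or None if the end of the program is reached first."""
    end = len(program)           # stands in for R3
    if start >= end:
        return None
    r1 = start
    while True:
        byte = program[r1]       # CMPB #147, (R1)+
        r1 += 1
        if byte == DATA_TOKEN:
            return r1            # BEQ 7206
        if not r1 < end:         # CMP R1, R3 / BCS 7264
            return None          # falls through to TRAP 51

program = bytes([0o101, 0o12, 0o147, 0o61, 0o12])
print(chr(DATA_TOKEN), find_data(program, 0), find_data(program, 3))
```

Because of the post-increment, a successful search leaves the pointer on the character after the token, ready for the linefeed test at 7206.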

BASIC RESTORE command

The BASIC RESTORE command is very simple: it resets the memory location that stores the current location up to which DATA values have been consumed by the running program so far. In effect, it will mean that subsequent READ commands will begin assigning values from the first of the DATA commands again.

Here's the code:

004240 005067 CLR 13670
004244 000507 BR 4464

You'll note that it simply clears the memory location 13670 and then branches back (indirectly) to the main syntax parsing loop.
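As a rough model of the interaction between READ and RESTORE, here is a Python sketch in which a single attribute plays the role of memory location 13670. The class name `DataCursor` and the flat list of values are assumptions made for illustration only.

```python
class DataCursor:
    """Toy model: pos plays the role of location 13670."""

    def __init__(self, values):
        self.values = values
        self.pos = 0            # 0 means "start from the first DATA value"

    def read(self):
        if self.pos >= len(self.values):
            raise RuntimeError("out of data")
        value = self.values[self.pos]
        self.pos += 1
        return value

    def restore(self):
        self.pos = 0            # CLR 13670

cursor = DataCursor([1, 2, 3])
first = cursor.read()
second = cursor.read()
cursor.restore()
again = cursor.read()
print(first, second, again)
```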

Summary

This concludes the analysis of the READ and INPUT commands which, as you have seen, both operate in a very similar way. We also looked at the RESTORE command.
