This post will describe the operation of the ASCII-to-floating point conversion code, TRAP 6. You will need to have read Part 14, Part 15, Part 16 and Part 17 before you will have a chance of understanding this post.
For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.
TRAP 6 parses characters from a string pointed to by R1 into a floating point number stored on the stack.
Parser state
The code works by parsing the string character-by-character. A word on the stack is used as flags for maintaining the state of the parser. Different bits have different meanings as follows:
bit 1, when set, indicates that a number has been identified within the context of the current component of the floating point number (i.e. either the exponent or the significand).
bit 2, when set, indicates that the floating point number has an invalid format.
bit 4, when set, indicates that an "E" has been identified meaning that we are parsing the exponent.
bit 10, when set, indicates that a decimal point has been encountered.
bit 20, when set, indicates that a significand sign has been encountered.
bit 40, when set, indicates that an exponent sign has been encountered.
bit 100, when set, indicates that the significand is negative.
bit 200, when set, indicates that the exponent is negative.
bit 400, when set, indicates that a minus sign has been encountered.
Setup code
In a change of format, I'm going to dive straight in because we're only going to be looking at one TRAP in this post.
010074 010546 MOV R5, -(SP)
010076 010046 MOV R0, -(SP)
Firstly, R5 and R0 are pushed onto the stack.
010100 005020 CLR (R0)+
010102 005020 CLR (R0)+
010104 005010 CLR (R0)
010106 005046 CLR -(SP)
010110 005046 CLR -(SP)
010112 005046 CLR -(SP)
The three words at R0 are cleared and three zero words are pushed onto the stack.
Character parsing loop
Control then enters a loop, reading and parsing individual characters to build up the floating point number.
010114 104472 TRAP 72
First a TRAP 72 is used to get the next non-whitespace character.
010116 122702 CMPB #105, R2
010122 001474 BEQ 10314
010124 122702 CMPB #55, R2
010130 001512 BEQ 10356
010132 122702 CMPB #53, R2
010136 001504 BEQ 10350
010140 122702 CMPB #56, R2
010144 001473 BEQ 10334
The character is compared against a set of characters that have special interpretations in floating point numbers, with branches to different locations for the purposes of processing each of the special characters.
The character is compared against the character "E" (ASCII 105). This character is used to separate the significand and the exponent, in the number format 1.1E5. If this character is identified, control branches to 10314.
Next the character is compared to "-" (ASCII 55), which is used to indicate a negative number. Either the significand or the exponent could be negative. If this character is identified, control branches to 10356. The next comparison is to the "+" character (ASCII 53), which is used to indicate a positive number. If this character is identified, control branches to 10350.
Finally for this set of tests, the character is compared to "." (ASCII 56) which is a decimal point. The decimal point can only be in the significand. Fractional exponents are not permitted. If a decimal point is identified, then control jumps to 10334.
010146 104470 TRAP 70
010150 001125 BNE 10424
TRAP 70 is used to check whether the character is numeric. If the result is non-zero that means the character is not numeric, in which case control jumps to 10424. Otherwise the value is numeric, in which case:
010152 162702 SUB #60, R2
010156 010146 MOV R1, -(SP)
010160 032766 BIT #4, 2(SP)
010166 001033 BNE 10256
60 is subtracted from the ASCII character code, which will leave the numeric value of the digit in R2. R1 is pushed onto the stack.
The value of bit 4 of SP+2 is tested. This memory location is used to store flags indicating the current processing state. In particular, the bit 4 flag is set later in the code to indicate that an "E" has been encountered and we are therefore parsing the exponent of the floating point number. If the bit is set, the result of the BIT test will be non-zero, means an "E" has been encountered. In this case, control jumps to 10256. Otherwise, if the bit is not set, the result of the BIT test will be zero and that means any digits being processed are part of the significand.
Parsing a significand digit
Here is the code used to process a digit of the significand:
010170 162706 SUB #6, SP
010174 010600 MOV SP, R0
010176 010201 MOV R2, R1
010200 104436 TRAP 36
010202 016600 MOV 16(SP), R0
010206 012701 MOV #10660, R1
010212 104430 TRAP 30
010214 102526 BVS 10472
010216 016600 MOV 16(SP), R0
010222 010601 MOV SP, R1
010224 104420 TRAP 20
010226 032766 BIT #10, 10(SP)
010234 001402 BEQ 10242
010236 005366 DEC 12(SP)
010242 062706 ADD #6, SP
010246 012601 MOV (SP)+, R1
010250 052716 BIS #1, (SP)
010254 000464 BR 10426
Let's see how this works.
010170 162706 SUB #6, SP
010174 010600 MOV SP, R0
Firstly, six is subtracted from the stack pointer, which allocates three words on the stack. Then the stack pointer is copied into R0.
010176 010201 MOV R2, R1
010200 104436 TRAP 36
The integer value of the current digit is moved from R2 into R1 and then TRAP 36 is used to convert the integer in R1 into a floating point number stored at R0.
010202 016600 MOV 16(SP), R0
010206 012701 MOV #10660, R1
The value SP+16 is moved into R0. This is the location of the three words pushed onto the stack at the beginning of the TRAP (see above instructions starting at 10106). This memory location is used to keep the running value of the currently identified digits of the significand. At the beginning of parsing the significand this value will be zero.
Then, memory address 10660 is moved into R1. At memory address 10660 we find the following:
010660 000000
010662 050000
010664 100004
This is the floating point representation of the the value 10.
010212 104430 TRAP 30
010214 102526 BVS 10472
TRAP 30 is used to multiply the value in R0 by the value in R1, with the result being stored in R0. This will multiply any previously encountered digits of the significand by ten. If there is an error, identified by the overflow flag being set, then control jumps to 10472.
010216 016600 MOV 16(SP), R0
010222 010601 MOV SP, R1
010224 104420 TRAP 20
Now the newly identified digit is added to the running total. The memory location SP+16 is moved to R0 and the location of the floating point value of the current digit, stored on the stack, is moved into R1. TRAP 20 is used to add the two values together, with the result being stored in R0.
010226 032766 BIT #10, 10(SP)
010234 001402 BEQ 10242
Bit 10 of the flags word stored on the stack is tested. This bit is used to indicate the presence of a decimal point. If the test returns zero, meaning that the flag was not set, control jumps to 10242.
010236 005366 DEC 12(SP)
If the test does not return zero, meaning that a decimal point was located, the integer value of the exponent is decremented.
010242 062706 ADD #6, SP
The three words allocated on the stack to hold the floating point representation of the current digit are popped off the stack and discarded.
010246 012601 MOV (SP)+, R1
The value of R1 is restored from the stack.
010250 052716 BIS #1, (SP)
010254 000464 BR 10426
The value of bit 1 is set in the flags word and then control branches to 10426.
Parsing a exponent digit
The alternative execution path for handling a digit is used in the case where the digit forms part of the exponent. Here is the code used in that case:
010256 010246 MOV R2, -(SP)
010260 016603 MOV 10(SP), R3
010264 012705 MOV #12, R5
010270 005002 CLR R2
010272 005004 CLR R4
010274 020327 CMP R3, #1724
010300 003100 BGT 10502
010302 104462 TRAP 62
010304 062603 ADD (SP)+, R3
010306 010366 MOV R3, 6(SP)
010312 000755 BR 10246
The exponent code is somewhat simpler because only integer values are allowed in the exponent element of a floating point number. Also, the exponent must be less than plus or minus 9800 (decimal).
010256 010246 MOV R2, -(SP)
Initially the newly identified digit is pushed onto the stack.
010260 016603 MOV 10(SP), R3
010264 012705 MOV #12, R5
010270 005002 CLR R2
010272 005004 CLR R4
The value SP+10 is moved into R3. This is used as a running total of the value of the exponent. The value 12 (decimal 10) is moved into R5, and the values in R2 and R4 are cleared.
010274 020327 CMP R3, #1724
010300 003100 BGT 10502
The value in R3 is compared to 1724 (this is decimal 980). Remember that the absolute value of the exponent must be less than 9800. The value in R3 will shortly be multiplied by ten, which will mean that if the current value is 980 then multiplying by ten will exceed the maximum allowed exponent. Therefore, if the current value is greater than 980 control jumps to 10502.
010302 104462 TRAP 62
TRAP 62 is used to multiply the value in R3 by the value in R5, with the result being stored in R3.
010304 062603 ADD (SP)+, R3
010306 010366 MOV R3, 6(SP)
The numeric value of the newly identified digit is now popped off the stack and added to R3. The resulting value is then copied back into the running total location on the stack, which is now SP+6, since we have popped one value off the stack.
010312 000755 BR 10246
Control then jumps up to address 10246, which restores R1 from the stack, sets bit 1 in the flags value on the stack to indicate we have located a digit and loops back to the beginning of the parse loop to parse the next character.
Parsing special characters
As mentioned earlier in the description of the parsing loop, there are certain special characters that may be encountered (specifically "E", "+", "-" and "."). This section describes the code used to handle these characters. Briefly, these characters are handled by setting different bits in a flags word on the stack, so that the numeric parts of the floating point number are handled correctly.
We'll take a look at them one-by-one to see how they work.
This is the code executed when an "E" is encountered:
010314 032716 BIT #4, (SP)
010320 001061 BNE 10464
010322 052716 BIS #4, (SP)
010326 042716 BIC #1, (SP)
010332 000435 BR 10426
Bit 4 in the flags word on the stack is used to indicate whether an "E" has been encountered so firstly bit 4 is tested to see whether it is already set. This would indicate an invalid floating point number with two "E"s in it. In the case where the bit test returns zero (i.e. bit 4 is already set), control jumps to 10464.
Otherwise, bit 4 is set and bit 1 is cleared. Bit 1 is used to indicate whether a numeric value with the current meaning (either the significand or exponent as appropriate) has been parsed. Since we are currently handling the situation where an "E" has been identified, no digits of the exponent itself will yet have been parsed, so bit 1 is cleared.
Control then branches to 10426 which (indirectly) loops back up to the beginning of the parse loop to read in the next character.
This is the code executed when a "." is encountered:
010334 032716 BIT #14, (SP)
010340 001051 BNE 10464
010342 052716 BIS #10, (SP)
010346 000427 BR 10426
Bit 10 is used to represent the identification of a decimal point, so bits 10 and 4 are tested to determine (a) whether a decimal point has already been encountered or (b) whether we are parsing the exponent (because the exponent must be an integer). If either of those bits are set, the floating point structure is invalid and control jumps to 10464.
Otherwise bit 10 is set and control branches to 10426 to loop back to the beginning of the parse loop and read the next character.
This is the code executed when a "+" is encountered:
010350 042716 BIC #400, (SP)
010354 000401 BR 10360
Bit 400 is cleared and then control branches to 10360.
This jump to 10360 is a bit strange. If you take a look at the code below that is executed when a "-" is encountered, you will see that there is no instruction at memory address 10360. that's because the instruction at 10356 uses immediate addressing and the value 400 (which you will see below in instruction 10356 is actually stored at address 10360. So, the words of memory for that instruction are actually as follows:
010356 052716
010360 000400
So, when you jump to address 10360 what actually happens? The value 000400 is going to be interpreted as an instruction, which will mean an unconditional branch with an offset of zero. Since the program counter is incremented before the instruction is executed, it will be pointing at address 10362. Zero will be added to this value, meaning that execution will continue at address 10362.
Anyway, here is the code executed when a "-" is encountered (and also when a "+" is encountered, with the exception of the first instruction):
010356 052716 BIS #400, (SP)
010362 032716 BIT #4, (SP)
010366 001020 BNE 10430
010370 032716 BIT #1, (SP)
010374 001013 BNE 10424
010376 032716 BIT #30, (SP)
010402 001030 BNE 10464
010404 052716 BIS #20, (SP)
010410 032716 BIT #400, (SP)
010414 001404 BEQ 10426
010416 052716 BIS #100, (SP)
010422 000401 BR 10426
010424 000430 BR 10506
010426 000632 BR 10114
010430 032716 BIT #1, (SP)
010434 001373 BNE 10424
010436 031627 BIT (SP), #40
010442 001010 BNE 10464
010444 052716 BIS #40, (SP)
010450 032716 BIT #400, (SP)
010454 001764 BEQ 10426
010456 052716 BIS #200, (SP)
010462 000761 BR 10426
When a minus is encountered, bit 400 is set in the flags word on the stack. Remember that bit 400 is cleared if a plus is encontered.
From then on, the code is executed in the case of both a "+" and a "-" being encountered.
Firstly bit 4 of the flags word on the stack is tested. This tests whether we are parsing the significand or the exponent. If the result is not equal to zero, that means bit 4 is set and we are parsing the exponent. In that case control branches to 10430. Otherwise we are processing the significand, in which case execution continues with instruction 10370:
010370 032716 BIT #1, (SP)
010374 001013 BNE 10424
Bit 1 in the flags word on the stack is tested to see whether any digits of the significand have been parsed. If the bit is set, control jumps to 10424 and the sign is not processed any further.
010376 032716 BIT #30, (SP)
010402 001030 BNE 10464
Now bits 10 and 20 are checked. This tests whether (a) a decimal point has been encountered and/or (b) a sign has already been encountered in the significand. If either of these cases is true, the result of the BIT test will be non-zero and control jumps to 10464.
010404 052716 BIS #20, (SP)
010410 032716 BIT #400, (SP)
010414 001404 BEQ 10426
010416 052716 BIS #100, (SP)
010422 000401 BR 10426
Bit 20 of the flags word on the stack is set to indicate that a sign has been identified in the signifcand element of the floating point number. Then, bit 400 is tested to see whether or not the sign was positive or negative. If bit 400 is not set, that means the sign was positive, so control jumps to 10426. Otherwise bit 100 is set to indicate that the significand is negative and then control jumps to 10426.
010424 000430 BR 10506
010426 000632 BR 10114
There's quite a lot of jumping around involved in this particular TRAP. The branch instruction can only jump backwords a maximum of 200 words, and forwards a maximum of 177 words. Therefore, these two lines are intermediate jump points that are used by code that needs to jump right back up to the top of the parse loop, but that distance is greater than the maximum possible in a single branch.
Branch 10506 is a branch to the beginning of the "tidy up and exit" code, described below. This is the exit point from the main character parsing loop. Branch 10114 is a branch back to the beginning of the character parsing loop.
The remaining code is used when the minus (or plus) is part of the exponent rather than the significand.
010430 032716 BIT #1, (SP)
010434 001373 BNE 10424
Bit 1 in the flags word on the stack is tested to see whether any digits of the exponent have been parsed. If the bit is set, control jumps to 10424 and the sign is not processed any further.
010436 031627 BIT (SP), #40
010442 001010 BNE 10464
Now bits 40 is checked. This tests whether a sign has already been encountered in the exponent, in which case control jumps to 10464.
010444 052716 BIS #40, (SP)
010450 032716 BIT #400, (SP)
010454 001764 BEQ 10426
010456 052716 BIS #200, (SP)
010462 000761 BR 10426
Bit 40 of the flags word on the stack is set to indicate that a sign has been identified in the exponent element of the floating point number. Then, bit 400 is tested to see whether or not the sign was positive or negative. If bit 400 is not set, that means the sign was positive, so control jumps to 10426. Otherwise bit 200 is set to indicate that the exponent is negative and then control jumps to 10426.
010464 052716 BIS #2, (SP)
010470 000755 BR 10424
There have been a lot of jumps in the code above to memory address 10464 and those situations have been where there has been an invalid series of characters identified. In these instances, bit 2 of the flags word on the stack is set to indicate invalid syntax. Control then jumps to 10424, which as described above, is an intermediate jump to the "tidy up and exit" code.
010472 062706 ADD #10, SP
010476 012601 MOV (SP)+, R1
010500 000771 BR 10464
010502 022626 CMP (SP)+, (SP)+
010504 000767 BR 10464
Two other error conditions are handled by this code before branching up to 10464, just described.
Earlier in the code, when we were multipliying the running value of the significand by ten, if that multiplication (performed by invoking TRAP 30) causes the overflow flag to be set, then control jumps to 10472. In this case, 10 is added to the stack pointer, which will pop four values off the stack, R1 is restored from the stack, and then control jumps to 10464 as described above.
The other case dealt with here is when we were parsing digits of the exponent, we checked whether the current running value of the exponent was greater than 1724 (980 in decimal). If it was, that meant that the exponent was too large and control branched to 10502. In this case, two values are popped off the stack and discarded before control jumps to 10464 to set the error flag and return.
"Tidy up and exit" code
The remainder of the code is used for tidying up and exiting. I'll divide it up into a few pieces for simplicity.
010506 010146 MOV R1, -(SP)
Before anything else is done, R1 is pushed onto the stack.
010510 032766 BIT #100, 2(SP)
010516 001404 BEQ 10530
010520 016600 MOV 10(SP), R0
010524 010001 MOV R0, R1
010526 104424 TRAP 24
The first piece of code negates the floating point number, if appropriate. Bit 100 of the flags word on the stack is tested. This flag, if set, indicates that the significand is negative. If the flag is zero, control jumps straight to 10530. Otherwise, SP+10, the memory location of the floating point representation of the significand is moved to R0 and then also copied to R1. TRAP 24 is then used to negate a floating point number.
010530 032766 BIT #200, 2(SP)
010536 001407 BEQ 10556
010540 005466 NEG 6(SP)
010544 102004 BVC 10556
010546 052766 BIS #2, 2(SP)
010554 000426 BR 10632
Next, bit 200 of the flags word on the stack is tested. This flag indicates whether the exponent is negative. If the flag is not set control jumps to 10556.
Otherwise the value at SP+6, which contains the exponent integer value, is negated. If this does not cause an overflow, control jumps to 10556. Otherwise, if it does cause an overflow, bit 2 of the flags is set and then control branches to 10632.
010556 066666 ADD 4(SP), 6(SP)
010564 001422 BEQ 10632
010566 002411 BLT 10612
The exponent of the floating point number representation of the significand is added to the calculated exponent. If the resulting value is equal to zero, control jumps to 10632. Otherwise, if the resulting value is less than zero (meaning that the resulting number is less than zero), control jumps to 10612.
The final option is that the resulting value is greater than zero, meaning that the resulting number is greater than zero, in which case execution continues as follows:
010570 016600 MOV 10(SP), R0
010574 012701 MOV #10660, R1
010600 104430 TRAP 30
010602 005366 DEC 6(SP)
010606 003370 BGT 10570
010610 000410 BR 10632
SP+10, the location of the floating point representation of the significand, is moved into R0. The location of the floating point representation of the constant value 10 is moved into R1. Then TRAP 30 is used to multiply these together.
The value of the exponent, at SP+6, is decremented and if it is greater than zero, control jumps to 10570 to repeat the process until the significand has been multiplied by ten the number of times equal to the value of the exponent.
Once this has been done, control jumps to 10632.
010612 016600 MOV 10(SP), R0
010616 012701 MOV #10660, R1
010622 104426 TRAP 26
010624 005266 INC 6(SP)
010630 002770 BLT 10612
Otherwise, in the case that the resulting number is less than zero, SP+10, the location of the floating point representation of the significand is moved into R0. The location of the floating point representation of the constant value 10 is moved into R1. Then TRAP 26 is used to divide these together.
The value of the exponent is incremented and execution loops for as long as the value of the exponent is less than zero. This means that the process repeats until the significand has been divided by ten the number of times equal to the value of the exponent.
Now, the last few instructions:
010632 012601 MOV (SP)+, R1
010634 005301 DEC R1
010636 012604 MOV (SP)+, R4
010640 062706 ADD #6, SP
010644 012605 MOV (SP)+, R5
010646 032704 BIT #2, R4
010652 001401 BEQ 10656
010654 000262 SEV
010656 000207 RTS PC
R1 is popped from the stack. R1 is the location of the string from which the floating point number has been parsed. The value at R1 is decremented, to point at the last character of the floating point number.
The value is popped off the stop of the stack into R4. Six is then added onto the stack pointer, popping three points off the stack and discarding them. The next value is popped off the stack into R5.
Finally, the value of bit 2 of the flags word, now stored in R4, is tested. This bit, if set, represents an error condition. If it is non-zero the overflow flag is set. If not, the overflow flag is not set.
Control now returns to the calling code.
Conclusion
In the last five posts we have worked through all of the key code for handling floating point numbers, culminating in the analysis in this post of TRAP 6, which is the ASCII-to-floating point conversion code.
Σχόλια