daveor
- Feb 22, 2021
- 10 min read

Reverse engineering PDP-11 BASIC: Part 14

In this post I'm going to start the analysis of the floating point number code. I will be looking at the PDP-11 BASIC representation of floating point numbers, as well as TRAP 36 (which is the integer to floating point conversion code, also known as FLT) and TRAP 40 (which is the floating point to integer conversion code, also known as FIX).

This is another aspect of the code that has been tricky to figure out because it seems that PDP-11 BASIC uses a strange, three word, representation for floating point numbers and I haven't been able to find any existing reference to explain how it works.

For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.

Floating point number representation

It might be best to start of with a few examples showing how to determine the value of floating point numbers. Here's the first one:

000000
040000
100001

The first word is the low of the significand, and the second word is the high word of the significand. So, the first what we do is but the first two words together into a two word value:

040000 000000

which in binary is:

0100 0000 0000 0000 0000 0000 0000 0000

The third word is the exponent, which basically encodes the location of the decimal point. The initial value for the exponent specified in the code, when creating a float, is 100017. This places the decimal point between the two words of the significand, as follows:

0100 0000 0000 0000 . 0000 0000 0000 0000

Then, the value of the exponent word of the operand, which in the case of the current example is 100001, is subtracted from value 100017. If the resulting value is positive the decimal point is shifted left that number of positions and if the value is negative the decimal point is shifted right that number of positions.

In this case, we have the calculation "100017 - 100001" (both octal), which gives us a value of 14 in decimal, meaning that the decimal point is shifted 14 places to the left, so we get:

01 . 00 0000 0000 0000 0000 0000 0000 0000 0000

In other words, the three words above represent the value 1.

Let's do another example:

000000
075022
100024

So, we put the first two words together into a single value, and place a decimal point in the middle:

075022 . 000000

which in binary is:

0111 1010 0001 0010 . 0000 0000 0000 0000 0000

Then we calculate the exponent, which in this case is "100017 - 100025", or -5. Therefore, we shift the decimal point five places to the right (because the exponent is negative). That gives us (rearranging the bits into groups of four):

1111 0100 0010 0100 0000 . 0000 0000 0000

which is the value 1,000,000 in decimal.

Third example:

031463 
051463
100000

First, put the two words of the significant together, with a decimal point between them:

051463 . 031463

which in binary is:

0101 0011 0011 0011 . 0011 0011 0011 0011

Then, calculate the exponent which in this case is 100017 - 100000, or 15 (decimal). So, we shift the decimal point 15 places to the left, giving us:

0 . 1010 0110 0110 0110 0110 0110 0110 011

which is the value 0.65 in decimal. Strictly speaking it's 0.649999999906868 but hey, floating point numbers, you know what they're like!

Hopefully that is sufficient explanation of how floating points work in PDP-11 BASIC. If you need more, let me know in the comments and I'll add more. In the meantime, let's take a loop at some of the simpler TRAPs that are associated with the use of floating point numbers.

Integer to floating point conversion (TRAP 36, FLT)

TRAP 36 converts an integer to a floating point number. The integer is passed in register R1 and the resulting floating point number is stored at the memory location stored in R0, and the two subsequent bytes. Conversion errors will result in the overflow bit being set, otherwise the overflow bit will be clear.

Here's the code:

013166 005020 CLR (R0)+
013170 010120 MOV R1, (R0)+
013172 012710 MOV #100017, (R0)
013176 024040 CMP -(R0), -(R0)
013200 010001 MOV R0, R1
013202 012104 MOV (R1)+, R4
013204 012102 MOV (R1)+, R2
013206 012103 MOV (R1)+, R3
013210 010301 MOV R3, R1
013212 005702 TST R2
013214 001004 BNE 13226
013216 005704 TST R4
013220 001002 BNE 13226
013222 005003 CLR R3
013224 000420 BR 13266
013226 005203 INC R3
013230 005303 DEC R3
013232 006304 ASL R4
013234 006102 ROL R2
013236 102374 BVC 13230
013240 103010 BCC 13262
013242 001007 BNE 13262
013244 005704 TST R4
013246 001004 BNE 13260
013250 000261 SEC
013252 006002 ROR R2
013254 005203 INC R3
013256 005201 INC R1
013260 000261 SEC
013262 006002 ROR R2
013264 006004 ROR R4
013266 010420 MOV R4, (R0)+
013270 010220 MOV R2, (R0)+
013272 010320 MOV R3, (R0)+
013274 020301 CMP R3, R1
013276 101002 BHI 13304
013300 000242 CLV
013302 000207 RTS PC
013304 000262 SEV
013306 000207 RTS PC

Let's work through it and see how it works.

013166 005020 CLR (R0)+
013170 010120 MOV R1, (R0)+
013172 012710 MOV #100017, (R0)
013176 024040 CMP -(R0), -(R0)

The value at R0 (the first word of the float) is cleared and then R0 is incremented. Then the value at R1 is moved into R0 (the second word of the float) and then R0 is incremented. Finally, the value 100017 is moved into R0 (the third word of the float). R0 is decremented twice, so it points at the first word of the float again.

Referring to the example floating point numbers discussed above, if R1 at the value 123, we have now created the floating point number 123.0.

013200 010001 MOV R0, R1
013202 012104 MOV (R1)+, R4
013204 012102 MOV (R1)+, R2
013206 012103 MOV (R1)+, R3
013210 010301 MOV R3, R1

The memory address of the floating point number is moved into R1. Then the first word of the float (the zero word) is moved into R4, the second word of the float (the integer value we are converting to a float) is moved into R2 and the third word of the float (the exponent) is moved into R3. R3, the exponent, is also moved into R1.

013212 005702 TST R2
013214 001004 BNE 13226

The value in R2 is tested and if not equal to zero, control branches to 13226.

Otherwise, the integer value passed to this TRAP was zero, so the following code is executed:

013216 005704 TST R4
013220 001002 BNE 13226
013222 005003 CLR R3
013224 000420 BR 13266

The value at R4, the first word of the float, is tested. If not equal to zero, branch to 13226. Otherwise R3 is cleared and then control jumps to 13266.

The next code shifts the integer value as far to the left as possible in R4 without overflowing or changing the sign of the number, while adjusting the exponent accordingly. Basically, the value in R4 is shifted to the left until a 1 bit is encountered.

013226 005203 INC R3
013230 005303 DEC R3
013232 006304 ASL R4
013234 006102 ROL R2
013236 102374 BVC 13230
013240 103010 BCC 13262
013242 001007 BNE 13262

R3 (the exponent) is pre-incremented before the loop so that the decrement of the R3 in the next instruction is cancelled out in the case no adjustment to the value in R4 is required. This pattern of only testing conditions at the end of a set of instructions and looping back to repeat is extremely common.

Anyway, R3 is then decremented (remember R3 is a negative value, so this will make R3 more negative). R4 is shifted left and R2 is rotated left. The top bit of R2 is rotated into the carry bit. Four possiblilities arise:

The bit we just rotated into the carry bit was zero, and the new most significant bit is zero.
The bit we just rotated into the carry bit was zero, and the new most significant bit is one.
The bit we just rotated into the carry bit was one, and the new most significant bit is zero.
The bit we just rotated into the carry bit was one, and the new most significant bit is one.

Next we test whether the rotate left of R2 caused the overflow flag to be set. This flag will be set when either the carry flag was set, or the resulting number was negative (i.e. the N flag was set), but not both. In other words, this flag will be set in two scenarios:

The number in R2 was negative, so the highest bit was set (indicating a negative number). This will have been shifted into the carry bit, so the carry bit will be set. In addition, the next bit of R2 would have needed to be zero, so that the number is no longer negative - this is option 3 above.
The number in R2 was positive, but the highest bit of the number (i.e. bit 15) was a 1 and this is rotated into the sign position, meaning that the sign of the number has changed from positive to negative - this is option 2 above.

If neither of those two scenarios happened, control jumps back to instruction 13230 to further reduce the exponent and shift R4 and R2 left by another bit. What this loop does is, in effect, consume a series of leading zero bits (in the case of a positive number) or one bits (in the case of a negative number) and increment the exponent by the corresponding number of bits.

If the overflow flag was set, we now need to distinguish between the two scenarios just described. In the first case, the carry flag is set but in the second case the carry flag is not set. Therefore, the status of the carry flag is tested. If the carry bit is clear, that means we used to have a positive number but we have rotated the highest bit of the number into the sign position, making the number negative. In other words, we have consumed all of the leading zeros in the positive number (because we have encountered a one), in which case we jump to 13262.

Otherwise, we have encountered the first of the two scenarios above, we are dealing with a negative number and we have consumed all of the leading ones (because we have encountered a zero). The remaining value of the number is tested and if it is not equal to zero, control jumps to 13262. Otherwise, the remaining value in R2 is zero, so the following code is executed:

013244 005704 TST R4
013246 001004 BNE 13260
013250 000261 SEC
013252 006002 ROR R2
013254 005203 INC R3
013256 005201 INC R1
013260 000261 SEC

The value in R4 is tested and if it's not equal to zero control jumps to 13260. The carry bit is set and then R2 is rotated to the right by one bit. This will restore the one bit into the sign position of R2. R1 and R3 are both incremented by one.

At instruction 13260 the carry bit is set again, and control moves on to rejoin the other options from above at instruction 13262:

013262 006002 ROR R2
013264 006004 ROR R4

R2 is rotated to the right, which will restore the last bit shifted out if control jumps directly here from the "leading ones/zeros loop" above, or will restore the one bit from the carry flag just set at instruction 13260 in the case of a negative number. R4 is also rotated right.

013266 010420 MOV R4, (R0)+
013270 010220 MOV R2, (R0)+
013272 010320 MOV R3, (R0)+

The three words of the floating point number are now copied to successive memory addresses starting at R0.

Finally, we set the overflow flag and return:

013274 020301 CMP R3, R1
013276 101002 BHI 13304
013300 000242 CLV
013302 000207 RTS PC
013304 000262 SEV
013306 000207 RTS PC

The value at R3 is compared to the value at R1. If R3 is greater than R1, which, as far as I can tell, will only happen in the case that a zero value integer is passed into this TRAP, then control jumps to 13304 which causes the overflow flag to be set before returning. Otherwise, the overflow flag is cleared before returning.

Floating point to integer conversion (TRAP 40, FIX)

TRAP 40 is used for floating point to integer conversion. The floating point number is passed in registers R2, R3 and R4. The result is returned in R0.

Here's the code:

013476 020427 CMP R4, #100017
013502 101013 BHI 13532
013504 001410 BEQ 13526
013506 020427 CMP R4, #100000
013512 103410 BCS 13534
013514 162704 SUB #100017, R4
013520 006203 ASR R3
013522 005204 INC R4
013524 001375 BNE 13520
013526 010300 MOV R3, R0
013530 000207 RTS PC
013532 104771 TRAP 371
013534 005000 CLR R0
013536 000207 RTS PC

Let's analyse this and see how it works.

013476 020427 CMP R4, #100017
013502 101013 BHI 13532
013504 001410 BEQ 13526
013506 020427 CMP R4, #100000
013512 103410 BCS 13534

Firstly, the value in R4 (the exponent of the floating point number) is compared to 100017. This is an unsigned comparison, so the value 100024 (for example) will be greater than 100017 even though it represents a more negative number. A couple of scenarios are considered;

An unsigned value greater than 100017 indicates that the value contained in the other two words is greater than will fit in a single word and therefore the value in the floating point number cannot meaningfully be converted back into an integer. In this case, control jumps to 13532 and a zero value is returned.
If the value contained in R4 equals 100017, that means that the value in R3 is the integer value, so R3 can be transferred directly into R0 and returned.
If the value contained in R4 is less than 100000 that means the floating point number consists of only a fractional component (i.e. no integer component), in which case zero is returned by branching to 13534.

013514 162704 SUB #100017, R4
013520 006203 ASR R3
013522 005204 INC R4
013524 001375 BNE 13520

In all other case, the exponent needs to be applied to the value contained in R3 so 100017 is subtracted from R4 and then the code enters a loop, shifting R3 to the right and incrementing R4 (remember exponents are negative) until R4 equals zero.

013526 010300 MOV R3, R0
013530 000207 RTS PC

Once the value in R3 has been adjusted by the exponent, R3 is moved into R0 and control returns to the calling code.

013532 104771 TRAP 371
013534 005000 CLR R0
013536 000207 RTS PC

In case of errors, the non-fatal TRAP 371 is invoked. The value in R0 is cleared (i.e. a zero value is returned) and then control returns.

Conclusion

In this post I looked at the structure of floating point numbers and the conversion in both directions betwen integers and floating point numbers. Next time we'll be looking at some of the mathematical operations on floating point numbers.

Reverse engineering PDP-11 BASIC: Part 14

Recent Posts