After studying the process of loading programs from paper tape in my previous series of posts, I thought it might be fun to look in a bit more detail at some of the programs that used to be available on paper tape. Because the paper tapes (and the files used to emulate the physical paper tapes) are structured into blocks, you can't simply dump the bytes from a tape and easily interpret the contents.
You can use SIMH to load the paper tape directly, as I described in a previous post, but to enable further analysis I decided to write a little piece of code that would read in a paper tape file and output the equivalent set of SIMH DEPOSIT instructions required to load the program into memory.
Input data format
The absolute loader expects tapes to be formatted in a particular way. The tape is made up of a series of blocks, with each block consisting of:
A two byte leader with the values 001 and 000.
One word representing the byte count of the block (including the header bytes, but not including the checksum byte).
One word representing the load address of the block.
The data bytes of the block.
A checksum byte.
There is one special type of block that also needs to be mentioned. If the block has no data, which can be detected by the block having a byte count value of 6, the block only consists of:
A two byte leader with the values 001 and 000.
One word representing the byte count of the block (including the header bytes, but not including the checksum byte).
One word representing the load address of the block.
This is called a jump block. In this case the load address has a special meaning:
If the load address in the jump block is odd, the program halts. I suspect, having analysed the absolute loader code in detail, that this may have been used in situations where a single program was split across multiple paper tapes. The program halts to allow the operator to switch the tape, and when the operator presses continue on the panel, loading continues from the new tape.
If the load address in the jump block is even, then this represents the memory address to jump to in order to start executing the program that has just been read into memory.
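The block layout described above can be sketched as a small header parser. This is only an illustration and not the code used later in this post: the names BlockKind, BlockHeader and parseHeader are hypothetical, and it works on a byte buffer rather than a file to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the layout comes from the description above. */
enum BlockKind { BLOCK_DATA, BLOCK_JUMP, BLOCK_HALT, BLOCK_INVALID };

struct BlockHeader {
    unsigned byteCount;   /* leader + count + address + data, checksum excluded */
    unsigned loadAddress; /* load address, or the jump address in a jump block */
};

/* Decode the six header bytes of a block and classify it. */
enum BlockKind parseHeader(const unsigned char *bytes, size_t length,
                           struct BlockHeader *header) {
    assert(bytes != NULL && header != NULL);
    /* Every block starts with the leader bytes 001 and 000. */
    if (length < 6 || bytes[0] != 0x01 || bytes[1] != 0x00) {
        return BLOCK_INVALID;
    }
    /* Words are stored low byte first. */
    header->byteCount = bytes[2] | (bytes[3] << 8);
    header->loadAddress = bytes[4] | (bytes[5] << 8);
    if (header->byteCount < 6) {
        return BLOCK_INVALID;
    }
    if (header->byteCount > 6) {
        return BLOCK_DATA;
    }
    /* A byte count of exactly 6 means a jump block: odd halts, even jumps. */
    return (header->loadAddress % 2 != 0) ? BLOCK_HALT : BLOCK_JUMP;
}
```

For example, the six bytes 001 000 006 000 104 034 (octal) would be classified as a jump block with a jump address of 016104.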
Output data format
The SIMH configuration file consists of a set of instructions to configure an emulated hardware device, in this case a PDP-11. For the purposes of this post, there are broadly three categories of configuration instructions that are of interest:
Boilerplate instructions to specify the hardware to emulate. You'll need to add these manually to the output from the code below.
A series of "DEPOSIT" instructions, to load specific words into specific locations in memory. In the SIMH configuration file, "DEPOSIT" can be shortened to "DEP", or just "D". The instructions are of the form "D 000123 000456" where "D" means deposit, "000123" is the memory address and "000456" is the value to be deposited at that memory address.
The "GO" instruction, which is used to start running the program. This instruction is of the form "GO 000123" which means, load the value "000123" into the program counter and start executing from there.
Code walkthrough
I haven't provided the full code in this walkthrough, but if anyone would like me to post it, let me know in the comments and I'll update the post. All I've left out is boilerplate stuff like opening and closing files, checking errors, reading a byte from file, etc.
Here's the core function, called readBlock:
int readBlock(FILE *filep, FILE *outfile) {
    if(seekBlockSignature(filep) != 0) {
        //didn't find a block signature
        return -1;
    }
    int byteCount = getWord(filep);
    if(byteCount == -1) {
        return -1;
    }
    if(byteCount == 6) {
        return loadJumpBlock(filep, outfile);
    } else {
        byteCount -= 4;
        return loadDataBlock(filep, byteCount, outfile);
    }
}
This function takes as arguments the FILE pointer for the paper tape file and the FILE pointer that the output will be written to. As you can see, it has three parts, which are hopefully fairly self-explanatory:
Seek forwards until a block signature is identified
Read the byte count
Based on the byte count either load a jump block or a data block.
It is possible that the paper tape may contain some leading zeros before the first block begins, therefore the readBlock() code above uses the seekBlockSignature() function to read bytes until a byte with the value 001, followed by a byte with the value 000 is identified. This represents the block signature. When this pattern is found seekBlockSignature() returns zero. If no block signature is identified, or a read error occurs, the seekBlockSignature() function returns -1.
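Since seekBlockSignature() is not listed in this post, here is a minimal sketch of how it might look, written only from the description above. The getByte() body is also an assumption: it is taken to return the next byte as a value from 0 to 255, or -1 on end of file or a read error.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed helper: next byte as 0..255, or -1 on end of file or read error. */
int getByte(FILE *filep) {
    int c = fgetc(filep);
    return (c == EOF) ? -1 : c;
}

/* Skip forward until a 001 byte immediately followed by a 000 byte is seen.
   Returns 0 with the file positioned just after the signature, or -1 if the
   input ends (or a read fails) before a signature is found. */
int seekBlockSignature(FILE *filep) {
    assert(filep != NULL);
    int previous = -1;
    int current;
    while ((current = getByte(filep)) != -1) {
        if (previous == 0x01 && current == 0x00) {
            return 0;
        }
        previous = current;
    }
    return -1;
}
```

Remembering only the previous byte keeps the scan to a single pass and also copes with a stray 001 byte directly in front of a real signature.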
The code then uses the getWord() function to read two bytes and assemble them into a word. Here's the getWord() function:
int getWord(FILE *filep) {
    int lowByte = getByte(filep);
    int highByte = getByte(filep);
    if(lowByte != -1 && highByte != -1) {
        int wordValue = highByte << 8;
        wordValue = wordValue + lowByte;
        return wordValue;
    } else {
        return -1;
    }
}
Remember that data is stored little-endian, so the code reads the low byte first and then the high byte. Once both bytes have been read, the high byte value is shifted left by 8 bits and the low byte is then added. This will convert both bytes into a single word value.
This function is used in the readBlock() function to read the byte count from the block.
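To make the byte-to-word arithmetic in getWord() concrete, here is the same calculation applied to two bytes that have already been read. The helper name assembleWord is hypothetical and exists only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: the same arithmetic as getWord(), applied to two
   bytes that have already been read from the tape. */
int assembleWord(int lowByte, int highByte) {
    assert(lowByte >= 0 && lowByte <= 0xFF);
    assert(highByte >= 0 && highByte <= 0xFF);
    /* The high byte fills bits 8..15 and the low byte fills bits 0..7. */
    return (highByte << 8) + lowByte;
}
```

For instance, the word 016104 (octal) is stored on tape as the byte 104 followed by the byte 034, so assembleWord(0104, 034) gives 016104. Likewise the byte count of a jump block, stored as 006 then 000, comes out as 6.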
If the byte count is 6, that means we have a jump block. Otherwise, we have a data block.
Let's deal with the data blocks first. Since we have already read four bytes of the block (the two signature bytes and the two bytes of the byte count value), we reduce the remaining byte count by 4 before calling the function loadDataBlock(), which can be seen here:
int loadDataBlock(FILE *filep, int remainingByteCount, FILE *outfile) {
    int loadAddress = getWord(filep);
    if(loadAddress == -1) {
        //ran out of input before the load address
        return -1;
    }
    remainingByteCount -= 2;
    //"> 0" rather than "!= 0" so a malformed byte count cannot loop forever
    while (remainingByteCount > 0) {
        if(remainingByteCount >= 2) {
            int wordValue = getWord(filep);
            if(wordValue == -1) {
                return -1;
            }
            fprintf(outfile, "D %06o %06o\n", loadAddress, wordValue);
            remainingByteCount -= 2;
            loadAddress += 2;
        } else {
            int byteValue = getByte(filep);
            if(byteValue == -1) {
                return -1;
            }
            fprintf(outfile, "D %06o %06o\n", loadAddress, byteValue);
            remainingByteCount--;
            loadAddress++;
        }
    }
    //read and discard the checksum byte
    (void)getByte(filep);
    fflush(outfile);
    return 0;
}
This function takes as arguments the input file and output file pointers as well as the remaining byte count in the block.
The first thing this function does is read a word from the file, representing the load address of this block. The remaining byte count is then reduced by two.
All that remains now is to read in the data from the block, so the code loops until remainingByteCount reaches zero. It makes sense to read the bytes in two at a time and assemble them into words, to the greatest extent possible. Therefore, as long as there are at least two bytes remaining, the bytes are read in two at a time using the getWord() function.
For each word that is read in, a line is added to the output file consisting of "D" for deposit, and then the load address followed by the word value. Both the load address and the word value are formatted as six character octal values, padded with zeros as needed.
After storing the word value in the output file, the remaining byte count is reduced by two and the load address is increased by two, because this will be the load address for the next data read from the block.
In situations where there is an odd number of bytes in the block, there might be one byte left over at the end of the block. This byte is read in and printed to the output file in the same way as a word.
Finally, I read and ignore the checksum byte, flush the data to the output file and return a zero to indicate successful completion.
The alternative to reading a data block from the tape is to read a jump block. The jump block processing code is really simple:
int loadJumpBlock(FILE *filep, FILE *outfile) {
    int jumpAddress = getWord(filep);
    if(jumpAddress % 2 != 0) {
        printf("Odd jump address!\n");
        return -1;
    }
    fprintf(outfile, "GO %06o\n", jumpAddress);
    return 0;
}
All that needs to happen is to read a word from the file, which represents the load address, or in this case the jump address.
The code checks whether the jump address is even. If not, it returns an error code. Otherwise the jump address is written to the output file as a "GO" instruction, which will cause the SIMH emulator to start running the code from that address.
...and that's it basically. The remaining code is just glue that holds these key components together. As I mentioned, if anyone wants me to post the full code let me know in the comments below.
Testing
I ran my code on the PDP-11 BASIC paper tape, and I ended up with a file containing approximately 4,000 "D" instructions and a single "GO" instruction at the bottom. Here's a little excerpt from the end of the file:
D 017434 037516
D 017436 046400
D 017440 046505
D 017442 051117
D 017444 037531
D 017446 000000
D 017450 000000
D 017452 000000
D 017454 000000
D 017456 000000
D 017460 000000
D 017462 000000
D 017464 000000
GO 016104
When I stuck my boilerplate emulator configuration lines at the top of this file:
set cpu 11/70,4M
;set realcons=localhost
set realcons panel=11/70
set realcons interval=8
set realcons connected
and ran the configuration in SIMH, BASIC started exactly the same as if I had loaded the code from the tape file, which seems to validate that my code works.
Improvements that could be made
Anyone who has ever done any coding will know that it is rare for code to be fully complete, particularly when you're developing hobby/experimental code.
Here are a few improvements that I thought could be made to the code, if I were bothered:
Validate the data block checksums, rather than ignore them.
The absolute loader supports loading the code with a configurable address offset. The address offset is configured by the operator using the control switches, which are read by the absolute loader when it is starting. This code could be improved to load code at a configurable address offset.
I could add support for pausing when an odd numbered jump address is read in and prompting the user for a second tape file, for instance. I didn't do this for two reasons: Firstly it is only a hypothesis that the odd numbered jump address is used as a means for loading multi-tape programs, but it seems reasonable. Secondly, I have not yet found an example of a multi-tape program to work with.
Sometimes I have found it useful to have the DEPOSIT instructions, other times I just want the addresses and instructions. A command-line switch to enable/disable the inclusion of the "D" (i.e. the DEPOSIT instruction) in the output might be good. For now, I have been removing the D and recompiling.
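As a starting point for the first improvement, here is a rough sketch of a checksum check. It rests on an assumption that is not verified in this post: that the checksum byte is chosen so that the sum of every byte in the block, from the 001 leader byte through to the checksum itself, has a low-order byte of zero. That is how I understand the format to work, but it should be confirmed against the DEC documentation or the absolute loader code. The function name blockChecksumIsValid is hypothetical, and the function expects the whole block to have been collected into a buffer first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. ASSUMPTION: a block is valid when the sum of all of
   its bytes, leader and checksum included, is zero modulo 256. */
int blockChecksumIsValid(const unsigned char *block, size_t lengthWithChecksum) {
    assert(block != NULL);
    unsigned sum = 0;
    for (size_t i = 0; i < lengthWithChecksum; i++) {
        sum += block[i];
    }
    /* Only the low-order byte of the running sum matters. */
    return (sum & 0xFF) == 0;
}
```

To use this from loadDataBlock(), the bytes would need to be accumulated as they are read (or a running sum kept), with the DEPOSIT lines written out only once the check passes.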