This is a TTL CPU that fits on a prototype board and is designed to run complex C programs, all without microcode. Programs are written in a C-like language and compiled with a custom toolchain that contains an assembler, a C-like compiler, and some library routines and macros.
Since the CPU doesn't use microcode (I don't have an EPROM burner, so one of the goals was to build the CPU without EPROM chips), it can do only one simple thing: move data from a source to a destination during each instruction cycle. In effect, the CPU executes a MOVE instruction in every cycle; the instruction code itself selects the component that acts as the data source (e.g. accumulator, input port, RAM, program memory, etc.) and the component that acts as the data destination (accumulator, adder, inverter, output port, program counter, etc.).
During the execution phase, a data source component is signaled to place its data on the internal CPU data bus, and a data destination component is triggered at the same time to latch this data. A simple example is the 'JUMP' functionality; the CPU instruction logic triggers a data source component to place its data on the internal data bus, and triggers the program counter at the same time to store this data.
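The move-only execution model can be illustrated with a minimal sketch. The register names (`acc`, `out`, `pc`) and the `move` helper are hypothetical stand-ins for the real bus components, not names taken from the schematics or the toolchain.

```python
# Hypothetical register names standing in for the real bus components.
state = {"acc": 0x2A, "out": 0x00, "pc": 0x00}

def move(state, src, dst):
    """One execute phase: the source drives the bus, the destination latches it."""
    bus = state[src] & 0xFF   # source output enabled onto the 8-bit bus
    state[dst] = bus          # destination latches the bus at the same time
    return bus

move(state, "acc", "out")  # copy the accumulator to the output port
move(state, "acc", "pc")   # a 'JUMP' is just a move whose destination is the PC
```

In this model there is no separate jump instruction: choosing the program counter as the destination is all it takes.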
The only drawback of a microcode-less design is the lack of more complex CPU functions (stack, interrupt handling, etc.). However, a stack can be emulated in software: I use one of the uppermost bytes in the RAM as a stack pointer, and have implemented assembly macros for PUSH, POP, CALL and RET. I'm currently able to write complex C-like programs with arbitrarily deep function calls for this CPU using my C-like compiler.
The phase logic provides the phase signals to the CPU. One CPU instruction cycle consists of 4 CPU phases regardless of the instruction length (1 or 2 bytes). Two of the 4 phases are idle phases, whose main job is to prevent glitches during the transition from the 'fetch' to the 'execute' phase. The phase circuit is built as a 2-bit counter from a 74HC74.
Fetch phase:
- signal the program memory to place the next instruction word on the CPU internal data bus
- signal the instruction register to latch and store the instruction word
- increment PC
Execute phase:
- activate instruction decoders to execute the instruction by activating /OE and /WR signals on the selected data source/destination components
- increment PC if the data source was the program memory
Waveforms of phase signals during each phase (refer to the schematics, I intentionally swapped inverted- and non-inverted prefixes):
The program counter is composed of parallel-loadable counter ICs (74HC161). The schematics show only 2 of them, but the actual build uses 4, so the program memory space is 64k. In order to load data into the higher 8 bits of the PC, an extra latch IC (74HC574) has been added. This latch must be loaded with the MSB prior to each JMP instruction; its content is loaded into the higher (MSB) counters at the same moment the lower 8 bits are loaded directly from the CPU internal data bus.
The PC is incremented automatically after each read from the program memory (fetch phase, or reading literal data from program memory).
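A minimal sketch of this behaviour, assuming a 16-bit count and an 8-bit high latch as described above. The class and method names are hypothetical and only model the data flow, not the timing.

```python
class ProgramCounter:
    """Behavioural model of the 16-bit PC with its separate high-byte latch."""

    def __init__(self):
        self.value = 0       # count held in the four 74HC161 counters
        self.high_latch = 0  # 74HC574 holding the upper byte for the next jump

    def increment(self):
        # happens after every read from program memory
        self.value = (self.value + 1) & 0xFFFF

    def load_high_latch(self, bus):
        # an ordinary move with 'pc high latch' as the destination
        self.high_latch = bus & 0xFF

    def jump(self, bus):
        # low byte comes from the bus, high byte from the latch, both at once
        self.value = (self.high_latch << 8) | (bus & 0xFF)

pc = ProgramCounter()
pc.load_high_latch(0x12)  # must be done before the jump
pc.jump(0x34)             # PC is now 0x1234
pc.increment()            # next fetch advances it to 0x1235
```

Note that the high latch keeps its value between jumps, so in this model a jump within the same 256-byte page does not strictly need a reload; whether the toolchain relies on that is not stated in the text.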
The instruction register is an edge-triggered octal D flip-flop (74HC574) that is loaded with data during each 'fetch' phase. The easiest way to imagine the fetch phase is as a special hardwired MOVE instruction that is executed every time before the 'execute' phase; the data source is always the program memory and the data destination is always the instruction register.
The instruction decoder is composed of two demultiplexer ICs (2 x 74HC138) and is driven by the instruction register. A CPU instruction word is 8 bits wide; 3 bits select the data source and 3 bits select the destination. Each demultiplexer IC applies the control signals to the selected destination/source in the execute phase (3. phase).
The instruction decoders (2 x 74HC138) use up 6 bits of an instruction word; I use the remaining two bits as instruction modifiers:
One bit indicates conditional ('on zero') execution. When this bit is reset, the instruction is executed only if the content of the accumulator is zero. I achieve this by simply inhibiting the data destination demux IC when the content of the accumulator is non-zero. When this bit is set, the data destination demux IC is always signaled, regardless of the content of the accumulator. (Note: the data source demux IC must not be inhibited. If the source of data is the program memory, the CPU is expected to increment the PC whether the instruction is executed or not. For this reason, the data source signal must always be sent out, so the program counter can detect that the source was the program memory and increment itself automatically.)
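The reason the source decoder must stay active can be seen in a small sketch. It models a single two-byte instruction whose source is a literal in program memory; the function name, the `regs` dictionary and the placeholder opcode bytes are all hypothetical.

```python
def step_literal_move(program, pc, acc, regs, dst, unconditional):
    """Run one instruction whose data source is a literal in program memory.

    `unconditional` mirrors the 'on zero' bit: when it is False (bit reset),
    the destination is written only if the accumulator is zero.
    """
    pc += 1                        # fetch phase consumed the opcode byte
    bus = program[pc]              # source decoder always fires: literal on the bus
    pc += 1                        # so the PC steps past the literal in every case
    if unconditional or acc == 0:  # destination decoder enabled?
        regs[dst] = bus
    return pc

# 0x00 stands in for the opcode bytes; only the literals matter here.
program = [0x00, 0x55, 0x00, 0x66]
regs = {"out": 0}

pc = step_literal_move(program, 0, acc=1, regs=regs, dst="out", unconditional=False)
skipped = (pc, regs["out"])   # instruction skipped, but PC still points past the literal

pc = step_literal_move(program, pc, acc=0, regs=regs, dst="out", unconditional=False)
taken = (pc, regs["out"])     # accumulator is zero, so the write happens
```

If the source decoder were inhibited too, the skipped literal would be fetched as the next opcode.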
The other bit (missing from the schematic) implements a trick: when set, the MSB carry output of the adder (74HC283) is fed back to the LSB carry input. Here's the use of it: the CPU doesn't have a bit shifter; the only ALU functions are addition and bitwise negation. To shift bits to the left, I add the content of the accumulator to itself. The carry feedback comes into the picture for bit rotation: to rotate bits, I add the content of the accumulator to itself (shift left) with carry feedback enabled; the MSB is fed back to the LSB and the result is the same as a bit rotation to the left. I can rotate bits to the right by rotating to the left seven times, and to be complete, I can shift bits to the right as well by combining rotations with a shift to the left. Since I can also compare two values (subtract and execute if the result is zero), I'm able to implement bitwise OR and bitwise AND algorithms too.
A CPU without native bit rotation, bitwise AND and bitwise OR is quite unusual, but in practice these are the least used functions; from a programming point of view, addition and subtraction are used far more often. I think it was a good choice to put a native adder on the CPU and omit the basic logic functions. I haven't yet seen any homemade TTL CPU on the internet that fits on a 4" x 4" board and runs 64k C programs, all without microcode.
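The shift and rotate tricks can be checked with a short behavioural sketch of an 8-bit adder with optional carry feedback. The helper names are hypothetical. The right shift shown here uses one sequence that works in this model (rotate right, shift left to drop the wrapped-around bit, rotate right again); the exact sequence used in the toolchain is not shown in the text, so treat it as an assumption.

```python
def add8(a, b, carry_feedback=False):
    """8-bit add; with carry feedback the MSB carry-out re-enters as carry-in."""
    total = a + b
    result = total & 0xFF
    if carry_feedback:
        result = (result + (total >> 8)) & 0xFF
    return result

def shl(x):
    """Shift left: add the accumulator to itself."""
    return add8(x, x)

def rotl(x):
    """Rotate left: same addition, but with carry feedback enabled."""
    return add8(x, x, carry_feedback=True)

def rotr(x):
    """Rotate right: seven rotations to the left."""
    for _ in range(7):
        x = rotl(x)
    return x

def shr(x):
    """Shift right (assumed sequence): rotate right, drop the wrapped bit, rotate right."""
    return rotr(shl(rotr(x)))

example = 0b10110011
results = (shl(example), rotl(example), rotr(example), shr(example))
```

For `0b10110011` the four results are `0b01100110`, `0b01100111`, `0b11011001` and `0b01011001`.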
All the ALU functionalities are implemented with two 4-bit full adders (2 x 74HC283) and a bitwise inverter (74HC540). Though the CPU only supports adding and bitwise negation, theoretically all the basic logic and arithmetic functions can be implemented by algorithms, making this CPU fully Turing-complete.
The accumulator can be loaded with data via 3 different paths: 1) directly via the 74HC541 'clear' buffer, 2) via the 74HC540 bitwise inverter, and 3) via the adder. One operand of the adder is always the previous value of the accumulator; the other operand comes over the CPU internal data bus from any CPU component (even from the accumulator itself, resulting in a 'x2' or 'bitwise shift/rotate left' operation) or from the input port.
The zero detector is implemented by an 8-input diode-OR gate. Its output is evaluated when the conditional execution bit is reset (zero) in the opcode.
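As an illustration of how the three load paths combine, here is a minimal sketch of a two's-complement subtraction built from an inverted load and two additions. The class and method names are hypothetical; only the three paths themselves come from the description above.

```python
class Accumulator:
    """Behavioural model of the three ways to load the accumulator."""

    def __init__(self):
        self.value = 0

    def load_direct(self, bus):
        self.value = bus & 0xFF                 # path 1: 74HC541 buffer

    def load_inverted(self, bus):
        self.value = ~bus & 0xFF                # path 2: 74HC540 inverter

    def load_added(self, bus):
        self.value = (self.value + bus) & 0xFF  # path 3: adder, acc + bus

def subtract(a, b):
    """Compute (a - b) mod 256 using only the inverter and adder paths."""
    acc = Accumulator()
    acc.load_inverted(b)  # acc = ~b
    acc.load_added(1)     # acc = ~b + 1, the two's complement of b
    acc.load_added(a)     # acc = a - b
    return acc.value

difference = subtract(10, 3)  # 7
equal = subtract(5, 5)        # 0, which the zero detector can test
```

A result of zero means the two values were equal, which is what makes comparison possible with conditional execution.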
Other control circuits and peripherals
The CPU has a start-stop circuit (an SR flip-flop composed of discrete gates) to inhibit the CPU clock and make the program memory available for programming. When stopped, the CPU phase circuit is kept in either phase 0. or phase 1. (the MSB CPU phase latch is held in the reset state). Since phase 1. is a fetch phase and the PC is incremented at the end of each fetch phase, making a phase 1. -> 0. transition by toggling the PC_INC button is a simple way of incrementing the PC when entering the program. Program memory can be written only in phase 0., so the PC_INC button has to be toggled twice after each write (it's not elegant, but the implementation was very simple).
(UPDATE: I created a simple programmer based on the ATmega8 MCU and eliminated all the programming buttons. This programmer plugs into a socket and transfers the bytecode into the CPU program memory on each startup. The programmer can be detached after programming the CPU.)
There's no separate input port circuit/buffer on the CPU; a series of 100k resistors transfers the logical state of the input port to the internal data bus when the outputs of all the other components are in Hi-Z state.
The clock source is a basic variable-frequency relaxation oscillator, constructed of a Schmitt-trigger (74HC14), a capacitor and a trimmer pot.
In the final design, I included an SRAM data memory too. I also added an address latch (74HC574), which has to be loaded prior to reading from or writing to the SRAM.
Instruction word bit layout:

|    7     |    6    |   5   |   4   |   3   |   2   |   1   |   0   |
| Carry fb | On Zero | Src_0 | Src_1 | Src_2 | Dst_0 | Dst_1 | Dst_2 |

Data source codes:
- literal (0b100): the 8-bit literal data has to be placed right after the opcode
- selecting any other data source leaves the outputs of all components in High-Z state, which makes the input port readable

Data destination codes:
- accumulator via inverter (0b100)
- accumulator via adder (0b010)
- output port (0b110)
- RAM address latch (0b001)
- PC high latch (0b111)
- PC (0b011): the content of the PC high latch is also loaded into the PC
'On zero' bit:
When this bit is reset, skip execution if the content of the accumulator is not zero.
'Carry feedback' bit:
When this bit is set, feed the carry output of the 8th bit of the adder back to the carry input of the 1st bit.
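To make the opcode layout concrete, here is a hedged sketch of an encoder and decoder. It packs the 3-bit codes exactly as written above into bits 5..3 (source) and bits 2..0 (destination); whether the hardware orders the bits within each field this way is an assumption, and the constant and function names are hypothetical. Only the codes listed above are used.

```python
# Codes as listed in the text; the names are hypothetical.
SRC_LITERAL = 0b100
DST_ACC_INVERTED = 0b100
DST_ACC_ADDER = 0b010
DST_OUTPUT_PORT = 0b110
DST_RAM_ADDR_LATCH = 0b001
DST_PC_HIGH_LATCH = 0b111
DST_PC = 0b011

def encode(src, dst, always=True, carry_feedback=False):
    """Pack one instruction word.

    `always=True` sets the 'on zero' bit (unconditional execution);
    `always=False` resets it, so the move happens only if the accumulator is zero.
    """
    return (int(carry_feedback) << 7) | (int(always) << 6) | ((src & 7) << 3) | (dst & 7)

def decode(opcode):
    """Split an instruction word back into its four fields."""
    return {
        "carry_feedback": bool(opcode & 0x80),
        "always": bool(opcode & 0x40),
        "src": (opcode >> 3) & 7,
        "dst": opcode & 7,
    }

jump = encode(SRC_LITERAL, DST_PC)                       # unconditional jump
jump_if_zero = encode(SRC_LITERAL, DST_PC, always=False) # jump only when acc == 0
```

Under these assumptions a jump to a literal address is two bytes: the opcode followed by the low byte of the target.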
The CPU can be programmed with a custom toolchain that I created. It consists of an assembler, a C-like compiler and some miscellaneous tools. The C-like compiler implements a basic subset of the standard C language with many restrictions (e.g. no structs, no arrays, no callback functions, no "strings", etc.). The assembler is also basic; it provides just enough functionality to make the C-like compiler work on this platform. Originally I started writing the C-like compiler for fun under Linux for the x86 architecture, but I quickly adapted it to the TTL CPU after realizing that it would work if I also implemented an emulated stack.
The CPU doesn't have a stack pointer, nor does it have stack-manipulating instructions (push, pop, call and ret), so I implemented the stack functionality with some basic assembly macros (see mcc/arch/ttlcpu/header.asm in the toolchain). One of the uppermost memory locations is designated as the stack pointer; the macros basically load its value into the accumulator (after saving the accumulator's content if necessary), increase or decrease it, then store it back. An additional trick was needed for the 'call' instruction, where the return address has to be stored on the stack: I use individual labels that point to the next instruction after the 'call' and store that address on the stack.
Overall, each stack-manipulating instruction takes 12-14 CPU instructions, so using them consumes a lot of program memory. Normally this functionality lives in microcode, and the programmer doesn't have to tell the CPU exactly what to do each time it is invoked. With a 64k address space, however, it's hard to run out of program memory, the CPU remains simple, and its architecture is fully defined by hardware.
The CPU also lacks a standard ALU, but it has everything needed to calculate any basic arithmetic and logic function. The bitwise AND function is implemented as a subroutine (that is, the compiler actually calls it with a 'call' instruction), as it would be too large to expand inline as a macro at each invocation (it's a 64-step algorithm, see mcc/arch/ttlcpu/header.asm for details).
I created a simple 7-segment LED display interface and a keyboard interface on a separate board (you can see it in the video) and installed ribbon cable connectors on each board to interconnect them. The display and keyboard scanning routines can be found in mcc/arch/ttlcpu/header.asm. The board also features an 8-bit serial receiver and a 300 baud FSK modem (with an electret microphone) that I plan to use as an input / program source for the CPU. The serial receiver can be configured for 300, 600 and 9600 baud.
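The idea behind the macros can be sketched as follows. This is not the code from header.asm: the 256-byte RAM size, the stack pointer location at 0xFF, the downward growth direction and the two-byte return address are all assumptions made for the illustration, and the function names are hypothetical.

```python
RAM_SIZE = 256   # assumption: one 8-bit RAM address latch
SP_ADDR = 0xFF   # assumption: the stack pointer lives in the top RAM byte

def new_ram():
    ram = [0] * RAM_SIZE
    ram[SP_ADDR] = SP_ADDR - 1  # empty stack: first free slot just below the pointer
    return ram

def push(ram, value):
    sp = ram[SP_ADDR]             # load the stack pointer
    ram[sp] = value & 0xFF        # store the value at the free slot
    ram[SP_ADDR] = (sp - 1) & 0xFF  # decrease the pointer and store it back

def pop(ram):
    sp = (ram[SP_ADDR] + 1) & 0xFF  # increase the pointer
    ram[SP_ADDR] = sp               # store it back
    return ram[sp]                  # read the value it now points to

def call(ram, return_address, target):
    """Push the address of the label after the call, then jump to the target."""
    push(ram, return_address >> 8)
    push(ram, return_address & 0xFF)
    return target                   # new PC

def ret(ram):
    """Pop the return address and jump back to it."""
    low = pop(ram)
    high = pop(ram)
    return (high << 8) | low        # new PC

ram = new_ram()
pc = call(ram, return_address=0x1234, target=0x2000)
push(ram, 7)          # a local value saved inside the callee
local = pop(ram)
pc_after_ret = ret(ram)
```

Each of these helpers is a handful of loads, additions and stores, which is consistent with every stack operation expanding into many CPU instructions.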
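To show that a bitwise AND really can be built from the available primitives, here is one possible sketch. It is not the 64-step routine from header.asm. It uses only addition, addition with carry feedback, inversion and a test for zero; the Python `& 0xFF` masks merely model the 8-bit register width. All function names are hypothetical.

```python
def add8(a, b, carry_feedback=False):
    """8-bit add; with carry feedback the MSB carry-out re-enters as carry-in."""
    total = a + b
    result = total & 0xFF
    if carry_feedback:
        result = (result + (total >> 8)) & 0xFF
    return result

def invert8(x):
    """Bitwise negation, the job of the 74HC540."""
    return ~x & 0xFF

def msb(x):
    """Top bit of x: rotate-left minus shift-left is 1 exactly when it was set."""
    rotated = add8(x, x, carry_feedback=True)
    shifted = add8(x, x)
    minus_shifted = add8(invert8(shifted), 1)  # two's complement
    return add8(rotated, minus_shifted)

def bitwise_and(a, b):
    result = 0
    for _ in range(8):
        result = add8(result, result)   # make room for the next bit
        both = add8(msb(a), msb(b))     # 0, 1 or 2 top bits set
        if add8(both, 0xFE) == 0:       # both - 2 == 0: the two top bits are set
            result = add8(result, 1)
        a = add8(a, a)                  # move on to the next bit of each operand
        b = add8(b, b)
    return result

masked = bitwise_and(0b11001100, 0b10101010)
```

For the two example operands the result is `0b10001000`. A bitwise OR follows the same pattern with the test changed to "at least one top bit set".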