Let’s implement power-on self test (POST)

I recently implemented a simple power-on self test (POST) routine for my 65C816 test board, so that it can stop and indicate a hardware failure before attempting to run any normal code.

This was an interesting adventure for two main reasons:

  • This code needs to work with no RAM present in the computer.
  • I wanted to try re-purposing the emulation status (E) output on the 65C816 CPU to blink an LED.


Even modern computers will stop and provide a blinking light or series of beeps if you don’t install RAM or a video card. This is implemented in the BIOS or UEFI.

I got the idea to use the E or MX CPU outputs for this purpose from this thread on the 6502.org forums. This method would allow me to blink a light with just a CPU, clock input, and ROM working.

My main goal is to perform a quick test that each device is present, so that start-up fails in a predictable way if I’ve connected something incorrectly. This is much simpler than the POST routine from a real BIOS, because I’m not doing device detection, and I’m not testing every byte of memory.

Boot-up process

On my test board, I’ve connected an LED directly to the emulation status (E) output on the 65C816 CPU. The CPU starts in emulation mode (E is high). However, I have noticed that on power-up, the value of E appears to be random until /RES goes high. If I were wiring this up again, I would also prevent the LED from lighting up while the CPU is in reset.

The first thing the CPU does is read an address from ROM, called the reset vector, which tells it where to start executing code.

In my case, the first two instructions are clc (clear carry) and xce (exchange carry with emulation), which together switch the CPU to native mode.

.segment "CODE"
    .a8
    .i8
    clc                            ; switch to native mode
    xce
    jmp post_check_loram

By default, the accumulator and index registers are 8-bit. The .a8 and .i8 directives simply tell the assembler (ca65) that this is the case.

Next, the code will jmp to the start of the POST process.

Checking low RAM

The first part of the POST procedure checks if the lower part of RAM is available, by writing values to two addresses and checking that the same values can be read back.

Note that a:$00 causes the assembler to interpret $00 as an absolute address. Otherwise, it would be interpreted as a direct-page address, which is not what’s intended here.

post_check_loram:
    ldx #%01010101                 ; Power-on self test (POST) - do we have low RAM?
    ldy #%10101010
    stx a:$00                      ; store known values at two addresses
    sty a:$01
    ldx a:$00                      ; read back the values - unlikely to be correct if RAM not present
    ldy a:$01
    cpx #%01010101
    bne post_fail_loram
    cpy #%10101010
    bne post_fail_loram
    jmp post_check_hiram
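As a rough illustration of why the check writes two different values to two addresses before reading either back, here is a small Python model of the same test. The three bus classes are hypothetical stand-ins that I have made up for this sketch (a working RAM, a bus that always reads one fixed value, and a bus that just returns the last value written). They are simplifications, not measurements of how this board behaves with the RAM chip removed.

```python
PATTERN_A = 0b01010101
PATTERN_B = 0b10101010


class WorkingRam:
    """RAM is present: every address remembers what was written to it."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.cells.get(addr, 0)


class StuckBus:
    """Assumed failure mode: no RAM, every read returns one fixed value."""

    def __init__(self, value):
        self.value = value

    def write(self, addr, value):
        pass

    def read(self, addr):
        return self.value


class LastValueBus:
    """Assumed failure mode: no RAM, reads return the last value written."""

    def __init__(self):
        self.last = 0x00

    def write(self, addr, value):
        self.last = value

    def read(self, addr):
        return self.last


def low_ram_check(bus):
    """Same sequence as the assembly: write both values, then read both back."""
    bus.write(0x0000, PATTERN_A)
    bus.write(0x0001, PATTERN_B)
    return bus.read(0x0000) == PATTERN_A and bus.read(0x0001) == PATTERN_B
```

In this model, only WorkingRam passes. A single write followed by a single read would have passed on LastValueBus, which is one reason to write both values before reading anything back.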

If this fails, then the boot process stops, and the emulation LED blinks in a distinctive pattern (two blinks) forever.

post_fail_loram:                   ; blink emulation mode output with two pulses
    blink
    blink
    pause 8
    jmp post_fail_loram            ; repeat indefinitely

Macros: pause and blink

It’s a mini-challenge to write code to blink an LED in a distinctive pattern without assuming that RAM works. This means no stack operations (e.g. jsr and rts instructions), and that I need to store anything I need in 3 bytes: the A, X, Y registers. A triple-nested loop is the best I can come up with.

I wrote a pause macro, which runs a time-wasting loop for the requested duration – approximately a multiple of 100ms at this clock speed. Every time this macro is used, the len value is substituted in, and this code is included in the source file. This example also uses unnamed labels, which is a ca65 feature for writing messy code.

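.macro pause len                   ; time-wasting loop as macro
    lda #len
:                                  ; outer loop, counts A down from len
    ldx #64
:                                  ; middle loop, counts X down from 64
    ldy #255
:                                  ; inner loop, counts Y down from 255
    dey
    cpy #0
    bne :-
    dex
    cpx #0
    bne :--
    dec a
    cmp #0
    bne :---
.endmacro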
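To sanity-check the "approximately 100ms" figure, the loop can be costed on paper. This sketch assumes that the innermost loop is a decrement, a compare and a taken branch (2 + 2 + 3 cycles with 8-bit index registers), ignores the small overhead of the two outer loops, and uses the 921600 Hz CPU clock quoted in the beep code further down. The function name is my own.

```python
CPU_CLOCK_HZ = 921_600  # CPU clock quoted in the BEEP_FREQ_DIVIDER comment

# Assumed cost of one inner iteration: dey (2) + cpy #0 (2) + taken bne (3)
INNER_CYCLES = 2 + 2 + 3


def pause_seconds(length, x_count=64, y_count=255):
    """Approximate duration of `pause length`, ignoring outer-loop overhead."""
    cycles = length * x_count * y_count * INNER_CYCLES
    return cycles / CPU_CLOCK_HZ


if __name__ == "__main__":
    print(f"pause 1 is roughly {pause_seconds(1):.3f} s")
    print(f"pause 8 is roughly {pause_seconds(8):.3f} s")
```

Under these assumptions, each unit of len works out to a little over 0.12 seconds, so "pause 8" is close to one second.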

The second macro I wrote is blink, which briefly lights up the LED attached to the E output by toggling emulation mode. I’m using the pause macro from both native mode and emulation mode in this snippet, so I can only treat A, X and Y as 8-bit registers.

.macro blink
    sec                            ; switch to emulation mode
    xce
    pause 1
    clc                            ; switch to native mode
    xce
    pause 2
    sec                            ; switch to emulation mode
    xce
.endmacro

Checking high RAM

There is also a second RAM chip, and this process is repeated with some differences. For one, I can now use the stack, which is how I set the data bank byte in this snippet.

Here a:$01 is important, because with direct page addressing, $01 means $000001 at this point in the code, where I want to test that I can write to the memory address $080001.

post_check_hiram:
    ldx #%10101010                 ; Power-on self test (POST) - do we have high RAM?
    ldy #%01010101
    lda #$08                       ; data bank to high ram
    pha
    plb
    stx a:$00                      ; store known values at two addresses
    sty a:$01
    ldx a:$00                      ; read back the values - unlikely to be correct if RAM not present
    ldy a:$01
    cpx #%10101010
    bne post_fail_hiram
    cpy #%01010101
    bne post_fail_hiram
    lda #$00                       ; reset data bank to boot-up value
    pha
    plb
    jmp post_check_via
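The difference between the two addressing modes can be written out as arithmetic. This is a simplified sketch of how the 65C816 forms a 24-bit address for plain absolute and direct-page operands; the function names are my own, and indexed modes are left out.

```python
def absolute_address(data_bank, addr16):
    """Absolute addressing (a:$xxxx): the data bank register supplies bits 16-23."""
    return ((data_bank & 0xFF) << 16) | (addr16 & 0xFFFF)


def direct_page_address(direct_register, offset):
    """Direct-page addressing: always bank 0, offset added to the direct register."""
    return (direct_register + offset) & 0xFFFF


if __name__ == "__main__":
    # With the data bank set to $08, a:$01 reaches the second RAM chip.
    print(f"${absolute_address(0x08, 0x0001):06x}")     # $080001
    # With the direct register at 0, a direct-page $01 stays in bank 0.
    print(f"${direct_page_address(0x0000, 0x01):06x}")  # $000001
```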

The failure here is similar, but the LED will blink 3 times instead of 2.

post_fail_hiram:                   ; blink emulation mode output with three pulses. we could use RAM here?
    blink
    blink
    blink
    pause 8
    jmp post_fail_hiram            ; repeat indefinitely

To make sure that I was writing to different chips, I installed the RAM chips one at a time, first observing the expected failures, and then observing that the code continued past this point with the chip installed.

I also checked with an oscilloscope that both RAM chips are now being accessed during start-up. Now that I’ve got some confidence that the computer requires both chips to start, I can skip a few debugging steps if I’ve got code that isn’t working later.

Checking the Versatile Interface Adapter (VIA)

The third chip I wanted to add to the POST process is the 65C22 VIA. I kept this check simple, because one read to check for a start-up default is sufficient to test for device presence.

VIA_IER = $c00e
post_check_via:                    ; Power-on self test (POST) - do we have a 65C22 Versatile Interface Adapter (VIA)?
    lda a:VIA_IER
    cmp #%10000000                 ; start-up state: bit 7 (IER7) reads as 1 and all interrupt sources (IER0-6) are disabled.
    bne post_fail_via
    jmp post_ok

This stops and blinks 4 times if it fails. I recorded the GIF at the top of this blog post by removing the component which generates a chip-select for the VIA, which causes this code to trigger on boot.

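post_fail_via:                     ; blink emulation mode output with four pulses
    blink
    blink
    blink
    blink
    pause 8
    jmp post_fail_via              ; repeat indefinitely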

Beep for good measure

At the end of the POST process, I put in some code to generate a short beep.

This uses the fact that the 65C22 can toggle the PB7 output each time a certain number of clock-cycles pass. I’ve connected a piezo buzzer to that output, which I’m using as a PC speaker. The 65C22 is serving the role of a programmable interval timer from the PC world.

VIA_DDRB = $c002
VIA_T1C_L = $c004
VIA_T1C_H = $c005
VIA_ACR = $c00b
BEEP_FREQ_DIVIDER = 461            ; 1KHz, formula is CPU clock / (desired frequency * 2), or 921600 / (1000 * 2) ~= 461
post_ok:                           ; all good, emit celebratory beep, approx 1KHz for 1/10th second
    ; Start beep
    lda #%10000000                 ; VIA PIN PB7 only
    sta VIA_DDRB
    lda #%11000000                 ; set ACR. first two bits = 11 is continuous square wave output on PB7
    sta VIA_ACR
    lda #<BEEP_FREQ_DIVIDER        ; set T1 low-order counter
    sta VIA_T1C_L
    lda #>BEEP_FREQ_DIVIDER        ; set T1 high-order counter
    sta VIA_T1C_H
    ; wait approx 0.1 seconds
    pause 1
    ; Stop beep
    lda #%00000000                 ; set ACR. returns to a one-shot mode
    sta VIA_ACR
    stz VIA_T1C_L                  ; zero the counters
    stz VIA_T1C_H
    ; POST is now done
    jmp post_done
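The divider arithmetic from the BEEP_FREQ_DIVIDER comment can be checked with a few lines of Python. This only applies the formula quoted in that comment (CPU clock divided by twice the desired frequency) and then splits the result into the two bytes loaded into the T1 counter. The function name is my own.

```python
CPU_CLOCK_HZ = 921_600


def beep_divider(frequency_hz, clock_hz=CPU_CLOCK_HZ):
    """PB7 toggles once per timer period, so two periods make one full wave."""
    return round(clock_hz / (frequency_hz * 2))


divider = beep_divider(1000)
low_byte = divider & 0xFF   # the byte selected by ca65's < operator
high_byte = divider >> 8    # the byte selected by ca65's > operator

if __name__ == "__main__":
    print(divider, hex(low_byte), hex(high_byte))  # 461 0xcd 0x1
```

Other pitches can be tried by changing the argument, as long as the resulting divider still fits in 16 bits.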

The post_done label points to the start of the old ROM code, which is currently just a “hello world” program.

Next steps

I’m now able to lock in some of my assumptions about what should be available in software, so that I can write more complex programs without second-guessing the hardware.

Once the boot ROM is interacting with more hardware, I may add additional checks. I will probably need to split this into different sections, and make use of jsr/rts once RAM has been tested, because the macros are currently generating a huge amount of machine code. I have 8KiB of ROM in the memory map for this computer, and the code on this page takes up around 1.1KiB.

Building a 65C816 test board

In the last few months, I’ve been learning about the 65C816 processor, and trying to build a working computer which uses it. My latest breadboard-based prototype was not reliable, and I decided to convert it to a PCB to hopefully eliminate the problem, or to at least identify it.

Quick goals

I was aiming to make a debug-friendly 4-layer PCB, the size of two standard breadboards. This will be my first time designing a 4-layer board, and also my first time using KiCad 6 to create a PCB.

I didn’t have a working prototype, so I built in test points for connecting an oscilloscope or logic analyser to troubleshoot. In case of errors, I made it possible to leave some components unpopulated, and instead drive signals externally.

I also left some extra footprints which I might use for future improvements, or can fall back to if some of my ideas don’t work out.

The clock

Some components require the inverse of the CPU clock, and there was previously a small delay from inverting the signal. I don’t think this is a problem on its own, but I introduced a D-type flip-flop to create a proper two-phase clock.

This does halve the CPU clock speed, so I added jumpers to allow an alternative clock source to be selected for the CPU, where previously the UART and CPU clocks needed to be the same. Note that there is an error in the wiring here, which I only discovered later.

The oscillator I’m using on the breadboard prototype is a DIP-8 package. A larger variety of oscillators are available as a 5x7mm QFN package, where the four corner pins have the same function as the DIP-8 version. I added an alternative footprint, which fits without taking up any extra board space. I expect that I’ll only use this if I attempt to run the board at 3.3V later.

Some 6-pin QFN oscillators provide an inverted clock output on one of the pins, and it would be an idea to use one of those in a future design to reduce the component count.

New reset controller

Several components on the board require a reset input. These are mostly active-low inputs, but one component has an active-high reset. I discovered the tiny MIC2775 supervisory IC, which provides both.

I haven’t used this part before, so in case of problems, I also made it possible to remove this part and use a DS1813, which generates an active-low reset on my prototype.

These parts work by sensing voltage, and drop-in alternatives exist for either part if I move to 3.3V.


I am currently using a parallel EEPROM to store code for my prototype, and will add this to the PCB for this test board. As with the clock and reset controller, I’m working with 5V components today, but considering what I would need to change to run the whole board at 3.3V.

The few EEPROMs which are available at 3.3V use a different package, and are quite slow (200ns or worse). Parallel flash chips seem to be the most promising alternative, since they have a similar interface, although I haven’t confirmed that I can program them yet. I added a footprint for the SST39LF010, which also has a 5V variant with the same pin-out.

Other footprints

I also added footprints so that I could connect the SD card module which I used on my 6502 computer, a piezo buzzer, and some electrolytic capacitors if needed.

Design lessons learned

Although my first attempt at making a PCB was a success, I learned a few things which I’m using here.

For that previous project, I found that machined-pin sockets are not very tolerant of used chips with bent legs, so I’ve used cheaper stamped pin sockets for this test board. I’ve also left spacing at the ends of each socketed IC, so that they can be pulled out more easily.

I made an effort to keep unrelated traces away from points which I need to solder, because I had some problems with this on my last board.

I also avoided creating one large expansion header, and instead exposed signals in places which are easy to route. For example, the I/O device select signals are exposed on a pin header right beside the chip which generates them.

PCB layout

The standard process is to make a schematic, then proceed to placing components, then routing traces.

For this test board, I instead chose the physical dimensions of the board first, and added components incrementally (particularly test points, expansion headers and alternative footprints) until I was making good use of the available space. KiCad has an item in the “Tools” menu to “Update PCB from Schematic”, which I used extensively.

Just to see how it would look, I also added the letters “65C816” to the back copper layer, which is visible in the top-right here.

I put a lot of time into labeling different parts of the board to help with debug/assembly, and used the 3D view to check that it wasn’t too crowded.

Before sending the files to a manufacturer, I printed it at 1:1 scale for a reality check.

Among other things, this confirmed that the ROM sockets had enough clearance from other components.

It also confirmed that either DIP-8 or QFN-packaged oscillators would fit.


I ordered the boards from a manufacturer which I’ve used before. They are large-ish, lead-free 4-layer boards, but I’m not optimising for cost. I’ve left a lot of options in the board, so I’m hoping to make use of several of these PCBs in different configurations, depending on which direction this project goes.

I assembled the board incrementally, starting with the power LED. Most of the passive components on this board are 0603 (imperial) size surface-mount parts, and I’ve used footprints with long pads for hand soldering.

The first problem I found was with the reset button – I had switched to using a footprint with the correct pin spacing, but had assigned the pins incorrectly, so I needed to cut/bend some legs and install it rotated 90 degrees.

I next found a problem with the clock, where I had wired up the flip-flop incorrectly in the schematic – the /Q output should go to D. I could recover from this by cutting a trace and running a short wire. Of course the board is mirrored when flipped upside-down, so I cut the wrong trace first and needed to repair that too.
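For anyone unfamiliar with this circuit, feeding /Q back into D makes the flip-flop toggle on every rising clock edge, which is what halves the frequency. A tiny Python model of that behaviour (the function name is my own, and it only models the logic, not the timing):

```python
def divide_by_two(rising_edges, q=0):
    """Model a D flip-flop with /Q wired back to D."""
    outputs = []
    for _ in range(rising_edges):
        d = 1 - q  # the D input sees the inverted output (/Q)
        q = d      # Q takes the value of D on each rising clock edge
        outputs.append(q)
    return outputs


if __name__ == "__main__":
    # Four input clock edges produce two full output cycles.
    print(divide_by_two(4))  # [1, 0, 1, 0]
```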

I added enough components to get the CPU to run NOP instructions from ROM, then built up to running some test programs which I’ve blogged about here. The final mistake I discovered was a mix-up with some of the lines used for selecting I/O devices, which means that devices are not mapped to their intended addresses. I can work around this in software.

This is already an improvement over the breadboard prototype, because I can quickly swap the ROM chip without accidentally disconnecting anything.

The board looked fairly complete at this point, and the only major component missing was the UART chip. This was the part which did not work reliably in my prototype, so I was prepared to do some debugging here. Note that I’ve got orange test points all over the board to connect important signals to an oscilloscope, with a few black test points for GND connections. All of the 74-series chips are in the 74AC logic family, and I sourced 74HC versions as well in case I needed to switch any to see the difference.

However, I was able to run more or less the same test program I used before, and it now works reliably. This is captured through a Cypress FX2-based logic analyser using Sigrok.

It’s great that this works, but I don’t know for sure why this was so unreliable on my previous prototype. Several possible causes were eliminated through this process, since I used a new UART chip and freshly programmed address decode PLD, and eliminated a possible timing issue. On a PCB, I’m also able to make better electrical connections, and add a ground plane, which is an immediate advantage over breadboards.

The design as it stands

Now that I am back to having a working prototype, I’ll take this chance to post some updated schematics.

Just a note of caution: this is a snapshot of a work-in-progress learning project. I’m absolutely aware that there are errors in here, and that the layout is quite messy. Still, I hope that this is useful to anybody else who is attempting to use this relatively obscure CPU.

The design no longer fits on one page, so I’ve split it into 3 sheets.


The CPU sheet contains the circuitry for de-multiplexing the data bus and bank address byte. All power, reset, and clock components are in here as well, along with pin headers for the address & data bus, and all those test points.

There are a lot of “just in case” components as well, such as pull-up resistors on the data bus, which I have not fitted.


The memory section is quite straightforward. I’ve got a PLD generating chip-selects for ROM, RAM0, or RAM1, with some extra components to add some flexibility.


In case I/O is selected, there are three possible choices: the 65C22 VIA, or one of the two UART interfaces. Most of the other components on this sheet are optional footprints or external ports. Note that the clock going into the UART is mis-labelled on this sheet.

Next steps

I’m going to spend some time using this board as a development platform for low-level software which targets the 65C816 CPU. I’ll most likely also use the emulator which I put together a few weeks ago to speed up development, since I’ve been able to confirm that the code I’m writing works on real hardware.

The basic functionality is now a lot more stable than what I had before, so this test board will allow me to prototype some different hardware options once I’ve got some simple text-based programs up and running.

The hardware design, software and emulator for this computer can be found in the GitHub repository for this project. I’m updating the repository as I make progress with this project, and the version used for this blog post is here.

65C816 computer – second prototype

Back in February, I blogged about my 65816 computer prototype on breadboards. I recently spent some time re-building this in an attempt to add some improvements.

The new prototype didn’t work particularly well, so I’ve left out quite a lot of detail from this update.

Power delivery

I soldered stranded wire to pin-headers to deliver a reliable ground and +5V power connection to each breadboard. This corrected some problems from my previous prototype, which used a long chain of unreliable connections.

Adding more RAM

I added a second 512 KiB RAM chip, and extended the address decoding scheme to provide a chip-select signal for it. The computer now uses the planned 20-bit address bus, though the wiring around these chips became very dense.

Adding a UART chip

I previously tested an NXP SC16C752 UART, and added it to this computer to provide text I/O. This is a small surface-mount chip, which I am connecting via a break-out board.

My previous prototype only supported one I/O device, so I also extended the address decoding scheme with a 74AC138 decoder.
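As a reminder of what a 74x138 does, here is a small Python model of the standard part: three select inputs pick one of eight active-low outputs, and all outputs stay high unless the three enable inputs are in their active states. Which address lines and enables are connected on this computer is not covered here, so the argument names simply follow the generic pin names.

```python
def decode_138(a, b, c, g1=1, g2a_n=0, g2b_n=0):
    """Return the eight active-low outputs Y0..Y7 of a 74x138 decoder."""
    enabled = g1 == 1 and g2a_n == 0 and g2b_n == 0
    selected = a | (b << 1) | (c << 2)
    return [0 if enabled and i == selected else 1 for i in range(8)]


if __name__ == "__main__":
    print(decode_138(0, 0, 0))        # Y0 low, everything else high
    print(decode_138(1, 0, 1))        # Y5 low
    print(decode_138(1, 1, 1, g1=0))  # disabled: all outputs high
```

Each active-low output can then act as the chip-select for one I/O device.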


I could get some simple test programs running, but text I/O did not work reliably at all. With my limited debugging tools (just a logic analyser), I was only able to conclude that data being written by the CPU was different to what was being received by the UART.

This seemed like it could be a timing or signal integrity issue, but I couldn’t be certain from looking at digital output with a low sample-rate. This image shows some control signals from when I was troubleshooting this.

Next steps

After hitting a dead-end, I made a list of possible causes, and decided to re-build this prototype on a debug-friendly PCB, and to source an oscilloscope for troubleshooting.

More on that in my next post!

Building an emulator for my 65C816 computer

I’ve been working on adding text-based input and output to my 65C816 computer prototype, but it’s not yet working reliably.

To unblock software development, I decided to put together an emulator, so that I can test my code while the hardware is out of action.

Choosing a base project

There were three main options I considered for running 65C816 programs on a modern computer:

  1. Write a new emulator from scratch
  2. Extract the useful parts of an open source Super Nintendo or Apple IIgs emulator and adapt them
  3. Find a 65C816 CPU emulator, and extend it

I found a good candidate for the third option in Lib65816 by Francesco Rigoni, which is a C++ library for emulating a 65C816 CPU. The code is GPLv3 licensed, and I could see where I would need to make changes to match the design of my computer.


After reading the code, I could see some places where I would need to modify the library, so I copied it in to a new folder, together with its logging dependency and sample program, and refactored it into one project.

I started by extending RAM up to 16 banks (1 megabyte), and mapping in a ROM device which serves bytes from a file. I also updated the handling of interrupt vectors, so that the CPU would retrieve them from the system bus.
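The emulator itself is written in C++, but the idea of a system bus with a RAM device and a ROM device is easy to sketch in Python. The class below is a hypothetical illustration rather than code from the emulator or from Lib65816. It assumes 1 MiB of RAM and an 8 KiB ROM image at $e000-$ffff in bank 0 (the PRG region from my linker configuration), and it ignores I/O devices entirely.

```python
RAM_SIZE = 16 * 0x10000                  # 16 banks of 64 KiB = 1 MiB
ROM_START, ROM_END = 0x00E000, 0x00FFFF  # assumed ROM window in bank 0


class SystemBus:
    """Hypothetical bus: routes each access to ROM or RAM by address."""

    def __init__(self, rom_image):
        self.ram = bytearray(RAM_SIZE)
        self.rom = bytes(rom_image)      # in practice, bytes read from a file

    def read(self, address):
        if ROM_START <= address <= ROM_END:
            return self.rom[address - ROM_START]
        return self.ram[address]

    def write(self, address, value):
        if ROM_START <= address <= ROM_END:
            return                       # writes to ROM are ignored
        self.ram[address] = value & 0xFF


if __name__ == "__main__":
    bus = SystemBus(bytes([0xEA]) * 0x2000)  # ROM full of NOP opcodes
    bus.write(0x080001, 0x55)
    print(hex(bus.read(0x00E000)), hex(bus.read(0x080001)))
```

Interrupt vectors fall inside the ROM window in this layout, which is why fetching them through the bus matters.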

Adding 65C22 support

My first test program was an example of preemptive multi-tasking from my previous blog post. At this point it ran, but did not switch between tasks, since that requires extra hardware support.

I implemented minimal support for the 65C22 VIA used in my computer design, just enough to log changes to the outputs, and to fire interrupts from a timer.

The Lib65816 library does not support NMI interrupts, which are used in this test program. I added edge detection, and connected the interrupt output of the 65C22 to the CPU NMI input in code.
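Edge detection matters because NMI is triggered by a high-to-low transition, not by the line simply being low. A minimal sketch of the idea in Python (the class is hypothetical and is not taken from the emulator source):

```python
class NmiEdgeDetector:
    """Report an NMI only when the line goes from high to low."""

    def __init__(self):
        self.previous_level = 1  # the line idles high (inactive)

    def sample(self, level):
        triggered = self.previous_level == 1 and level == 0
        self.previous_level = level
        return triggered


if __name__ == "__main__":
    detector = NmiEdgeDetector()
    # The line is held low for two samples, but only the first one triggers.
    print([detector.sample(level) for level in [1, 0, 0, 1, 0]])
```

Without this, a timer interrupt that stays asserted until it is acknowledged would re-trigger the handler on every emulated cycle.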

This screen capture shows the test program switching between two tasks, with a different VIA port being accessed before and after the switch.

This gave me some confidence that the CPU emulation was good enough to develop on, so I moved on to the next test program.

Adding serial support

The second test program is also from an earlier blog post, and tests printing and reading characters from an NXP SC16C752 UART.

I added a very minimal emulation of the UART chip I’m using, suppressed all of the logging, and converted output to use the curses library. This last step enabled character-by-character I/O, instead of the default line-buffered I/O.

This screen capture shows the program running under emulation. It simply prints “Hello world”, then echoes back the next 3 characters that the user types.


I didn’t plan to write an emulator for my custom computer, but I am currently debugging some tricky hardware problems, and this detour will allow me to continue to make progress in other areas while I try to find a solution.

I plan to keep the emulator up to date with hardware changes, since it will make it possible to demonstrate the system without any hardware access. It’s also easier to debug issues now that I can choose between two different execution environments.

The hardware design, software and emulator for this computer are online at mike42/65816-computer. I would like to again acknowledge Francesco Rigoni’s work on Lib65816. I’m very grateful that he chose to release this under the GPL, since it allows me to re-use the code for my project.

Let’s implement preemptive multitasking

I am currently working on building a computer based on the 65C816 processor. This processor has a 16-bit stack pointer, which should make it far more capable than the earlier 65C02, at least in theory.

I wanted to check my understanding of how the stack works on this processor, so I tried to build the simplest possible implementation of preemptive multitasking in 65C816 assembly.

What I’m working with

I am working with a breadboard computer. It has no operating system, and no ability to load programs or start/stop processes.

To get code running, I am assembling it on a modern computer, then writing the resulting binary to an EEPROM chip.

To see the result of the code, I am using a logic analyser to watch some output pins on the 65C22 VIA. The 65C22 chip also has a timer, so I connected its interrupt output to the NMI input of the CPU for this test.

Test programs

I wrote two simple programs which infinitely loop. These programs use ca65 assembler syntax.

The 65C22 has two 8-bit ports, so I made each test program write a recognisable pattern to its own output port, so that it’s easy to see which one is running at any given time, and check that it is not disrupted by the task switching.

The first program writes to Port A of the 65C22, and alternates the value between 0000 0000, and 0000 0011.

.segment "CODE"
; Task 1
task_1_main:
  .a8                             ; use 8-bit accumulator and index registers
  .i8
  sep #%00110000
@repeat_1:
  lda #%00000000                  ; alternate between two values
  sta PORTA
  lda #%00000011
  sta PORTA
  jmp @repeat_1                   ; repeat forever

The second program writes to Port B of the 65C22. This uses a ror instruction to produce a pattern across all digits of the output port.

.segment "CODE"
; Task 2
task_2_main:
  .a8                             ; use 8-bit accumulator and index registers
  .i8
  sep #%00110000
  lda #%01010101                  ; grab a start value
@repeat_2:
  ror                             ; rotate right
  sta PORTB
  jmp @repeat_2                   ; repeat forever

Context switching

I’m implementing round-robin scheduling between two processes.

My basic plan was to use a regular interrupt routine to save context, switch the stack pointer to the other process, restore context for the next process, then return from the interrupt.

.segment "CODE"
nmi:
  rep #%00110000                  ; use 16-bit accumulator and index registers
  ; Save task context to stack
  pha                             ; Push A, X, Y
  phx
  phy
  phb                             ; Push data bank, direct register
  phd

  ; swap stack pointer so that we will restore the other task
  tsc                             ; transfer current stack pointer to memory
  sta temp
  lda next_task_sp                ; load next stack pointer from memory
  tcs
  lda temp                        ; previous task is up next
  sta next_task_sp

  ; Clear interrupt
  sep #%00110000                  ; save X only (assumes it is 8 bits..)
  ldx T1C_L                       ; Clear the interrupt, side-effect of reading

  rep #%00110000
  ; Restore process context from stack, reverse order
  pld                             ; Pull direct register, data bank
  plb
  ply                             ; Pull Y, X, A
  plx
  pla
  rti
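Stripped of the register saving, the heart of the routine is a three-step swap of stack pointers. This Python sketch models only that swap. The class and the starting values are hypothetical, chosen to look like two stacks just below $2000 and $3000.

```python
class Scheduler:
    """Round-robin between two tasks by swapping stack pointers."""

    def __init__(self, running_sp, waiting_sp):
        self.sp = running_sp            # stack pointer of the running task
        self.next_task_sp = waiting_sp  # stack pointer of the paused task

    def on_nmi(self):
        temp = self.sp                  # remember the interrupted task's stack
        self.sp = self.next_task_sp     # continue on the other task's stack
        self.next_task_sp = temp        # the interrupted task is up next


if __name__ == "__main__":
    scheduler = Scheduler(running_sp=0x1FF3, waiting_sp=0x2FF3)
    for _ in range(3):
        scheduler.on_nmi()
        print(hex(scheduler.sp), hex(scheduler.next_task_sp))
```

Because each task's registers and return address live on its own stack, swapping the stack pointer is enough for the final return from interrupt to resume the other task.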

Setup process

The above switch works once the two processes are up and running, but that takes some setting up. Firstly, the two variables need to be defined somewhere.

.segment "BSS"
next_task_sp: .res 2              ; Stack pointer of whichever task is not currently running
temp: .res 2

The boot-up routine then sets everything up. It’s broken up a bit here. First is the CPU initialisation, since the 65C816 will boot to emulation mode.

.segment "CODE"
reset:
  clc                             ; switch to native mode
  xce

The second step is to start a task in the background. This involves pushing appropriate values to the stack, so that when we context switch, “Task 2” will start executing from the top.

  ; Save context as if we are at task_2_main, so we can switch to it later.
  .a16                            ; use 16-bit accumulator and index registers
  .i16
  rep #%00110000
  lda #$3000                      ; set up stack, direct page at $3000
  tcs
  tcd
  ; emulate what is pushed to the stack before NMI is called: program bank, program counter, processor status register
  phk                             ; program bank register, same as current code, will be 0 here.
  pea task_2_main                 ; 16 bit program counter for the start of this task
  php                             ; processor status register
  ; match what we push to the stack in the nmi routine
  lda #0
  pha                             ; Push A, X, Y
  phx
  phy
  phb                             ; Push data bank, direct register
  phd
  tsc                             ; save stack pointer to next_task_sp
  sta next_task_sp

  lda #$2000                      ; set up stack, direct page at $2000 for task_1_main
  tcs
  tcd
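It can help to tally up what this fake interrupt frame looks like. The sizes below follow from the comments in the code: a native-mode interrupt pushes the program bank, program counter and status register, and the NMI routine then pushes A, X and Y (16 bits each in this mode), the data bank, and the direct register. The list and function are just my own bookkeeping for this sketch.

```python
FRAME = [
    ("program bank", 1),
    ("program counter", 2),
    ("processor status", 1),
    ("A", 2),
    ("X", 2),
    ("Y", 2),
    ("data bank", 1),
    ("direct register", 2),
]


def stack_pointer_after_frame(initial_sp):
    """Each pushed byte moves the stack pointer down by one."""
    return initial_sp - sum(size for _, size in FRAME)


if __name__ == "__main__":
    print(sum(size for _, size in FRAME))           # 13 bytes in total
    print(hex(stack_pointer_after_frame(0x3000)))   # 0x2ff3
```

So with the stack starting at $3000, the value saved in next_task_sp for the second task should be $2ff3.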

The next step is to set up a timer, so that the interrupt routine will fire regularly. This causes interrupts to occur ~28 times per second.

  ; Set up the interrupt timer
  .a8                             ; use 8-bit accumulator and index registers
  .i8
  sep #%00110000
  lda #%11111111                  ; set all VIA pins to output
  sta DDRA
  sta DDRB
  ; set up timer 1
  lda #%01000000                  ; set ACR. first two bits = 01 is continuous interrupts for T1
  sta ACR
  lda #%11000000                  ; enable VIA interrupt for T1
  sta IER
  ; set up a timer at ~65535 clock pulses.
  lda #$ff                        ; set T1 low-order counter
  sta T1C_L
  lda #$ff                        ; set T1 high-order counter
  sta T1C_H
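The "~28 times per second" figure comes from dividing the clock by the timer count. This sketch assumes a 1.8432 MHz clock feeding the 65C22, which is not stated in this post, and it ignores the cycle or two that the timer takes to reload, so treat the result as approximate.

```python
ASSUMED_CLOCK_HZ = 1_843_200  # assumption: 1.8432 MHz oscillator on this prototype


def timer_interrupt_rate(clock_hz, counter):
    """Approximate interrupts per second for a free-running timer."""
    return clock_hz / counter


rate = timer_interrupt_rate(ASSUMED_CLOCK_HZ, 0xFFFF)

if __name__ == "__main__":
    print(f"roughly {rate:.1f} interrupts per second")
```

A smaller counter value would switch tasks more often, at the cost of spending more time in the interrupt routine.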

Finally, “Task 1” can be started in the foreground.

  ; start running task 1
  jmp task_1_main

Linker configuration and boilerplate

This is the first time I’m blogging about code written in 65C816 native mode, so for the sake of completeness, I’ll also include the updated linker configuration. The most important change (compared with this blog post) is that 32 bytes are now set aside for interrupt vectors.

MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $0100,  size = $7e00, type = rw, file = "";
    PRG:    start = $e000,  size = $2000, type = ro, file = %O, fill = yes, fillval = $00;
}

SEGMENTS {
    ZEROPAGE: load = ZP,  type = zp;
    BSS:      load = RAM, type = bss;
    CODE:     load = PRG, type = ro,  start = $e000;
    VECTORS:  load = PRG, type = ro,  start = $ffe0;
}

Note that this linker configuration is not quite complete, but it is good enough to get code running. The zero page is not relevant for the code I’ll be writing, so I might remove that at some point.

I also have definitions for the 65C22 I/O registers, which correspond to the location where it is mapped into the address space on my prototype computer.

PORTB = $c000
PORTA = $c001
DDRB = $c002
DDRA = $c003
T1C_L = $c004
T1C_H = $c005
ACR = $c00b
IFR = $c00d
IER = $c00e

The final piece of the puzzle is the definitions for all those interrupt vectors.

.segment "CODE"
irq:                              ; IRQ is not used yet
  rti

unused_interrupt:                 ; Probably make this into a crash.
  rti

.segment "VECTORS"
; native mode interrupt vectors
.word unused_interrupt            ; Reserved
.word unused_interrupt            ; Reserved
.word unused_interrupt            ; COP
.word unused_interrupt            ; BRK
.word unused_interrupt            ; Abort
.word nmi                         ; NMI
.word unused_interrupt            ; Reserved
.word irq                         ; IRQ

; emulation mode interrupt vectors
.word unused_interrupt            ; Reserved
.word unused_interrupt            ; Reserved
.word unused_interrupt            ; COP
.word unused_interrupt            ; Reserved
.word unused_interrupt            ; Abort
.word unused_interrupt            ; NMI
.word reset                       ; Reset
.word unused_interrupt            ; IRQ/BRK

The steps for building this assembly code into a usable ROM are:

ca65 --cpu 65816 main.s
ld65 -o rom.bin -C system.cfg main.o

This outputs an 8KiB file. I need to pad this with zeroes up to 32KiB, which is the size of the ROM chip I am burning it to.

truncate -s 32768 rom.bin

Lastly, I write this to the ROM.

minipro -p AT28C256 -w rom.bin


I connected a logic analyser to two wires from each output port, the NMI interrupt line, and the reset signal, and opened up sigrok.

This clearly shows the CPU switching between the two processes each time an interrupt is triggered.

Zooming in on the task cut-over, I can also see that the patterns on each port are different, as expected.

Of course this did not work the first time, and I spent quite a bit of time de-constructing and re-constructing the code to isolate bugs.

As a result, I shipped some improvements to my 6502 assembly plugin for JetBrains IDEs, which I have blogged about previously. The plugin will now provide suggestions for mnemonics. There is a project setting for 6502 vs 65C02 vs 65C816 mode, and it will only suggest mnemonics which are available for the CPU.

It also provides a weak warning for binary or hex numbers which are not 8, 16 or 24 bits, which would have saved me some time.
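The rule behind that warning is simple enough to sketch. The code below is not the plugin's implementation, just an illustration of the idea in Python with hypothetical function names: the width of a literal is inferred from its digit count, and anything other than 8, 16 or 24 bits is flagged.

```python
import re

# $hex or %binary literals, as written in ca65 source.
LITERAL = re.compile(r"^(?:\$(?P<hex>[0-9A-Fa-f]+)|%(?P<bin>[01]+))$")
VALID_WIDTHS = {8, 16, 24}


def literal_width(text: str):
    """Return the bit width implied by the digit count, or None if not a literal."""
    match = LITERAL.match(text)
    if match is None:
        return None
    if match.group("hex") is not None:
        return 4 * len(match.group("hex"))
    return len(match.group("bin"))


def needs_warning(text: str) -> bool:
    """True for hex/binary literals whose width is not 8, 16 or 24 bits."""
    width = literal_width(text)
    return width is not None and width not in VALID_WIDTHS


for literal in ("$c000", "$c00", "%11111111", "%1111111"):
    print(literal, "warn" if needs_warning(literal) else "ok")
```

A dropped digit in an address such as $c00 is exactly the kind of mistake this catches.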

Lastly, it has optional checking for undefined/unused values. This helps catch problems in the editor, and allows me to identify unused code and variables.


I already knew how multitasking works at a high level, but it was quite interesting to implement it myself. I have been using this as a test program for my 65C816 computer, since it exercises RAM, ROM and I/O, and uses interrupts.

I don’t think I will be able to use this exact method for more complex programs, because it will break down a bit with multiple interrupt sources. After writing this code, I also found that it’s not common for modern systems to save register values to the stack when context-switching, and that a process control block is used instead.

The next step for my 65C816 computer will be the addition of an NXP UART, which I tested in an earlier blog post.

65C816 computer – initial prototype

I am currently working on building a 65C816-based computer from scratch. I have some ideas for how I want to do this, but my first task is to put together a working prototype. I’ll be extending this in different ways to work towards the goals outlined in my last blog post.

This is about the simplest computer I can come up with which uses the 65C816 processor, but there is still quite a lot to go through.


All of the integrated circuits on this prototype are through-hole packages, suitable for use on a breadboard.

  • WDC W65C816S CPU
  • WDC W65C22S VIA I/O chip.
  • Alliance AS6C4008-55PCN 512k x 8 55ns SRAM
  • Atmel AT28C256 32k x 8 EEPROM
  • Atmel ATF22LV10 PLD for address decoding
  • 74AC04, 74AC245, 74AC573 for handling the multiplexed data bus and bank address bits
  • 1.8432 MHz oscillator
  • DS1813-5+ supervisory circuit

I also used the following equipment to program the chips, and view the result of the test program:

  • TL-866II+ EEPROM programmer
  • 8-channel Cypress FX2-based logic analyser

I’m working on Linux, and also use the following software tools:

  • ca65 – for assembling code for the CPU
  • minipro – for programming the EEPROM and PLD.
  • galette – for assembling the logic definitions for the PLD.
  • sigrok – for working with the logic analyser

Bank address latching

The 65C816 has a 24-bit address bus, and the top 8 bits are multiplexed onto the data bus. This is not a big problem, but it does mean that I need some extra glue logic compared to a plain 6502 system. A diagram in the 65C816 data sheet shows a suggestion for how to do this, and I’m starting with that implementation.

I used a 74AC04 to invert the clock, 74AC245 bus transceiver for the data bus, and a 74AC573 latch for the bank address (upper 8 bits of the address bus). This first page also shows the DS1813-5+ supervisory circuit on the reset line, which resets all of the components on power-up.

I’m aware that I should be generating a two-phase clock (there is some discussion about that here), though I can’t see any problems arising from the delay on the inverted clock on this particular circuit.
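The latching scheme can be modelled in a few lines. This is a behavioural sketch rather than a description of the exact wiring: it assumes the 74AC573 is transparent while the clock is low (latch enable driven by the inverted clock) and holds the bank address while the clock is high, which is my reading of the data sheet's suggested circuit. The class and function names are illustrative.

```python
class BankLatch:
    """Transparent latch: follows the data bus while PHI2 is low, holds while high."""

    def __init__(self):
        self.bank = 0x00

    def update(self, phi2: int, data_bus: int) -> int:
        if phi2 == 0:                  # assumed: latch enable = inverted clock
            self.bank = data_bus & 0xFF
        return self.bank


def full_address(latch: BankLatch, addr16: int) -> int:
    """Combine the latched bank with the 16-bit address bus."""
    return (latch.bank << 16) | (addr16 & 0xFFFF)


latch = BankLatch()
latch.update(phi2=0, data_bus=0x07)    # first half-cycle: CPU drives bank $07
latch.update(phi2=1, data_bus=0xEA)    # second half-cycle: bus carries data
print(f"{full_address(latch, 0x1234):06X}")
```

The second call shows why the latch is needed: once the clock goes high, the data bus no longer carries the bank address.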

Memory map and address decoding

The memory map for this computer is very simple. All of this will change, but I did need to start with something to get some code running.

Only the lower 19 bits of the address space are properly decoded, and mapped to one of three chips.

Address range     Maps to   Size
010000 – 07FFFF   RAM       448 KiB
00E000 – 00FFFF   ROM       8 KiB
00C000 – 00DFFF   I/O       8 KiB
000000 – 00BFFF   RAM       48 KiB

I used a programmable logic device (PLD) to implement this. It’s easy enough to do this with discrete gates, but a PLD will make it much easier to keep chip count and propagation delays under control as I add more features. I’ve written about programming PLD’s in previous blog posts (here, here and here), and I plan to use them to implement several features.

The particular part I am programming here is an ATF22LV10. These are still produced at the time of writing, and operate at either 3.3 volts or 5 volts, which is useful for this project. They can also be programmed with the common TL-866II+ chip programmer, which I also use for programming EEPROMs.

The definitions for decoding this are below. I called this file address_decode.pld.


Clock  RW     NC     NC     A19    A18    A17    A16    A15    A14    A13    GND
NC     /RAMCS /ROMCS /IOCS  /RD    /WR    NC     NC     NC     NC     NC     VCC

; ROM is the top 8 KiB of bank 0.
; prefix is 0000 111xxxxx xxxxxxxx
ROMCS = /A19*/A18*/A17*/A16*A15*A14*A13

; IO addresses are mapped below ROM in bank 0.
; prefix is 0000 110xxxxx xxxxxxxx
IOCS = /A19*/A18*/A17*/A16*A15*A14*/A13

; RAM is selected for any address that does *not* match 0000 11xxxxxx xxxxxxxx
RAMCS = A19 + A18 + A17 + A16 + /A15 + /A14

; Qualify writes with clock
WR = Clock*/RW


Address decode logic for 65C816 computer.

ROM is top 8K of bank 0, I/O is 8k below that, everything else is RAM.
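Before burning the PLD, the three chip-select equations can be checked against the memory map on the host. The sketch below re-states them in Python (True meaning the active-low output is asserted) and confirms that exactly one chip is selected for every 8 KiB block. The function names are illustrative.

```python
def decode(addr: int):
    """Evaluate the chip-select equations for a 20-bit address."""
    a = {n: bool(addr >> n & 1) for n in range(13, 20)}
    bank0 = not (a[19] or a[18] or a[17] or a[16])
    romcs = bank0 and a[15] and a[14] and a[13]
    iocs = bank0 and a[15] and a[14] and not a[13]
    ramcs = a[19] or a[18] or a[17] or a[16] or not a[15] or not a[14]
    return {"RAM": ramcs, "ROM": romcs, "I/O": iocs}


def selected(addr: int) -> str:
    """Return the single selected chip, failing if the decode is ambiguous."""
    chips = [name for name, on in decode(addr).items() if on]
    assert len(chips) == 1, f"{addr:06X} selects {chips}"
    return chips[0]


# Only A19..A13 are decoded, so one probe per 8 KiB block covers everything.
for block in range(0, 1 << 20, 0x2000):
    selected(block)

for addr in (0x000000, 0x00BFFF, 0x00C000, 0x00E000, 0x00FFFC, 0x010000, 0x07FFFF):
    print(f"{addr:06X} -> {selected(addr)}")
```

This only models the logic equations, so it says nothing about timing or propagation delay.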

I assembled this with galette:

galette address_decode.pld

This generates a .jed file, which can be used to program the chip. The ATF22LV10 is not on the compatibility list for my programmer, but it works just fine if I identify it as an ATF22V10CQZ.

minipro -p ATF22V10CQZ -w address_decode.jed

Firmware and testing

I came up with a simple test program which would require working ROM, RAM, and I/O. The CPU boots to 6502 emulation mode, and this test program does not access memory outside bank 0.

I created a linker configuration file which mostly reflects the system’s memory map, and called it system.cfg.

MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $0100,  size = $7e00, type = rw, file = "";
    PRG:    start = $e000,  size = $2000, type = ro, file = %O, fill = yes, fillval = $00;
}

SEGMENTS {
    ZEROPAGE: load = ZP,  type = zp;
    BSS:      load = RAM, type = bss;
    CODE:     load = PRG, type = ro,  start = $e000;
    VECTORS:  load = PRG, type = ro,  start = $fffa;
}

I then wrote this test program, main.s. The program runs from ROM, and writes a hard-coded value to one of the output ports. It then stores a 0 in RAM, then increments it and stores that to an output port in an infinite loop, which will show a counting pattern if RAM can be read and written.

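PORTB = $c000
DDRB = $c002
COUNTER = $0200     ; Counter location in RAM (address chosen for illustration)

.segment "CODE"
reset:
; Write 42 to VIA port B
  lda #%11111111 ; Set all pins to output
  sta DDRB

  lda #42
  sta PORTB

; Start at 0, and increment a value in RAM
  lda #00
  sta COUNTER
loop:
  inc COUNTER       ; Read-modify-write in RAM
  lda COUNTER
  sta PORTB         ; Output the value
  jmp loop

nmi:                ; Interrupts are not used by this test
irq:
  rti

.segment "VECTORS"
.word nmi
.word reset
.word irq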

I assembled this with ca65, then linked it with ld65. As far as the assembler knows, it is generating code for a 6502 CPU, not a 65C816. I am not using any advanced features, so this should be fine for this test.

ca65 main.s
ld65 -o rom.bin -C system.cfg main.o

This outputs an 8 KiB binary file, which is the amount of ROM space mapped in the computer’s memory. The chip is actually 32 KiB though, so I filled the rest with zeroes.

truncate -s 32768 rom.bin

I then wrote the program to the EEPROM using minipro.

minipro -p AT28C256 -w rom.bin

I ran this program, with a logic analyser connected to port B of the 65C22 VIA. The capture below shows the initial hard-coded number, followed by the counting pattern, which indicates that everything is working as expected.


I’m starting simple here, so that I have a working base system to modify. Everything seems to be working fine on breadboards, so I’ll proceed with that for now. I may still need to re-build everything with shorter wires, since this is getting quite difficult to work with.

It is also worth noting that most of the components here should run at 3.3 volts, which is where this project is eventually headed. Only the clock, reset IC and EEPROM would need to be substituted at the moment.

This blog post is only a snapshot of the project. You can check the current status on GitHub here.

A first look at the 65C816 processor

The 65C816 is an interesting processor from the 1980s. I recently wired one up on a breadboard, and I’ll be blogging here about my attempts to build something useful with it.

How I got here

I spent a couple of months last year designing and building a computer based on the 65C02, an old 8 bit processor. I wanted to create a hardware platform for running 6502 assembly programs. This was successful, and I learned a lot about how computers work while building and programming it.

This has got me interested in the 65C816, which is a later extension of the same architecture. The only well-known systems which used it were the Super Nintendo and Apple IIGS. From a programmer’s point of view, this chip seems far more capable than its predecessor, though developer tools are a bit scarce.

New project goals

My plan is to build a modern computer which uses the 65C816 CPU, so that I can really learn how it works.

Over a few revisions, I am aiming to build up to a system with a serial connection for simple text I/O, a modest clock speed, and 1 megabyte of static RAM.

I’ve researched some existing designs, and settled on two simple constraints to give this project its own character.

  1. Use only in-production parts.
  2. Don’t use an FPGA or microcontroller to bootstrap the system.

This will ensure that I am learning what it takes to build a computer around this processor, and not just offloading the tricky parts to a more powerful device with better tooling. I also hope this increases the accessibility of the design, since anybody could build it without needing to obtain obscure retro parts.

I am not avoiding programmable logic entirely though. Where classic designs often used custom chips, I will be using ATF22V10 PLDs. I will also leave the door open to adding video output via a microcontroller-based terminal emulator in the future.

Lastly, I’m selecting parts with a view to converting everything from 5 volts to 3.3 volts part-way through the project, which will open up many possibilities.

Other than the Apple IIGS and Super Nintendo, there are three hobbyist designs which I’m using for inspiration:

A quick note about open source

It’s never been a stated goal of this blog, but wherever possible, I use open source tools, and produce open source software. I expect that to be difficult for this project. The 65C816 is not widely used, and most hobbyist code for the CPU is licensed as source-available freeware, with restrictions on commercial use.

I would like to be able to release my code under an OSI-approved open source license, and have the option of incorporating copyleft code later on. Unfortunately for me, it seems I will need to write a lot of low-level code from scratch to get this kind of licensing certainty. I will be working primarily in assembly language, though C code would be very useful, so I will certainly be spending some time investigating compiler options along the way.

A smoke test

This is the first iteration of the design, which will live for now on a series of solderless breadboards.

This is simply a 65C816 processor, being fed a hard-coded NOP (no-operation) command. When it runs, the lights show the address bus counting upwards, as the CPU program counter increases.

The RDY, ABORTB, and BE inputs are connected to +5v through resistors, while RESB, NMIB, IRQB are connected to +5v directly. The NOP opcode is fed to the CPU over the data bus, by making the pattern 11101010 (0xEA) with 1k resistors. The schematic for this is below.
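The resistor pattern can be read straight off the bits of the opcode. The sketch below assumes each 1 bit is pulled to +5V and each 0 bit to ground; data_bus_wiring is an illustrative name.

```python
NOP = 0xEA  # 65C816 no-operation opcode, binary 11101010


def data_bus_wiring(opcode: int):
    """Map each data line D7..D0 to the rail its resistor connects to."""
    return {
        f"D{bit}": "+5V" if opcode >> bit & 1 else "GND"
        for bit in range(7, -1, -1)
    }


for line, rail in data_bus_wiring(NOP).items():
    print(line, rail)
```

For 0xEA this puts D7, D6, D5, D3 and D1 high, with D4, D2 and D0 low.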

The clock is an LM555 timer, fed through a 74AC04 inverter, because the rise and fall times on a 555 are not fast enough. The clock speed is 6.9Hz (that is not a typo), since R1 is 10 kΩ, and R2 is 100 kΩ, and the capacitor is 1 µF.
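The 6.9 Hz figure follows from the usual formula for a 555 in astable mode, f = 1 / (ln 2 × (R1 + 2 × R2) × C). The sketch below assumes the standard astable wiring, which the post does not spell out, and astable_555 is an illustrative name.

```python
import math


def astable_555(r1_ohms: float, r2_ohms: float, c_farads: float):
    """Return (frequency in Hz, duty cycle) for a 555 in standard astable mode."""
    t_high = math.log(2) * (r1_ohms + r2_ohms) * c_farads  # charges via R1 + R2
    t_low = math.log(2) * r2_ohms * c_farads               # discharges via R2
    period = t_high + t_low
    return 1 / period, t_high / period


freq, duty = astable_555(10e3, 100e3, 1e-6)
print(f"{freq:.2f} Hz, {duty:.0%} duty")
```

With R1 = 10 kΩ, R2 = 100 kΩ and C = 1 µF this gives about 6.87 Hz at roughly a 52% duty cycle, which rounds to the 6.9 Hz quoted above.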

Next steps

This is going to be a long project, which I will develop incrementally. The next step will be adding ROM, RAM, glue logic and a 65C22 I/O chip.

I am also testing alternatives to the 65C51N UART chip, which I found quite limited when I used it for my last project.