Porting the Amiga bouncing ball demo to the NES

Earlier this year, I ported the Amiga bouncing ball demo to the Nintendo Entertainment System. This is a video capture from a NES emulator.

I completed this back in January, but I’m only publishing this blog post now, because I was considering entering it into a demo competition.

How it works

If you are familiar with NES development, then you will probably notice that there is nothing ground-breaking going on here. The ball is made up of 64 8×8 sprites, bouncing around the screen over a static background.

There are two different versions of the ball in sprite memory, and with some palette swapping, this can be stretched to 4 frames of animation, just enough to make the ball appear to spin.

There is enough sprite memory to extend this to 8 frames of animation without any banking tricks, but I would need to start again from scratch to achieve that.

Pre-rendered 3D

I drew the ball in Blender, with twice the number of segments required for the final image. This is a UV sphere with 8 rings, and 32 segments.

I coloured each face with one of 4 colours, then rendered it with the Workbench renderer, which I had configured to use Flat lighting, no anti-aliasing, and the texture colour.

I also tilted the ball.

This gave me a crisp, high-resolution ball with solid colours.

To test the idea, I processed this down to a 64×64 image on a transparent background. I need to substitute colours, so there is no antialiasing here.

I then wrote up a Python script to swap out colours to make the 4-frame pattern. Note that frames 3 and 4 are the same as frames 1 and 2, but with white and red swapped.

I ran this through ImageMagick to convert it into a GIF preview.

convert -delay 3.5 -loop 0 *.png animation.gif

The result appeared workable, so I went about making the same animation in a NES rom.

NES implmentation

On the NES, it’s not possible to create the spinning ball effect with palette changes only, because it would require four colours plus transparency. To overcome this, I split the image into two frames, each stored as two colours plus transparency, which is possible.

To start the code, I checked out a fresh copy of Brad Smith’s NES example project, then deleted things that I didn’t need.

When working with sprites on the NES, it is typical to store object attributes in RAM, then perform a DMA operation once per frame to copy this over to the Picture Processing Unit (PPU). For this project, I wanted to try setting up two different copies of the object attribute memory in RAM – one for each of the two rotations.

This should allow me to set any of the four frames by choosing between two possible colour palettes, and two possible DMA source addresses. This worked, and the first milestone was a spinning ball.

Switching between two DMA sources did not save much effort in the end, because I still needed quite a lot of code to set the X/Y positions of 64 sprites each frame. I set up some assembler macros to help with this.

Physics and sound

I have made one NES game before, and I was not happy with the physics. For this project, I wanted to do a bit better.

For the X position, I use a fixed speed, and simple collision detection with left/right boundaries. The animation frame is calculated from the X position, so the ball changes rotation depending on which direction it is moving, just like the Amiga demo.

The Y location of the ball is a simple loop, and follows the absolute value of a sine wave. I pre-computed this with some Python.

My last project also had no sound, so I read up on the NES APU, and added some noise when the ball changes direction.

Development setup

Since my last NES project, I have improved my development setup quite a bit. As always, I am using the ca65 assembler on Linux.

Previously, I was using a text editor. I have since moved to a custom 6502 assembly plugin on PyCharm, which allows me to quickly jump to definitions/usages. I developed this for my other 65C02 and 65C816 projects, which you can read about on this blog.

When I click run, I’ve set up the project to assemble a NES ROM, then launch with fceux, which has debugging features.

The debug features were previously only available on Windows builds of fceux, so when I made this demo, I was running the emulator via WINE. This has been added to the Linux builds as of v2.5.0, so I’ll most likely switch to that for my next project.

Wrap-up

I learn something new every time I write code for the NES, and it’s a lot of fun to make simple demos for these old systems. It’s also refreshing to create something standalone which I can explain through screenshots, instead of pages of assembly code.

For those who do want to read pages of assembly code, though, this project is up on GitHub at mike42/nes-ball-demo.

Interfacing an NXP UART to an 8-bit computer

I recently tested the SC16C752 UART from NXP with my 6502 computer, and I was able to use it to receive and send characters. This blog post shows the simple test program and circuit that I used.

Some background

I built a 65C02-based computer last year, which uses a serial connection for text I/O. I used the WDC 65C51N to provide a UART interface, and I found it to be quite limited.

I am now building a 16-bit computer, and I want to avoid the 65C51N, for two main reasons:

  • It runs at 5v, and I’m planning to convert the design to 3.3v later on. The other 65-series chips I’m using support a wide voltage range.
  • It can only store one byte at a time. I am planning to run at a higher baud rate, and the system will have multiple interrupt sources. It would take very careful programming to avoid losing bytes with a 65C51N in this setup.

I spent some time investigating alternative UART IC’s, and found some discussion on 6502.org about replacing the 6551 with an NXP industrial (SC26 or SC28 series) UART. At the time of writing, many of the hobbyist-friendly PLCC versions of these chips are listed as end of life, though I’ve since found pin and software-compatible versions of this made by MaxLinear (eg. the XR88C92).

For my uses though, I chose the NXP SC16C752. It has two channels, 64-byte receive and transmit buffers, and supports a wide voltage range. I don’t mind that it’s a different series to these standard chips, because I plan to write software from scratch. The only downside is the packaging, which is LQFP48 with 0.5mm pin pitch. I needed to solder this tiny chip to an adapter board in order to use it.

The test circuit

I wired this up to my 6502 computer project as shown in the diagram below. IO3 is a spare address decoding output, which maps the chip to address $8c00, while the rest of the signals connect to the 6502 CPU directly.

I used a 74AC00 and 74AC04 to generate the active-high reset and separate read/write signals (as suggested on the same thread linked before). I plan to use a PLD to generate a qualified write signals on my 65C816 computer build.

Note that there are separate chip-selects for each UART, and I’ve only connected one here. I have also not connected the interrupt output for this test, and have some unused inputs which are left floating.

Developing a test program

I started writing some code, and connected a logic analyser to the UART output. The early tests did not work well, and I now know that this is because I had not enabled the FIFOs, causing characters to be lost.

This screen capture shows an attempt to send the entire alphabet to the UART, but only the letters “A”, “B” and “M” appearing at the output.

I came up with this test program, which does 8-N-1 communication at 115,200 bits per second.

;
; Test program for the SC16C752B dual UART from NXP
;
.import sys_exit

; Assuming /CSA is connected to IO3
UART_BASE = $8c00

; UART registers
; Not included: FIFO ready register, Xon1 Xon2 words.
UART_RHR = UART_BASE        ; Receiver Holding Register (RHR) - R
UART_THR = UART_BASE        ; Transmit Holding Register (THR) - W
UART_IER = UART_BASE + 1    ; Interrupt Enable Register (IER) - R/W
UART_IIR = UART_BASE + 2    ; Interrupt Identification Register (IIR) - R
UART_FCR = UART_BASE + 2    ; FIFO Control Register (FCR) - W
UART_LCR = UART_BASE + 3    ; Line Control Register (LCR) - R/W
UART_MCR = UART_BASE + 4    ; Modem Control Register (MCR) - R/W
UART_LSR = UART_BASE + 5    ; Line Status Register (LSR) - R
UART_MSR = UART_BASE + 6    ; Modem Status Register (MSR) - R
UART_SPR = UART_BASE + 7    ; Scratchpad Register (SPR) - R/W
; Different meaning when LCR is logic 1
UART_DLL = UART_BASE        ; Divisor latch LSB - R/W
UART_DLM = UART_BASE + 1    ; Divisor latch MSB - R/W
; Different meaning when LCR is %1011 1111 ($bh).
UART_EFR = UART_BASE + 2    ; Enhanced Feature Register (EFR) - R/W
UART_TCR = UART_BASE + 6    ; Transmission Control Register (TCR) - R/W
UART_TLR = UART_BASE + 7    ; Trigger Level Register (TLR) - R/W

.segment "CODE"
main:
  lda #$80                  ; Enable divisor latches
  sta UART_LCR
  lda #1                    ; Set divisior to 1 - on a 1.8432 MHZ XTAL1, this gets 115200bps.
  sta UART_DLL
  lda #0
  sta UART_DLM
  lda #%00010111            ; Sets up 8-n-1
  sta UART_LCR
  lda #%00001111            ; Enable FIFO, set DMA mode 1
  sta UART_FCR
  jsr test_print
  jsr test_recv
  lda #0
  jmp sys_exit

test_print:                 ; Send test string to UART
  ldx #0
@test_char:
  lda test_string, X
  beq @test_done
  sta UART_THR
  inx
  jmp @test_char
@test_done:
  rts

test_string: .asciiz "Hello, world!"

test_recv:                  ; Receive three characters and echo them back
  jsr uart_recv_char
  sta UART_THR
  jsr uart_recv_char
  sta UART_THR
  jsr uart_recv_char
  sta UART_THR
  rts

uart_recv_char:             ; Receive next char to A register
  lda UART_LSR
  and #%00000001
  cmp #%00000001
  bne uart_recv_char
  lda UART_RHR
  rts

The program sends the text “Hello, world!”, then waits for input. The next three characters it receives are sent back to the user, and then the program terminates. The only thing in this test program which is specific to my 6502 computer is sys_exit, which a routine to transfer control back to the operating system.

It was very rewarding to see the fast, error-free text output on the logic analyser for the first time. This is a great improvement from the 65C51N.

I was also surprised at how short the delay is to echo back a character. It will be interesting to see how much the latency increases once there is some more complex software handling the input.

Wrap up and next steps

I plan to use the NXP SC16C752 for my 65C816 computer project, which will be controlled via text-based input and output.

I’m testing it on its own first, so that I can refer to a known-good setup if it doesn’t work. This is enough to get started, though there are some features which I wasn’t able to test yet, since I can’t use custom interrupt service routines on this setup.

Rendering my 6502 computer project in Blender

For the past couple of months, I’ve been working on building a 6502-based computer from scratch.

In a previous blog post, I wrote about designing a 3D-printed enclosure for this project in Blender.

During this build, I needed to wait for some parts, so I had some time to see what I could do with the 3D models of the case and circuit board. I ended up animating the assembly process, and learned a couple of tricks along the way.

Splitting components and PCB

I already had a 3D model of the circuit board and components. This was initially exported from KiCad, in STEP format. I then used FreeCAD to convert it from STEP to STL, which I imported into Blender for designing the case. I also repeated this export with some different footprints, since I needed sockets and microchips.

It is a challenge to make this look good, because the circuit board and all components come in as one object, and have lost all of the detail from the KiCad 3D view.

I spent a lot of time splitting objects, then selecting faces so that I could assign different materials. I’m using a free BlenderKit account for materials, and using procedural materials where possible.

I also modeled the DC barrel jack and slide switches, which were the first of many components which were missing from the original KiCad export. Still, something was missing.

Adding missing details

I thought that the PCB in the previous image was too bare. The real PCB uses plated through-holes, and almost every hole in a real PCB has a shiny solder pad around it.

To fix this, I exported the front solder-mask layer from KiCad as an SVG, which looks something like this:

I imported this SVG into Blender, scaled it to fit the PCB, extruded it, then went through a series of boolean modifiers. This resulted in a new object which contained every part of the PCB which was not covered in solder mask. I assigned this a shiny metal material.

After this was applied, I subtracted this from the original PCB model, giving the parts of the PCB which are covered in solder mask.

Showing both of these at the same time, I have a PCB with solder pads and plated through-holes.

Importing the silkscreen

This still was not right. On the real PCB, everything is labelled.

I exported the front silkscreen layer from KiCad as an SVG. This layer looks something like this:

I then used InkScape to change this to white text on a transparent background, before finally exporting it as a PNG. I mixed this into the procedural texture which I was using for the PCB, using the transparency as the mix factor. As a final touch, I also exported the front copper layer, and used this as the displacement input for the final material. The nodes are shown in the screen capture below.

This allows me to render a PCB which looks a lot like the real thing.

UV unwrapping everything

In the earlier render, all of the chips had clear surfaces, while real chips are covered with text and logos.

I created small transparent/brown PNG images for each chip, using Inkscape, then UV unwrapped the chips. I mixed the logo into the texture, much the same as I did for the PCB.

I also UV unwrapped each resistor, in order to paint on the correct colour codes. This was perhaps the most egregious waste of time in this whole thing.

Adding final parts

There were plenty more parts which I couldn’t export from KiCad, and needed to model from scratch. I used the bolt factory add-on to create screws, nuts, and hex standoffs (both a screw and a nut, combined).

I also added power and reset buttons (which are partly threaded), plus the speaker. I decided not to add wires, since it would be beyond my skill at the moment.

When everything is displayed at once, a lot of this detail is hidden behind other objects.

Animating

With that last render, I decided to see if I could make an animation of all of the parts coming together. I’ve never animated anything in Blender, so I had a lot to learn.

I added a tracking constraint to the camera, which is panning around an invisible object on the table, and animated everything using key-framed movement. Parts are raised out of view with slightly randomised rotation/location before falling into place.

Rendering

I exported the animation from Blender as 360 separate 1920 x 1080 images, to be played back at 30 frames per second. I would normally use ffmpeg to convert an image sequence into a video, but I wanted to try adding an audio track.

I imported the rendered animation into Kdenlive, which can process image sequences (the button is “Add clip” / “Import as sequence”), fetched a track from the Free Music Archive (Scanglobe – Current. CC BY-NC), and rendered the project.

Wrap-up

I wanted to get a textured, 3D model of this entire project so that I could easily illustrate documentation and project pages. I definitely achieved this, though working with the exported circuit board and components in Blender was tedious.

The version of Blender used here is 2.93, and the files were exported from KiCad 5.1.10.

Assembling my 6502 computer

I recently finished assembling my 6502 computer, so that the whole project is housed in a 3D-printed case.

This is something that should be easy. I already blogged about making the case, and making the circuit board. All that is left is to put the board in the case, right? Well thanks to some mistakes on my part, things are not quite so simple.

More parts

I needed quite a few parts to finish this project, including a power button, reset button, hex standoffs, short screws, and rubber feet. Since this is the first time I’m hitting this stage of a project, I also needed to buy a Du Pont connector kit, stranded wire, heat-shrink tube, and glue.

I have been powering the project from a USB/serial adapter, so I also bought a new one to install permanently. A full parts list is available in the GitHub repository for this project.

I also needed to widen the hole for the reset button due to a mix-up. The specific part I used requires a 7mm hole, while I had designed the case for a button with a 6mm hole.

Making connections

The first step was to modify the board so that it would be compatible with the new UART, which involved de-soldering the female pin headers which I installed previously, and replacing them with male pin headers. I used only a soldering iron, solder wick, and flux for de-soldering, and it was not a fun process. The board survived, so I’ll call it a success.

Next, I soldered wires to the underside of the board for the power button, reset button, and speaker to connect. Each of these is a pair of wires, soldered at one end, and a 2-pin female Du Pont connector on the other.

I attached corresponding connectors on the buttons and speaker. The power button also has an in-built LED, and there are plenty of places to power this from the board’s expansion headers. I added a 2-pin female Du Pont connector, plus a resistor, neatly hidden by heat-shrink.

I checked everything with a multimeter before powering it on. The UART pins were connected correctly to the 65C51N chip and ground. No continuity between the board’s power and the external power unless the power button is pressed, no continuity between the reset line and ground unless the reset button is pressed, and so on.

Troubleshooting

When I powered on the computer, nothing came up on the terminal, which means that I broke something. I tested some voltages, and found the computer was stuck in reset.

The reset signal on the 6502 is active-low, and I’ve added a supervisory IC. Any time the reset signal is low, or the power drops below a minimum voltage, this IC holds it low for 150ms before returning it to +5v. I was confident that the reset buttons and power voltage were fine, so I suspected that one one of the other chips could be causing the problem.

Everything is socketed, so I removed the 65C51N and tested again. This chip is closest to the de-soldering work I was doing earlier, so I thought I may have damaged it.

There was no change, so I started removing other chips, and found that after removing the CPU, the computer was no longer getting stuck in reset.

Following that lead, I found that one of the data bus lines, D4, was connected to the reset line. This would cause activity on the data bus to reset the computer, which would cause this problem.

A quick look at the board in KiCad showed that at one of the points I had attached the reset button, D4 is 0.5mm away from the reset line.

I de-soldered that wire and found a tiny break in the solder mask on D4. This was easy enough to fix, I simply attached the wire for the external reset button elsewhere.

I also bent the CPU pins when re-installing, but after fixing that, I was back in business.

SD card changes

I have been writing some test programs which use a microSD card via one of the 65C22 I/O ports, and wanted to fit this in the case as well.

I swapped the Adafruit ADA254 microSD break-out for the SparkFun DEV-13743. It is slightly smaller, includes pull-up resistors, and still provides level-shifting.

I made a small adapter by attaching a female pin header to a Du Pont housing with superglue. I couldn’t get the “card detect” feature to work, so I wired it up differently to my previous blog post, and updated the code to match. Details of the new wiring is on GitHub.

I also added some masking tape when I installed this to prevent any short circuits. The metal on the outside of the microSD socket is connected to ground, and it could touch some IC pins.

Quick lessons

This is only a learning project for me, so I am trying to take some lessons away from each part of it.

The high-level lessons would be to plan for external connections before building the PCB, and to have all parts on hand before designing the case. That would have saved a lot of re-work, though adapting a design without starting from scratch is a valuable skill in itself.

If I were starting from scratch, I would add pin headers to the board for connecting the power and reset buttons instead of including buttons on the board, and remove the (now unused) DC barrel jack as a power input. I would also make the board larger, and the case taller, so that everything is less cramped. It takes a lot of force to remove chips from machined pin sockets, and the board is too compact to get a screwdriver under the chips.

SD card support is also a bit of a hack, and could be built-in to the board with some effort.

Final results

I now have a fully self-contained 8 bit computer, and it looks more like a finished project than an in-progress experiment. It boots to BASIC, or with the flip of a switch, launches a ROM which allows me to load machine code assembled on my PC.

With the lid removed, I can access the 40-pin expansion header, and one spare I/O port.

The hardware side of this project is now done, and I have a platform for testing any 6502 assembly code on real hardware.

Adding an SD card reader to my 6502 computer

I recently attempted to add SD card support to my home-built 6502 computer.

This was successful, but more challenging than I expected. This blog post is just a few things that I learned along the way.

Background

SD cards are a popular way to add storage to electronics projects. In my case, I will be using the SD card in SPI mode. My home-built computer has an expansion port with some GPIO pins, which should be sufficient for this.

My basic plan was to:

  • write some test data to an SD card from a modern computer
  • interface the SD card to my home-built 6502 computer
  • write a 6502 assembly program to print out the test data

Since this is a learning project, I started from scratch, and avoided reading any existing implementations of SPI or SD cards before jumping in.

Test data

I started this project by generating a small test file, repeating each character in the alphabet 512 times.

echo abcdefghijklmnopqrstuvwxyz | fold -w1 | while read line; do yes $line | head -n512 ; done | tr -d "\n" > disk.img

It takes a one-line shell command to write this image to an SD card on all good operating systems, or a few clicks in the disk imaging utility.

dd if=disk.img of=/dev/my_sd_card

My laptop has an SD card slot, but I also have Windows on it at the moment, which turned this into a far larger problem than I could have imagined.

Driver support for this particular SD card reader was dropped in Windows 10, there are no built-in tools for imaging disks on Windows, and I had to try multiple tools before I could find one which would write a raw disk image (one which does not contain a valid filesystem).

In retrospect, this was the first of many lessons about not having the right tools for this project.

Physical interface

I connected an SD card module to my 6502 computer, via a breadboard. I used a module which could level-shift, because my computer runs at 5V, where SD cards usually run at 3.3V. This particular module is an Adafruit ADA254.

I also added pull-up resistors to all of the data lines, after noticing that there were several sitations where they would float. The wiring I used is:

The full schematic for my 6502 computer is on this blog post, while the schematic for the SD card module is online here.

Software: Part 1

With some test data and a hardware interface, I got to the third step, which was to read data from the SD card. I was completely unprepared for how tricky it is to do this from scratch, without the right experience or debugging tools.

As a quick re-cap, my development environment is still very limited: I am writing 6502 assembly on a modern computer, and I have the ability to send the assembled program to my home-built 6502 computer over a serial connection.

I started by using lecture notes from the University at Buffalo’s EE-379 course, available online here. This describes SPI, SD card command structure, and the sequence to initialise a card.

The two protocols I needed to implement are SPI, which transfers the bits, plus the SD card commands which run over that. I had to write quite a lot of code with no feedback, because it takes several commands before SD cards are ready to work with in SPI mode, and the exact details vary depending on the generation of card (SDHC vs SDXC, etc).

After days of debugging, I was unable to get a response from the SD card. With few other options, I started filling my code with the assembly-language equivalent of print statements, and found that every byte from the SD card was the default FF. This means ‘no data’ in this context.

Print statements also slowed the program down to a crawl, which allowed me to use a pizeo buzzer to listen to each line. This clicks each time a signal changes between 0 and 1, so I was able to check that each line had the expected activity. I found that the SD card was sending responses, but that the computer was not receiving them, because they were not connected to eachother on the breadboard.

Software: Part 2

After connecting the card correctly, I could see responses for the first time. The format here is byte sent / byte received, with line breaks between commands.

#
SD card detected
40/7f 00/ff 00/ff 00/ff 00/ff 95/ff ff/ff ff/01 ff/ff
48/7f 00/ff 00/ff 01/ff aa/ff 0f/ff ff/ff ff/09 ff/ff ff/ff ff/ff ff/ff
7a/7f 00/ff 00/ff 00/ff 00/ff 75/ff ff/ff ff/01 ff/00 ff/ff ff/80 ff/00 ff/ff ff/ff
#

The correct initialisation sequence depends on the type of SD card, and I spent a lot of time going through each possibility, looking for one one which worked. When this failed, I found this summary online, which includes more information about decoding the responses: Secure Digital (SD) Card Spec and Info.

As it turned out, I was not sending the correct CRC for CMD8, and a CRC error is indicated by the 09 (binary 0000 1001) response to the second command above. Each bit in the response has a different meaning, and I had mistaken this for indicating an unsupported command, which is expected for some card types.

It’s a good thing that I had this problem, because searching for the CRC algorithm turned up this collection of AVR C tutorials, which includes a 4-part series on implementing SD card support.

I noticed that all of the command examples were wrapped with code like this:

    // assert chip select
    SPI_transfer(0xFF);
    CS_ENABLE();
    SPI_transfer(0xFF);
    // ... code to send a command...
    // deassert chip select
    SPI_transfer(0xFF);
    CS_DISABLE();
    SPI_transfer(0xFF);

Later on, I had a problem where responses were sometimes mis-aligned by one bit. When I also tried sending empty bytes between commands, while the SD card is not selected, these problems disappeared entirely.

Software: Part 3

With the initialisation code complete, I could finally request data from the SD card.

Data on an SD card is arranged in 512-byte blocks by default. I wrote a routine to read one block of data from the SD card into RAM, plus a routine to print sections of RAM.

The result was a program which prints the first kilobyte of the SD card, which is the letter a 512 times, followed by the letter b 512 times.

This screen capture shows the output of a program matching a hexdump of the test file, which is exactly what I was aiming for.

The full code is too long to include in a blog post, but can be found in the repository for this project, mike42/6502-computer on GitHub.

Some lessons

While I was able to read data from an SD card, this took a long time, and there were several points where I nearly hit a dead-end.

I couldn’t see what was happening at a hardware level, and I couldn’t compare my setup with one that worked, because I had never done anything similar before. Spot-the-difference is a powerful debugging tool, and by jumping straight into 6502 assembly, while avoiding existing 6502 implementations, I definitely made things difficult for myself.

With that in mind, there are several ways I could have made this easier:

  • Better equipment. A logic analyser, an SD card writer, and spare SD cards would have been useful for testing.
  • Building somewhere less-constrained first. I could have implemented this first on a Raspberry Pi, then ported the result to 6502 assembly. It would then be possible to verify my hardware setup by running existing code, while still getting the learning experience from writing my implementation from scratch. Note that this would have required a different SD card break-out board.
  • Building something simpler first. For example, Ben Eater released a a YouTube video around the time I was working on this, where he reads from an SPI temperature sensor on a 6502 computer. It would be much easier to get started if I had used any other SPI device on a 6502 first.

The next step in this project is to load programs from an SD card, which would require me to implement a filesystem. This is significantly more complex than what I’ve done here, and I know not to approach it the same way.

More equipment

I picked up a logic analyser, USB SD card reader, alternative SD card module, and a spare SD card.

The logic analyser is compatible with sigrok, which is open source. This is the first time I’ve used this software, and it can decode SD card commands running over SPI, among other things.

The USB SD card reader will allow me to write disk images directly from my development environment, instead of copying them over to a laptop. I do most of my development from a virtual machine, and USB peripherals are easy to pass though.

The alternative SD card module is from SparkFun. Compared with the module I was writing in this blog post, it has a smaller footprint, and includes the pull-up resistors. I’m hoping I can use this to fit an SD card into the 3D-printed enclosure for this project.

Implementing the XMODEM protocol for file transfer

I recently built a 6502-based computer from scratch, and I’m using it as a platform for testing 6502 assembly code. The software on this computer is currently very limited, and re-programming an EEPROM chip to test changes is a slow process.

This blog post covers changes which I made to implement the XMODEM protocol, allowing me to upload programs over a serial connection.

The protocol itself

I chose XMODEM because it’s the simplest protocol which is compatible with modern terminal software. All I need right now is the ability to place some data (machine code) into RAM, so that I can execute it.

An important note that is that modern file transfer protocols would normally run over TCP/IP, while XMODEM runs over a serial connection. This works for me, since it’s the only interface I’ve got.

For what it’s worth, XMODEM is also era-appropriate technology for this retro computer, since it was first implemented in 1977. The 6502 processor came out in 1975.

Implementation

Implementing XMODEM was the easy part of this process. There are existing 6502 assembly implementations which I could have used, but I decided to work from the description on the Wikipedia article instead. This is partly to make sure I understand it, and partly to keep the software licensing of my ROM as clear as possible.

I have the beginnings of a shell for running commands from ROM, so I implemented two new commands:

  • rx, to receive a file over XMODEM and load it into memory.
  • run to jump to the program

To start the transfer, I found it useful to write a method for reading a character with a time-out. I already have routines for reading and writing characters from serial.

; Like acia_recv_char, but terminates after a short time if nothing is received
shell_rx_receive_with_timeout:
  ldy #$ff
@y_loop:
  ldx #$ff
@x_loop:
  lda ACIA_STATUS              ; check ACIA status in inner loop
  and #$08                     ; mask rx buffer status flag
  bne @rx_got_char
  dex
  cpx #0
  bne @x_loop
  dey
  cpy #0
  bne @y_loop
  clc                          ; no byte received in time
  rts
@rx_got_char:
  lda ACIA_RX                  ; get byte from ACIA data port
  sec                          ; set carry bit
  rts

The entire receive process is around 60 lines of assembly code. The process involves ASCII NAK character repeatedly until a SOH character is received, which indicates the start of a 128 byte packet. All padding around each packet is discarded, and the payload is written to memory. This process is repeated until the end of transmission (EOT) is received.

; Built-in command: rx
; recives a file over XMODEM protocol, and loads it into memory
USER_PROGRAM_START = $0400   ; Address for start of user programs
USER_PROGRAM_WRITE_PTR = $00 ; ZP address for writing user program

shell_rx_main:
  ; Set pointers
  lda #<USER_PROGRAM_START   ; Low byte first
  sta USER_PROGRAM_WRITE_PTR
  lda #>USER_PROGRAM_START   ; High byte next
  sta USER_PROGRAM_WRITE_PTR + 1
  ; Delay so that we can set up file send
  lda #1                     ; wait ~1 second
  jsr shell_rx_sleep_seconds
  ; NAK, ACK once
@shell_block_nak:
  lda #$15                   ; NAK gets started
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  jsr shell_rx_receive_with_timeout  ; Check in loop w/ timeout
  bcc @shell_block_nak       ; Not received yet
  cmp #$01                   ; If we do have char, should be SOH
  bne @shell_rx_fail         ; Terminate transfer if we don't get SOH
@shell_rx_block:
  ; Receive one block
  jsr acia_recv_char         ; Block number
  jsr acia_recv_char         ; Inverse block number
  ldy #0                     ; Start at char 0
@shell_rx_char:
  jsr acia_recv_char
  sta (USER_PROGRAM_WRITE_PTR), Y
  iny
  cpy #128
  bne @shell_rx_char
  jsr acia_recv_char         ; Checksum - TODO verify this and jump to shell_block_nak to repeat if not mathing
  lda #$06                   ; ACK the packet
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  jsr acia_recv_char
  cmp #$04                   ; EOT char, no more blocks
  beq @shell_rx_done
  cmp #$01                   ; SOH char, next block on the way
  bne @shell_block_nak       ; Anything else fail transfer
  lda USER_PROGRAM_WRITE_PTR ; This next part moves write pointer along by 128 bytes
  cmp #$00
  beq @block_half_advance
  lda #$00                   ; If low byte != 0, set to 0 and inc high byte
  sta USER_PROGRAM_WRITE_PTR
  inc USER_PROGRAM_WRITE_PTR + 1
  jmp @shell_rx_block
@block_half_advance:         ; If low byte = 0, set it to 128
  lda #$80
  sta USER_PROGRAM_WRITE_PTR
  jmp @shell_rx_block
@shell_rx_done:
  lda #$6                    ; ACK the EOT as well.
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  lda #1                     ; wait a moment (printing does not work otherwise..)
  jsr shell_rx_sleep_seconds
  jsr shell_rx_print_user_program
  jsr shell_newline
  lda #0
  jmp sys_exit
@shell_rx_fail:
  lda #1
  jmp sys_exit

This code is the first time that I’ve used the “Zero Page Indirect Indexed with Y” address mode. It uses a pointer to the program’s location, which is incremented 128 each time a packet is accepted.

The index in the current packet is stored in the Y register, allowing each incoming byte to be written to memory with one operation, which is called in a tight loop.

sta (USER_PROGRAM_WRITE_PTR), Y

The other routines mentioned in this source listing are:

  • shell_rx_sleep_seconds – sleeps for the number of seconds in the A register.
  • shell_rx_print_user_program – prints the first 256 bytes of the program for verification.

The checksum is entirely ignored. Everything here is running on my desk, so I’m not too worried about transmission errors.

After a few iterations, I was able to upload placeholder text to RAM.

The screen capture shows 256 bytes of memory (2 packets in XMODEM). The ASCII SUB character (1a in hex) fills out unused bytes at the end, since everything sent over XMODEM needs to be a multiple of 128 bytes.

Assembling some programs

I thought I was almost done at this point, but it turns out that writing a “Hello World” test program to run from RAM is easier said than done.

I started by creating a new ld65 linker configuration which links code to run from address $0400 onwards, which is where programs are loaded into RAM. This system does not have a loader for re-locatable code, and absolute addresses are assigned by the linker.

This is the configuration which I used for these tests. It has some flaws, but it’s good enough to place the CODE segment in the right place.

# Test configuration for user programs
MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $0100,  size = $0300, type = rw, file = "";
    MAIN:   start = $0400,  size = $1000, type = rw, file = %O;
    ROM:    start = $C000,  size = $4000, type = ro, file = "";
}

SEGMENTS {
    ZEROPAGE: load = ZP,  type = zp;
    BSS:      load = RAM, type = bss;
    CODE:     load = MAIN, type = ro;
    VECTORS:  load = ROM, type = ro,  start = $FFFA, optional=yes;
}

If I wanted my test program to terminate, then I needed to find the address of sys_exit in ROM. This routine hands control back to the shell after a command completes.

To get this, I re-assembled the ROM, but added the flag --debug-info to the ca65 assembler, and --dbgfile boot.dbg to the ld65 linker. This produces a text file, which contains the final address of each symbol.

sym id=45,name="sys_exit",addrsize=absolute,scope=0,def=28,ref=298+192+426+390+157,val=0xC11E,seg=0,type=lab

The simplest test program I could come up with was:

; location in ROM
sys_exit = $c11e

.segment "CODE"
main:
  jmp sys_exit

I was able to assemble this program to 3 bytes of binary and run it. It does absolutely nothing, and simply returns to the shell.

To extend this into a “Hello World” program, I needed to look up the location of two more routines in the ROM.

; locations of some functions in ROM
acia_print_char = $c020
shell_newline   = $c113
sys_exit        = $c11e

.segment "CODE"
main:
  ldx #0
@test_char:
  lda test_string, X
  beq @test_done
  jsr acia_print_char
  inx
  jmp @test_char
@test_done:
  jsr shell_newline
  lda #0
  jmp sys_exit

test_string: .asciiz "Test program"

This animation shows the process of uploading the binary and executing it via minicom.

There is nothing special about the invocation for minicom here, it’s just a bit-rate and device name, which in my case is:

minicom -b 19200 -D /dev/ttyUSB0

Automating the boring stuff

This approach is very fragile, because the code is full of hard-coded memory addresses.

Any time I change the ROM, the addresses for these routines could change, and this program will no longer work.

On any real architecture, applications call into the OS via system calls, or maybe a jump table for 8-bit systems. I don’t need a stable binary interface, and there is no userspace/kernel distinction, so I decided not to go down that path.

Instead, I wrote a Python script to generate an assembly source file, using the same debug symbols which I was manually reading. This runs after each ROM build.

The format is:

...
.export VIA_T2C_H                        = $8009
.export VIA_T2C_L                        = $8008
.export acia_delay                       := $c02b
.export acia_print_char                  := $c014
...

When this file is assembled, it doesn’t generate any code, but it does define all of the constants and labels from the ROM. When applications link against this, they can .import anything which would be available to ROM-based code.

The start of the above script is re-written as:

.import acia_print_char, shell_newline, sys_exit

This means that I have source compatibility, just not binary compatibility.

Bringing everything together

With my initial goal achieved, I was still not happy with the development experience. I started to dig into the murky world of minicom scripting to see if I could automate the process of uploading/running programs, and found that it is indeed possible.

The scripting capabilities of minicom do not appear to be widely used, and I needed to read through the source code to find ways to work around the problems I encountered.

For example, the “send” command was sending characters too quickly for the target machine. There are hard-coded delays between commands in the interpreter, but you can’t use multiple “send” commands to slow things down, because it is also hard-coded to append a line ending each time.

The fix was to use !<, which sends the output of a command verbatim. This script uses echo -n command to send each character individually. This is the exact same procedure as the animation in the previous section. First, the script types “rx”, then sends the file, then types “run”.

# confirm we have a shell prompt
send ""
sleep 0.1
# put into receive mode
expect "# "
sleep 0.1
!< echo -n 'r'
!< echo -n 'x'
send ""
sleep 0.1
# send the program and wait for prompt
! sx -q $TEST_PROGRAM 2> /dev/null
expect "# "
# run program
sleep 0.1
!< echo -n 'r'
!< echo -n 'u'
!< echo -n 'n'
send ""
sleep 0.1
# Terminate minicom when program completes (bit crazy..)
expect "# "
sleep 0.1
! killall -9 minicom

It’s also worth talking about the end of this script. When minicom scripts end, the user is returned to an interactive session. I wanted the minicom process to terminate without clearing the screen, so I run killall -9 minicom as the last line of the script.

This is not the safest thing to include in a script, but it’s certainly effective, and I am quite certain that there is no built-in way to achieve this with the current version of minicom. Realistic alternatives include patching minicom, switching to different terminal software.

The minicom invocation includes the name of the minicom script to execute, and the script will use the TEST_PROGRAM environment variable to name the binary file to run.

TEST_PROGRAM=test.bin minicom -b 19200 -S minicom_autorun.txt -D /dev/ttyUSB0

When running this command, the program executes on the 6502 computer seamlessly, though minicom leaves the terminal in a broken state when it is killed, and bash prints the “Killed” text to the screen.

To work around this, I wrapped the minicom invocation in a shell script which builds the program, suppresses all errors, and always returns 0. This avoids constant reporting that killall -9 is a problem, but it does hide any other problems.

#!/bin/bash
set -eu -o pipefail
make $1.bin
(TEST_PROGRAM=$1.bin minicom -b 19200 -S minicom_autorun.txt -D /dev/ttyUSB0 || true) 2> /dev/null

This script is invoked as:

./assemble_and_run.sh test

A few months back, I wrote an IntelliJ plugin for 6502 assembly, and I’m writing all of my code in PyCharm. The last step was to get this running from my IDE.

I added a build configuration which invokes the assemble_and_run.sh script. The script is completely generic, and I can use it to run other programs by changing the argument.

I can now write a program, click run, and watch it execute on real hardware, all without leaving my IDE. It took a lot of effort, but this is a huge improvement from where I started.

Wrap-up

A fast compile/test/run cycle is a huge gain for developer productivity on any system. I can now get fast, accurate feedback on whether my 6502 assembly programs work on real hardware. The first program I’m writing involves interfacing with an SD card, which will be a lot more approachable with this improvement.

I was surprised at how tricky it was to assemble and link a small program to run in RAM instead of ROM, and I can see that my linker configuration will need to be improved as I write programs which use more data. My basic plan is to test features by uploading them over XMODEM, then add them to the ROM when they are working.

The full source code for this project, which includes hardware and software, can be found on GitHub at mike42/6502-computer.

Designing a 3D printed enclosure for my KiCad project in Blender

I recently built a small 6502-based computer on a custom PCB, which I designed in KiCad. This blog post is about the process of building a 3D-printed case to house this project, using Blender.

This is my first ever 3D print, so I had a few things to learn.

Quick background

My effort on this project has been focused on making a design which works, then transferring that design to a PCB. Since this is my first electronics project, I’m working sequentially, and have completely ignored the need for a case until now.

I could have saved some time if I positioned the ports and mounting holes to match pre-made electronics enclosures, but I’ll take that as a lesson for my next project.

The requirements for the enclosure are simple:

  • Ability to mount the PCB inside the case
  • Cut-out for power input
  • Cut-outs for cable routing to expansion ports
  • Cut-outs for power and reset buttons

My PCB design files are in KiCad, and I decided to design the case in Blender, since I already know how to use it. Blender is not a CAD tool, but it is excellent for general-purpose 3D work, and will do just fine for this task.

Getting scale

My first challenge was to get the PCB into Blender at the correct scale, so that I could build a model around it. This is mostly just juggling file formats, but also note that Blender works in metres by default. I’ve set it to millimetres, with a scale of 0.001. There are guides on the web about how to set this.

I tried two different methods.

Method 1: Image import

In KiCad, I added two “dimensions” to show the distance between the centre of the mounting holes, then plotted the front copper layer as an SVG.

I used Inkscape to convert from SVG to PNG. I then used GIMP to add cross-hairs to the centre of the mounting holes.

In Blender, I added a single vertex at the origin, then extruded it on the X axis to match the measured dimension. I then added a reference image (the PNG file exported above), and scaled/moved the reference image until the cross-hairs lined up with the vertex.

Method 2: Model import

KiCad can export its 3D model to a STEP file. I imported this file into FreeCAD, then exported it to STL.

Lastly, I imported the STL into Blender.

The scale is correct with both of these methods, because I can align the image and model.

The second method produces much larger .blend files, but has less room to make undetected errors, so I’ll be using it in future.

Building a box

I modeled out the case as a lid and base with 3mm walls, and used boolean modifiers to cut out spaces for buttons and cables.

One challenge was rounding the corners without the two pieces intersecting. I built the box as a square, then beveled it in wire-frame view. The spaces are larger than necessary, and I will try using modifiers slot the pieces together with a fixed tolerance if I need to do this again.

The result was two pieces which sit together but do not attach, with rounded corners and no sharp overhangs, for ease of manufacturing.

Manufacturing and test

I exported the STL files, and sent them to a local manufacturer, since I don’t have my own 3D printer. The print is black PLA, printed with fused deposition modeling, with an 0.2mm layer height and 30% infill.

When I received the parts, I first checked that the two pieces fitted together, then placed a blank PCB over the base to check that the holes lined up. PLA shrinks as it cools, but this aligned perfectly, which is either good luck or correct calibration.

The computer’s board will be installed on standoffs, which are secured by a nut on the other side. I like the idea of using heat-set inserts instead, but I don’t want to try too many new things at once, so I’ve skipped the idea for now.

Next steps

At the time of writing, I’m still waiting for some parts to assemble this computer with a power/reset button, which will be the end of the hardware side of this project.

I’m quite happy with how this turned out as my first ever 3D print. If I need to make enclosures for future projects, I’ll take another look at open source parametric CAD tools such as OpenSCAD and FreeCAD, which are more targeted to this kind of work.

The STL files for this case are available on GitHub at mike42/6502-computer, critique is welcome.

Building a hardware interrupt controller

I’ve recently been adding simple hardware devices to my home-built 6502 computer, and ran into a problem.

The 6502 has two active-low interrupt inputs (IRQ and NMI), both of which are used in my design. If I add any devices which can trigger their own interrupts, I will need a way to combine multiple interrupt signals into one.

Choosing an approach

I researched some retro computer designs to see how this problem was handled, and found some very simple approaches.

Most commonly, multiple I/O chips would be connected to the 6502’s IRQ line, which is level-triggered and active-low. As long as all connected chips had an open-drain IRQ output, they could be connected in a so-called wired-or configuration:

In software, each interrupt source would be checked in priority order.

I can’t use something so straightforward to add hardware to my design, for two main reasons:

  • I would need to use logic gates to combine the inputs instead. The I/O chip which is connected to the CPU’s IRQ input at the moment is a WDC 65C22S, which does not have an open-drain IRQ output.
  • It’s not going to be practical to check every I/O device from an interrupt service routine. I’m planning to add devices which are accessed over SPI, and will take many clock cycles to return their status.

Over in the PC world, programmable interrupt controllers such as the Intel 8259 were used. Among other features, these chips combine multiple interrupt sources into one, and can report the interrupt source on the data bus.

Rather than use out-of-production retro parts, I decided to program an ATF22V10 programmable logic device with these functions. A PLD is cheaper, and I’ve just figured out how to program them, so I might as well put those skills to use.

Creating the interrupt controller

I am using galette to program these PLD’s, and went through 5 revisions of the PLD source file before landing on something which could complete these functions.

  • combine multiple interrupt lines into one.
  • allow the CPU to quickly identify the highest-priority interrupt to service.

For this section, I’ll briefly describe each part of the PLD definition is doing.

The file starts with the name of the target device, and a name for the chip.

GAL22V10
IrqController

Next pin definitions are listed. I’m using the ATF22V10, which has 24 pins. The first row is pins 1-12, while the second row is pins 13-24. Numbers go left-to-right in both rows, unlike a physical microchip!

Clock    /IRQ0 /IRQ1 IRQ2  /IRQ3 /IRQ4 IRQ5  /IRQ6 /IRQ7 /IRQ8 /IRQ9  GND
/CS      D7    D6    D5    D4    D3    D2    D1    D0    /IRQ  /WE    VCC

For combining the interrupts, I simply use a big OR function.

IRQ = IRQ0 + IRQ1 + IRQ2 + IRQ3 + IRQ4 + IRQ5 + IRQ6 + IRQ7 + IRQ8 + IRQ9

This reads “IRQ is active if IRQ0 is active or IRQ1 is active or IRQ2 is active, etc.”. All logic is the positive case, so “IRQ0” means “IRQ0 is active”. Whether “active” means 1 or 0 at the input depends on those pin definitions. The active-low inputs are inverted before being fed to this expression, and the result will be inverted if the output is active-low. This confused me at first, because I thought that “/IRQ0” is just a pin name.

The expressions for the data bus come next.

  • These are all ‘registered’ outputs (indicated with the .R suffix). This runs the output through a D-type flip-flop, so that it will only change on clock transitions, rather than asynchronously. This is to avoid garbage output if an interrupt triggers while we are reading from the controller.
  • D1..D4 is a priority encoder, encoding the number 0 to 10 to identify the lowest-numbered interrupt source, or 11 if there is no interrupt.
  • D0 is always 0. This multiples the output by 2, which makes things easier for the software.
D0.R = GND
D1.R = IRQ1*/IRQ0 + IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ9*/IRQ8*/IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D2.R = IRQ2*/IRQ1*/IRQ0 + IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + /IRQ9*/IRQ8*/IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D3.R = IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D4.R = /IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D5.R = GND
D6.R = GND
D7.R = GND

For the 22V10, not all pins support the same number of product terms. As I built up to 10 interrupt sources, I needed to re-arrange the pins a few times.

Lastly, I needed to tri-state the output when the chip was not being read. This is a second definition for each pin (.E suffix). I used both a chip select (/CS) and write enable (/WE) input. The interrupt controller cannot be written to, this just allows me to avoid bus contention if somebody is attempting to.

D0.E = CS*/WE
D1.E = CS*/WE
D2.E = CS*/WE
D3.E = CS*/WE
D4.E = CS*/WE
D5.E = CS*/WE
D6.E = CS*/WE
D7.E = CS*/WE

Test circuit and software

After testing everything on breadboard, I connected the new interrupt controller to my 6502 computer. All of the required signals are exposed via pin headers on my 6502 computer, more information about that can be found on previous blog posts.

This is how it looks. It’s not the neatest, but at least I don’t have the whole computer on breadboards anymore.

I then started writing some 6502 assembly code to test this out. I’ve been working on a shell which runs named commands, so I added a command called irqtest. This code references other parts of the ROM, which can be found in the GitHub repository for this project.

This code sets up a timer to trigger an interrupt on the 65C22 VIA, then uses the wai instruction to wait for interrupts. When the interrupt service routine completes, the code resumes. I’ve assigned two addresses (DEBUG_LAST_INTERRUPT_INDEX and DEBUG_INTERRUPT_COUNT) so that we can check that the correct interrupt routines are running.

; Set up a timer to trigger an IRQ
IRQ_CONTROLLER = $8C00

; Some values to help us debug
DEBUG_LAST_INTERRUPT_INDEX = $00
DEBUG_INTERRUPT_COUNT = $01

shell_irqtest_main:
  lda #$ff              ; set interrupt index to dummy value (so we can see if it's not being overridden)
  sta DEBUG_LAST_INTERRUPT_INDEX
  lda #$00              ; reset interrupt counter
  sta $01
  ; setup for via
  lda #%00000000        ; set ACR. first two bits = 00 is one-shot for T1
  sta VIA_ACR
  lda #%11000000        ; enable VIA interrupt for T1
  sta VIA_IER
  sei                   ; enable IRQ at CPU - normally off in this code
  ; set up a timer at ~65535 clock pulses.
  lda #$ff              ; set T1 low-order counter
  sta VIA_T1C_L
  lda #$ff              ; set T1 high-order counter
  sta VIA_T1C_H
  wai                   ; wait for interrupt
  ; reset for via
  cli                   ; disable IRQ at CPU - normally off in this code
  lda #%01000000        ; disable VIA interrupt for T1
  ; Print out which interrupt was used, should be 02 if irq1_isr ran
  lda DEBUG_LAST_INTERRUPT_INDEX
  jsr hex_print_byte
  jsr shell_newline
  ; print number of times interrupt ran, should be 01 if it only ran once
  lda DEBUG_INTERRUPT_COUNT
  jsr hex_print_byte
  jsr shell_newline
  lda #0
  jmp sys_exit

I set my interrupt service routine read from the interrupt controller, then jump to the correct routine.

irq:
  phx                       ; push x for later
  inc DEBUG_INTERRUPT_COUNT ; count how many times this runs..
  ldx IRQ_CONTROLLER        ; read interrupt controller to find highest-priority interrupt to service
  jmp (isr_jump_table, X)   ; jump to matching service routine

irq_return:
  plx                       ; restore x
  rti

The interrupt controller can return 11 possible values, so I made an 11-entry table, with the address for each interrupt handler (2 bytes each). I have connected the VIA to IRQ1, so I set the second entry to a subroutine called irq1_isr.

isr_jump_table:              ; 10 possible interrupt sources
.word nop_isr
.word irq1_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr               ; 11th option for when no source is triggering the interrupt (phantom interrupt)

Most of the interrupt routines map to nothing at all. If this were a real OS, an interrupt from an unknown source would need to be a fatal error, since it can’t be individually masked/ignored on this hardware.

nop_isr:                         ; interrupt routine for anything else
  stx DEBUG_LAST_INTERRUPT_INDEX ; store interrupt index for debugging
  jmp irq_return

When IRQ1 is triggered though, this routine will clear the interrupt on the VIA. If we fail to do this, then the CPU will become stuck, processing interrupts forever.

irq1_isr:                        ; interrupt routine for VIA
  stx DEBUG_LAST_INTERRUPT_INDEX ; store interrupt index for debugging
  ldx VIA_T1C_L                  ; clear IFR bit 6 on VIA (side-effect of reading T1 low-order counter)
  jmp irq_return

Once I fixed some bugs, I was able to get the expected output, which is 02 (the index we are using to jump into isr_jump_table), followed by 01 (the number of times the interrupt service routine runs).

While I was researching this, I also found that online 6502 assembly guides often suggest clearing the interrupt flag on I/O chips near the start of the interrupt service routine, apparently to avoid running the routine twice. From this test, I know that the interrupt service routine is only running once, so I am happily disregarding that advice. Thinking about it, I can only see this applying to open-drain IRQ outputs.

Note about KiCad symbols

Since my last blog post about PLD’s, I’ve started to create separate symbols in KiCad for each programmed chip. You can see one of these in the schematic above.

I’m producing these automatically. Galette has a .pin file as one of its outputs. The file for this chip is as follows:

 Pin # | Name     | Pin Type
-----------------------------
   1   | Clock    | Clock/Input
   2   | /IRQ0    | Input
   3   | /IRQ1    | Input
   4   | IRQ2     | Input
   5   | /IRQ3    | Input
   6   | /IRQ4    | Input
   7   | IRQ5     | Input
   8   | /IRQ6    | Input
   9   | /IRQ7    | Input
  10   | /IRQ8    | Input
  11   | /IRQ9    | Input
  12   | GND      | GND
  13   | /CS      | Input
  14   | D7       | Output
  15   | D6       | Output
  16   | D5       | Output
  17   | D4       | Output
  18   | D3       | Output
  19   | D2       | Output
  20   | D1       | Output
  21   | D0       | Output
  22   | /IRQ     | Output
  23   | /WE      | Input
  24   | VCC      | VCC

To produce schematic symbols, I wrote a small Python script to convert this text format to a CSV file, suitable for KiPart.

IRQ_Controller

Pin,Type,Name
1,input,Clock
2,input,~IRQ0
3,input,~IRQ1
4,input,IRQ2
5,input,~IRQ3
6,input,~IRQ4
7,input,IRQ5
8,input,~IRQ6
9,input,~IRQ7
10,input,~IRQ8
11,input,~IRQ9
12,power_in,GND
13,input,~CS
14,output,D7
15,output,D6
16,output,D5
17,output,D4
18,output,D3
19,output,D2
20,output,D1
21,output,D0
22,output,~IRQ
23,input,~WE
24,power_in,VCC

KiPart ships with profiles for FPGA chips, but the generic input worked fine for the ATF22V10 PLD.

./KiPart/venv/bin/kipart -r generic --overwrite irq_controller.csv

The output is a .lib file, which I can include in any KiCad project.

Wrap-up

This approach works, and introduces far less overhead compared with checking each device in the interrupt service routine. In particular, it will allow me to test interrupts from slow-to-access SPI devices.

I will be disconnecting this this for now, but may include it in an expansion board or updated computer design if I ever make one!

Re-creating the world’s worst sound card

I recently built a computer from scratch, and wanted to add some basic sound output. I am not attempting to produce any 8-bit music, but I would like to be able to write programs use beeps and clicks to provide feedback, instead of only text.

I did a bit of digging into the subject, and found quite possibly the world’s worst sound card, in the Apple II. This is my attempt to re-draw the relevant part of the schematic, which I found in a 1983 publication called The Apple II Circuit Description.

The linked book does a good job of explaining the details, but the important part for this blog post is simply that Z3 is an address decoding output. It is normally high, but is low for half a cycle each time that memory address $C030 is accessed. This is fed to a flip-flop, which will toggle each time this happens, driving a speaker.

This should be easy for me to implement, since I have spare address decoding outputs on my computer.

First attempt

I decided to use a piezo buzzer, which can be driven directly from an IC pin, simplifying things greatly.

I first connected the buzzer directly to one of the spare address decoding outputs of my computer.

This is so simple that it is almost not worth diagramming, but here goes:

The input is IO2, which works out to address $8800 in my computer’s memory map. I wrote a bit about in an earlier blog post.

From BASIC (yes, this computer runs BASIC), I can read from this address and produce a click.

A=PEEK($8800)

Second attempt

To produce a beep, I wanted a square wave with a 50% duty cycle. The Apple II used a 74LS74 dual D flip flop IC for this, which I don’t have in my inventory. Instead, I used a 74AC163 4-bit counter.

The circuit on the breadboard is wired up like this:

I was then able to produce a beeping sound by reading a memory address in a loop, just as you could on the Apple II.

10 A=PEEK($8800)
20 GOTO 10
RUN

Mission accomplished.

Alternatives and next steps

I’ve already started adding clicks and beeps to assembly code while testing, and my immediate plan is to add a separate startup beep for each of the two boot ROM’s. If I ever do a second revision of this computer board, I’ll most likely add a speaker as an on-board peripheral, using a 74LS74 instead of the counter.

There are two other methods which considered for adding beeps to this computer. The first was to utilise its 65C22 chip, which can be set to output a square wave on its PB7 pin. I have other plans for that output, which is the reason I haven’t used it for sound.

The second idea I looked into was sampled audio, which would require me to latch 8-bit audio samples a few thousand times per second, and then use some resistors to build a digital-to-analog converter. There are a few ways that I could achieve this with parts which I have on-hand, such as a SN74AC573 latch, or the 65C22.

6502 computer – from breadboard to PCB

I’ve recently been working on building a 6502-based computer on breadboards as a learning project. After making a few revisions, and porting some software to run on it, I was confident that my computer worked well enough to connect up permanently on a circuit board.

This blog post about the process that I went through to convert my working breadboard prototype to a PCB. As somebody who is not trained in electronics, this involved some fresh challenges for me.

Schematic capture

As a first step, I needed to draw up a proper schematic in a CAD tool. I’ve already been learning to use KiCad’s Eeschema for schematic capture, so I went ahead and drew up the whole thing.

I decided to put all of the components on one crowded page, just because I haven’t learned to manage multiple pages yet.

Mostly, this was just replicating what I’d already built, but I also needed to:

  • create a symbol for the DS-1813, which I could not find in the libraries I’m using
  • decide on a pinout for expansion ports
  • add a jumper block, so that the two interrupt lines (NMI and IRQ) can be disconnected from the on-board chips

I avoided changing the scope of the board to include important features (storage, audio), and instead added a header with all of the important buses and signals. My goal is to have a stable base system to work with, and I can always do a second board once I’ve figured out how these should work.

I did not use KiCad’s inbuilt BOM plugins, but instead manually wrote a parts list for future reference. The only components which I needed to order were sockets and pin headers, everything else was being lifted from the prototype.

Footprint assignment

Once the Eeschema electrical rules check was passing, I moved on to selecting footprints for everything. I spent quite a bit of time checking the datasheets for my planned components against the KiCad library to choose footprints that would fit.

I could not find anything which matched the footprint of the mini SPDT switches that I’m using, so I made my own.

I had not yet chosen a ZIF socket for the EEPROM, so I selected a footprint for a standard socket instead. This was a risky move, because I had to guess how much space to leave when laying out the board.

PCB design

I had already selected a board manufacturer, so I set up my track widths and via sizes according to their capabilities.

Next, I imported the netlist and placed the components. I attempted to fit everything in 10cm x 10cm, 2-layer board to save on manufacturing costs, but it was very crowded. I was worried about having enough space for traces, fitting an (unmeasured) ZIF socket, and adding some mounting holes, so I spaced it out to fit on a 10cm x 11.6cm board instead.

I also added footprints for M3-sized mounting holes, and edge cuts with rounded corners.

It took me 4 attempts to successfully route all of the traces. The result breaks every PCB layout best practice which I had read about, but I was happy just to have everything connected by that point.

In hindsight, I could have made this task easier by splitting up the expansion header, and building an accurate footprint for a ZIF socket. I also could have placed the decoupling capacitors closer to the IC power pins, and avoided interrupting the ground plane.

The silkscreen setup was more time-consuming than expected. I wanted to make the board as self-documenting as possible, so that I would not need to open the design files on a computer when writing code and adding hardware peripherals. I labelled every switch, button, light and IC, as well as every expansion header pin (74 of them!). I also added some simple assembly hints such as resistor values, and +/- signs to indicate the polarity of the lone electrolytic capacitor.

The 3D render view in KiCad shows how the finished board would look.

Components did not display on the board at first, but there is a menu option to download the 3D models. After setting this up, and setting the solder mask colour to black, I was able to render the board properly. This looks very similar to the assembled board, with just a few parts missing.

Manufacturing

I only recently discovered that low-volume PCB manufacturing is accessible to hobbyists. Ordering the boards was really easy. I exported gerber and drill files according to their instructions, loaded them into a zip file, and uploaded them to a web portal.

I am glad that an online gerber viewer was available, because it allowed me to spot an error which I had not noticed in KiCad, where I had two pins labelled PA1.

The default settings are reasonable, but I chose black solder mask, a lead-free surface finish, and chose the option to exclude the manufacturer’s order number from the board.

The gerber viewer also has a tab which shows checks against their manufacturing rules.

I placed the order, and eagerly checked each day as the boards progressed through the manufacturing steps. From placing the order to getting the boards on my desk, the whole process took 7 days.

Assembly & test

Once all of the sockets arrived in the mail, I started assembling.

The standard advice I’ve read for through-hole assembly is to solder low-profile components first. Instead, I added enough components to light up the power LED.

I know that the power LED dims if anything is drawing too much current (eg. a short circuit). By getting this working first, I will know I’ve made a mistake (and can switch off the board quickly) if it doesn’t light up later.

I don’t think I’ve ever soldered more than a few pins or wires at a time, and this board had 323 pins to solder.

After a about 2 hours, I had a fully-populated board. It’s smaller and looks better than the breadboards, but it was time to move all the chips across to see if it actually worked.

Much to my surprise, the computer booted up to EhBASIC on the first attempt.

Wrap-up

This has been quite a fun project. It has proven to me that open source tools are perfectly sufficient for this type of hardware development. I’m also very happy to have reached a point where can call this project “done”, since I have a permanent home-built computer.

My plans for learning about low-level computing are not done yet, of course. I’ve kept everything at a low 1.8 MHz, so that I can continue to prototype hardware peripherals on a breadboard. I’ve also left a switch to select which half of ROM to boot from, so that I don’t brick my computer every time I make an error in an assembly language program.

I’ve uploaded the full parts list, design files, and firmware to GitHub at mike42/6502-computer.