Implementing the XMODEM protocol for file transfer

I recently built a 6502-based computer from scratch, and I’m using it as a platform for testing 6502 assembly code. The software on this computer is currently very limited, and re-programming an EEPROM chip to test changes is a slow process.

This blog post covers changes which I made to implement the XMODEM protocol, allowing me to upload programs over a serial connection.

The protocol itself

I chose XMODEM because it’s the simplest protocol which is compatible with modern terminal software. All I need right now is the ability to place some data (machine code) into RAM, so that I can execute it.

An important note is that modern file transfer protocols would normally run over TCP/IP, while XMODEM runs directly over a serial connection. This works for me, since a serial connection is the only interface I’ve got.

For what it’s worth, XMODEM is also era-appropriate technology for this retro computer, since it was first implemented in 1977. The 6502 processor came out in 1975.

Implementation

Implementing XMODEM was the easy part of this process. There are existing 6502 assembly implementations which I could have used, but I decided to work from the description on the Wikipedia article instead. This is partly to make sure I understand it, and partly to keep the software licensing of my ROM as clear as possible.

I have the beginnings of a shell for running commands from ROM, so I implemented two new commands:

  • rx, to receive a file over XMODEM and load it into memory.
  • run, to jump to the program.

I already have routines for reading and writing characters over serial; to start the transfer, I also found it useful to write a routine which reads a character with a time-out.

; Like acia_recv_char, but terminates after a short time if nothing is received
shell_rx_receive_with_timeout:
  ldy #$ff
@y_loop:
  ldx #$ff
@x_loop:
  lda ACIA_STATUS              ; check ACIA status in inner loop
  and #$08                     ; mask rx buffer status flag
  bne @rx_got_char
  dex
  cpx #0
  bne @x_loop
  dey
  cpy #0
  bne @y_loop
  clc                          ; no byte received in time
  rts
@rx_got_char:
  lda ACIA_RX                  ; get byte from ACIA data port
  sec                          ; set carry bit
  rts

The entire receive process is around 60 lines of assembly code. The process involves sending the ASCII NAK character repeatedly until an SOH character is received, which indicates the start of a 128-byte packet. The framing around each packet is discarded, and the 128-byte payload is written to memory. This is repeated until an end of transmission (EOT) character is received.
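
Before diving into the assembly, the same receive flow can be sketched in a few lines of Python. This is only an illustration of the protocol steps, using hypothetical read_byte()/write_byte() helpers – it is not part of the real implementation, which follows below.

# Sketch of the XMODEM receive flow, for illustration only.
# read_byte(timeout=None) and write_byte(b) are hypothetical serial helpers.
SOH, EOT, ACK, NAK = 0x01, 0x04, 0x06, 0x15

def xmodem_receive(read_byte, write_byte, memory, address):
    write_byte(NAK)                        # ask the sender to start
    while read_byte(timeout=1) != SOH:     # keep sending NAK until a block header arrives
        write_byte(NAK)
    while True:
        read_byte()                        # block number (ignored here)
        read_byte()                        # inverse block number (ignored here)
        for i in range(128):               # 128-byte payload goes straight to memory
            memory[address + i] = read_byte()
        read_byte()                        # checksum (ignored here)
        write_byte(ACK)                    # acknowledge the block
        address += 128
        if read_byte() == EOT:             # EOT ends the transfer, SOH starts the next block
            write_byte(ACK)
            return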

; Built-in command: rx
; receives a file over the XMODEM protocol, and loads it into memory
USER_PROGRAM_START = $0400   ; Address for start of user programs
USER_PROGRAM_WRITE_PTR = $00 ; ZP address for writing user program

shell_rx_main:
  ; Set pointers
  lda #<USER_PROGRAM_START   ; Low byte first
  sta USER_PROGRAM_WRITE_PTR
  lda #>USER_PROGRAM_START   ; High byte next
  sta USER_PROGRAM_WRITE_PTR + 1
  ; Delay so that we can set up file send
  lda #1                     ; wait ~1 second
  jsr shell_rx_sleep_seconds
  ; NAK, ACK once
@shell_block_nak:
  lda #$15                   ; NAK gets started
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  jsr shell_rx_receive_with_timeout  ; Check in loop w/ timeout
  bcc @shell_block_nak       ; Not received yet
  cmp #$01                   ; If we do have char, should be SOH
  bne @shell_rx_fail         ; Terminate transfer if we don't get SOH
@shell_rx_block:
  ; Receive one block
  jsr acia_recv_char         ; Block number
  jsr acia_recv_char         ; Inverse block number
  ldy #0                     ; Start at char 0
@shell_rx_char:
  jsr acia_recv_char
  sta (USER_PROGRAM_WRITE_PTR), Y
  iny
  cpy #128
  bne @shell_rx_char
  jsr acia_recv_char         ; Checksum - TODO: verify this and jump to shell_block_nak to repeat if not matching
  lda #$06                   ; ACK the packet
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  jsr acia_recv_char
  cmp #$04                   ; EOT char, no more blocks
  beq @shell_rx_done
  cmp #$01                   ; SOH char, next block on the way
  bne @shell_block_nak       ; Anything else fail transfer
  lda USER_PROGRAM_WRITE_PTR ; This next part moves write pointer along by 128 bytes
  cmp #$00
  beq @block_half_advance
  lda #$00                   ; If low byte != 0, set to 0 and inc high byte
  sta USER_PROGRAM_WRITE_PTR
  inc USER_PROGRAM_WRITE_PTR + 1
  jmp @shell_rx_block
@block_half_advance:         ; If low byte = 0, set it to 128
  lda #$80
  sta USER_PROGRAM_WRITE_PTR
  jmp @shell_rx_block
@shell_rx_done:
  lda #$6                    ; ACK the EOT as well.
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  lda #1                     ; wait a moment (printing does not work otherwise..)
  jsr shell_rx_sleep_seconds
  jsr shell_rx_print_user_program
  jsr shell_newline
  lda #0
  jmp sys_exit
@shell_rx_fail:
  lda #1
  jmp sys_exit

This code is the first time that I’ve used the “Zero Page Indirect Indexed with Y” addressing mode. It uses a pointer to the program’s location in memory, which is incremented by 128 each time a packet is accepted.

The index in the current packet is stored in the Y register, allowing each incoming byte to be written to memory with one operation, which is called in a tight loop.

sta (USER_PROGRAM_WRITE_PTR), Y
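
For anyone unfamiliar with this addressing mode, the effective address is formed from a 16-bit little-endian pointer held in two zero page bytes, plus the Y register. A quick Python model of the calculation, for explanation only:

# Effective address for "sta (ptr), Y" - zero page indirect indexed addressing.
# The two zero page bytes at ptr hold a little-endian base address, and Y is added to it.
def indirect_indexed_address(memory, ptr, y):
    base = memory[ptr] | (memory[ptr + 1] << 8)
    return (base + y) & 0xFFFF

zero_page = {0x00: 0x00, 0x01: 0x04}   # pointer set to $0400
assert indirect_indexed_address(zero_page, 0x00, 5) == 0x0405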

The other routines mentioned in this source listing are:

  • shell_rx_sleep_seconds – sleeps for the number of seconds in the A register.
  • shell_rx_print_user_program – prints the first 256 bytes of the program for verification.

The checksum is entirely ignored. Everything here is running on my desk, so I’m not too worried about transmission errors.
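
For reference, the checksum in basic XMODEM is just the arithmetic sum of the 128 payload bytes modulo 256, so verifying it would only take something like this (a sketch of what the TODO above would involve):

# Basic XMODEM checksum: the sum of the 128 payload bytes, modulo 256.
def checksum_ok(payload: bytes, received_checksum: int) -> bool:
    return sum(payload) % 256 == received_checksum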

After a few iterations, I was able to upload placeholder text to RAM.

The screen capture shows 256 bytes of memory (2 packets in XMODEM). The ASCII SUB character (1a in hex) fills out unused bytes at the end, since everything sent over XMODEM needs to be a multiple of 128 bytes.
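
The padding is applied by the sending side; in Python terms it amounts to something like this (a sketch for illustration, not code I needed to write):

# Pad data to a multiple of 128 bytes with the ASCII SUB character ($1a),
# as an XMODEM sender does before transmission.
def pad_for_xmodem(data: bytes) -> bytes:
    remainder = len(data) % 128
    if remainder == 0:
        return data
    return data + b'\x1a' * (128 - remainder)

assert len(pad_for_xmodem(b'Hello')) == 128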

Assembling some programs

I thought I was almost done at this point, but it turns out that writing a “Hello World” test program to run from RAM is easier said than done.

I started by creating a new ld65 linker configuration which links code to run from address $0400 onwards, which is where programs are loaded into RAM. This system does not have a loader for re-locatable code, and absolute addresses are assigned by the linker.

This is the configuration which I used for these tests. It has some flaws, but it’s good enough to place the CODE segment in the right place.

# Test configuration for user programs
MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $0100,  size = $0300, type = rw, file = "";
    MAIN:   start = $0400,  size = $1000, type = rw, file = %O;
    ROM:    start = $C000,  size = $4000, type = ro, file = "";
}

SEGMENTS {
    ZEROPAGE: load = ZP,  type = zp;
    BSS:      load = RAM, type = bss;
    CODE:     load = MAIN, type = ro;
    VECTORS:  load = ROM, type = ro,  start = $FFFA, optional=yes;
}

If I wanted my test program to terminate, then I needed to find the address of sys_exit in ROM. This routine hands control back to the shell after a command completes.

To get this, I re-assembled the ROM with the --debug-info flag added to the ca65 assembler, and --dbgfile boot.dbg added to the ld65 linker. This produces a text file which contains the final address of each symbol.

sym id=45,name="sys_exit",addrsize=absolute,scope=0,def=28,ref=298+192+426+390+157,val=0xC11E,seg=0,type=lab

The simplest test program I could come up with was:

; location in ROM
sys_exit = $c11e

.segment "CODE"
main:
  jmp sys_exit

I was able to assemble this program to 3 bytes of binary and run it. It does absolutely nothing, and simply returns to the shell.

To extend this into a “Hello World” program, I needed to look up the location of two more routines in the ROM.

; locations of some functions in ROM
acia_print_char = $c020
shell_newline   = $c113
sys_exit        = $c11e

.segment "CODE"
main:
  ldx #0
@test_char:
  lda test_string, X
  beq @test_done
  jsr acia_print_char
  inx
  jmp @test_char
@test_done:
  jsr shell_newline
  lda #0
  jmp sys_exit

test_string: .asciiz "Test program"

This animation shows the process of uploading the binary and executing it via minicom.

There is nothing special about the minicom invocation here; it’s just a bit-rate and device name, which in my case is:

minicom -b 19200 -D /dev/ttyUSB0

Automating the boring stuff

This approach is very fragile, because the code is full of hard-coded memory addresses.

Any time I change the ROM, the addresses for these routines could change, and this program will no longer work.

On any real architecture, applications call into the OS via system calls, or maybe a jump table for 8-bit systems. I don’t need a stable binary interface, and there is no userspace/kernel distinction, so I decided not to go down that path.

Instead, I wrote a Python script to generate an assembly source file, using the same debug symbols which I was manually reading. This runs after each ROM build.

The format is:

...
.export VIA_T2C_H                        = $8009
.export VIA_T2C_L                        = $8008
.export acia_delay                       := $c02b
.export acia_print_char                  := $c014
...

When this file is assembled, it doesn’t generate any code, but it does define all of the constants and labels from the ROM. When applications link against this, they can .import anything which would be available to ROM-based code.
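
The generator itself is not much more than a loop over the “sym” lines in the debug file. A minimal sketch of the idea, assuming the line format shown earlier (this is not my exact script):

import re
import sys

# Sketch: read an ld65 debug file and emit ca65 .export lines.
# Assumes symbol lines look like:
#   sym id=45,name="sys_exit",...,val=0xC11E,...,type=lab
sym_re = re.compile(r'^sym\s+(.*)$')

def parse_fields(rest):
    fields = {}
    for part in rest.split(','):
        key, _, value = part.partition('=')
        fields[key.strip()] = value.strip().strip('"')
    return fields

def main(dbg_path):
    for line in open(dbg_path):
        match = sym_re.match(line.strip())
        if not match or 'val=' not in line:
            continue
        fields = parse_fields(match.group(1))
        name, val = fields.get('name'), fields.get('val')
        if not name or not val:
            continue
        # ':=' marks a label (an address), '=' a plain constant. Treating
        # type=lab as the label case is an assumption based on the sample line.
        op = ':=' if fields.get('type') == 'lab' else '='
        print(f'.export {name:<40} {op} {val.replace("0x", "$")}')

if __name__ == '__main__':
    main(sys.argv[1])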

The start of the above program is re-written as:

.import acia_print_char, shell_newline, sys_exit

This means that I have source compatibility, just not binary compatibility.

Bringing everything together

With my initial goal achieved, I was still not happy with the development experience. I started to dig into the murky world of minicom scripting to see if I could automate the process of uploading/running programs, and found that it is indeed possible.

The scripting capabilities of minicom do not appear to be widely used, and I needed to read through the source code to find ways to work around the problems I encountered.

For example, the “send” command was sending characters too quickly for the target machine. There are hard-coded delays between commands in the interpreter, but you can’t use multiple “send” commands to slow things down, because it is also hard-coded to append a line ending each time.

The fix was to use !<, which sends the output of a command verbatim. This script uses the echo -n command to send each character individually. It follows the exact same procedure as the animation in the previous section: first, the script types “rx”, then sends the file, then types “run”.

# confirm we have a shell prompt
send ""
sleep 0.1
# put into receive mode
expect "# "
sleep 0.1
!< echo -n 'r'
!< echo -n 'x'
send ""
sleep 0.1
# send the program and wait for prompt
! sx -q $TEST_PROGRAM 2> /dev/null
expect "# "
# run program
sleep 0.1
!< echo -n 'r'
!< echo -n 'u'
!< echo -n 'n'
send ""
sleep 0.1
# Terminate minicom when program completes (bit crazy..)
expect "# "
sleep 0.1
! killall -9 minicom

It’s also worth talking about the end of this script. When minicom scripts end, the user is returned to an interactive session. I wanted the minicom process to terminate without clearing the screen, so I run killall -9 minicom as the last line of the script.

This is not the safest thing to include in a script, but it’s certainly effective, and I am quite certain that there is no built-in way to achieve this with the current version of minicom. Realistic alternatives include patching minicom, or switching to different terminal software.

The minicom invocation includes the name of the minicom script to execute, and the script will use the TEST_PROGRAM environment variable to name the binary file to run.

TEST_PROGRAM=test.bin minicom -b 19200 -S minicom_autorun.txt -D /dev/ttyUSB0

When running this command, the program executes on the 6502 computer seamlessly, though minicom leaves the terminal in a broken state when it is killed, and bash prints the “Killed” text to the screen.

To work around this, I wrapped the minicom invocation in a shell script which builds the program, suppresses all errors, and always returns 0. This stops bash from constantly reporting the killall -9 as a problem, but it does hide any other problems as well.

#!/bin/bash
set -eu -o pipefail
make $1.bin
(TEST_PROGRAM=$1.bin minicom -b 19200 -S minicom_autorun.txt -D /dev/ttyUSB0 || true) 2> /dev/null

This script is invoked as:

./assemble_and_run.sh test

A few months back, I wrote an IntelliJ plugin for 6502 assembly, and I’m writing all of my code in PyCharm. The last step was to get this running from my IDE.

I added a build configuration which invokes the assemble_and_run.sh script. The script is completely generic, and I can use it to run other programs by changing the argument.

I can now write a program, click run, and watch it execute on real hardware, all without leaving my IDE. It took a lot of effort, but this is a huge improvement from where I started.

Wrap-up

A fast compile/test/run cycle is a huge gain for developer productivity on any system. I can now get fast, accurate feedback on whether my 6502 assembly programs work on real hardware. The first program I’m writing involves interfacing with an SD card, which will be a lot more approachable with this improvement.

I was surprised at how tricky it was to assemble and link a small program to run in RAM instead of ROM, and I can see that my linker configuration will need to be improved as I write programs which use more data. My basic plan is to test features by uploading them over XMODEM, then add them to the ROM when they are working.

The full source code for this project, which includes hardware and software, can be found on GitHub at mike42/6502-computer.

Designing a 3D printed enclosure for my KiCad project in Blender

I recently built a small 6502-based computer on a custom PCB, which I designed in KiCad. This blog post is about the process of building a 3D-printed case to house this project, using Blender.

This is my first ever 3D print, so I had a few things to learn.

Quick background

My effort on this project has been focused on making a design which works, then transferring that design to a PCB. Since this is my first electronics project, I’m working sequentially, and have completely ignored the need for a case until now.

I could have saved some time if I positioned the ports and mounting holes to match pre-made electronics enclosures, but I’ll take that as a lesson for my next project.

The requirements for the enclosure are simple:

  • Ability to mount the PCB inside the case
  • Cut-out for power input
  • Cut-outs for cable routing to expansion ports
  • Cut-outs for power and reset buttons

My PCB design files are in KiCad, and I decided to design the case in Blender, since I already know how to use it. Blender is not a CAD tool, but it is excellent for general-purpose 3D work, and will do just fine for this task.

Getting scale

My first challenge was to get the PCB into Blender at the correct scale, so that I could build a model around it. This is mostly just juggling file formats, but also note that Blender works in metres by default. I’ve set it to millimetres, with a scale of 0.001. There are guides on the web about how to set this.

I tried two different methods.

Method 1: Image import

In KiCad, I added two “dimensions” to show the distance between the centre of the mounting holes, then plotted the front copper layer as an SVG.

I used Inkscape to convert from SVG to PNG. I then used GIMP to add cross-hairs to the centre of the mounting holes.

In Blender, I added a single vertex at the origin, then extruded it on the X axis to match the measured dimension. I then added a reference image (the PNG file exported above), and scaled/moved the reference image until the cross-hairs lined up with the vertex.

Method 2: Model import

KiCad can export its 3D model to a STEP file. I imported this file into FreeCAD, then exported it to STL.

Lastly, I imported the STL into Blender.

The scale comes out correct with both of these methods, which I could verify by aligning the imported image and the model with each other.

The second method produces much larger .blend files, but leaves less room for undetected errors, so I’ll be using it in future.

Building a box

I modeled out the case as a lid and base with 3mm walls, and used boolean modifiers to cut out spaces for buttons and cables.

One challenge was rounding the corners without the two pieces intersecting. I built the box as a square, then beveled it in wire-frame view. The spaces are larger than necessary, and I will try using modifiers to slot the pieces together with a fixed tolerance if I need to do this again.

The result was two pieces which sit together but do not attach, with rounded corners and no sharp overhangs, for ease of manufacturing.

Manufacturing and test

I exported the STL files, and sent them to a local manufacturer, since I don’t have my own 3D printer. The print is black PLA, printed with fused deposition modeling, with a 0.2mm layer height and 30% infill.

When I received the parts, I first checked that the two pieces fitted together, then placed a blank PCB over the base to check that the holes lined up. PLA shrinks as it cools, but this aligned perfectly, which is either good luck or correct calibration.

The computer’s board will be installed on standoffs, which are secured by a nut on the other side. I like the idea of using heat-set inserts instead, but I don’t want to try too many new things at once, so I’ve skipped the idea for now.

Next steps

At the time of writing, I’m still waiting for some parts to assemble this computer with a power/reset button, which will be the end of the hardware side of this project.

I’m quite happy with how this turned out as my first ever 3D print. If I need to make enclosures for future projects, I’ll take another look at open source parametric CAD tools such as OpenSCAD and FreeCAD, which are more targeted to this kind of work.

The STL files for this case are available on GitHub at mike42/6502-computer, critique is welcome.

Building a hardware interrupt controller

I’ve recently been adding simple hardware devices to my home-built 6502 computer, and ran into a problem.

The 6502 has two active-low interrupt inputs (IRQ and NMI), both of which are used in my design. If I add any devices which can trigger their own interrupts, I will need a way to combine multiple interrupt signals into one.

Choosing an approach

I researched some retro computer designs to see how this problem was handled, and found some very simple approaches.

Most commonly, multiple I/O chips would be connected to the 6502’s IRQ line, which is level-triggered and active-low. As long as all connected chips had an open-drain IRQ output, they could be connected in a so-called wired-or configuration:

In software, each interrupt source would be checked in priority order.

I can’t use something so straightforward to add hardware to my design, for two main reasons:

  • The I/O chip which is currently connected to the CPU’s IRQ input is a WDC 65C22S, which does not have an open-drain IRQ output, so I would need to use logic gates to combine the inputs instead.
  • It’s not going to be practical to check every I/O device from an interrupt service routine. I’m planning to add devices which are accessed over SPI, and will take many clock cycles to return their status.

Over in the PC world, programmable interrupt controllers such as the Intel 8259 were used. Among other features, these chips combine multiple interrupt sources into one, and can report the interrupt source on the data bus.

Rather than use out-of-production retro parts, I decided to program an ATF22V10 programmable logic device with these functions. A PLD is cheaper, and I’ve just figured out how to program them, so I might as well put those skills to use.

Creating the interrupt controller

I am using galette to program these PLD’s, and went through 5 revisions of the PLD source file before landing on something which could perform these two functions:

  • combine multiple interrupt lines into one.
  • allow the CPU to quickly identify the highest-priority interrupt to service.

For this section, I’ll briefly describe what each part of the PLD definition is doing.

The file starts with the name of the target device, and a name for the chip.

GAL22V10
IrqController

Next, the pin definitions are listed. I’m using the ATF22V10, which has 24 pins. The first row is pins 1-12, and the second row is pins 13-24. Numbers go left-to-right in both rows, unlike on a physical microchip!

Clock    /IRQ0 /IRQ1 IRQ2  /IRQ3 /IRQ4 IRQ5  /IRQ6 /IRQ7 /IRQ8 /IRQ9  GND
/CS      D7    D6    D5    D4    D3    D2    D1    D0    /IRQ  /WE    VCC

For combining the interrupts, I simply use a big OR function.

IRQ = IRQ0 + IRQ1 + IRQ2 + IRQ3 + IRQ4 + IRQ5 + IRQ6 + IRQ7 + IRQ8 + IRQ9

This reads as “IRQ is active if IRQ0 is active, or IRQ1 is active, or IRQ2 is active”, and so on. All of the logic is written in terms of the positive case, so “IRQ0” means “IRQ0 is active”. Whether “active” means 1 or 0 at the pin depends on those pin definitions: active-low inputs are inverted before being fed to this expression, and the result is inverted if the output is active-low. This confused me at first, because I thought that “/IRQ0” was just a pin name.

The expressions for the data bus come next.

  • These are all ‘registered’ outputs (indicated with the .R suffix). This runs the output through a D-type flip-flop, so that it will only change on clock transitions, rather than asynchronously. This is to avoid garbage output if an interrupt triggers while we are reading from the controller.
  • D1..D4 form a priority encoder, encoding a number from 0 to 9 to identify the lowest-numbered active interrupt source, or 10 if no source is active.
  • D0 is always 0. This multiplies the output by 2, which makes things easier for the software. The short sketch after the equations below cross-checks this encoding.
D0.R = GND
D1.R = IRQ1*/IRQ0 + IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ9*/IRQ8*/IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D2.R = IRQ2*/IRQ1*/IRQ0 + IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + /IRQ9*/IRQ8*/IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D3.R = IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0 + IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D4.R = /IRQ7*/IRQ6*/IRQ5*/IRQ4*/IRQ3*/IRQ2*/IRQ1*/IRQ0
D5.R = GND
D6.R = GND
D7.R = GND
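
Those product terms are hard to read, so as a cross-check, here is the same priority encoding written out in Python. This is only a model I’m using to explain the intended values – it is not part of the build.

# Model of the intended controller output: the index of the lowest-numbered
# active interrupt source (times two, since D0 is always 0), or 20 if nothing
# is active (the 'phantom interrupt' case).
def irq_controller_output(irq):
    """irq is a list of 10 booleans, irq[0] .. irq[9]."""
    for index, active in enumerate(irq):
        if active:
            return index * 2
    return 10 * 2

assert irq_controller_output([False] * 10) == 20
assert irq_controller_output([False, True] + [False] * 8) == 2   # IRQ1 -> $02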

For the 22V10, not all pins support the same number of product terms. As I built up to 10 interrupt sources, I needed to re-arrange the pins a few times.

Lastly, I needed to tri-state the output when the chip is not being read. This is a second definition for each pin (the .E suffix). I used both a chip select (/CS) and a write enable (/WE) input. The interrupt controller cannot be written to; this just allows me to avoid bus contention if something attempts a write.

D0.E = CS*/WE
D1.E = CS*/WE
D2.E = CS*/WE
D3.E = CS*/WE
D4.E = CS*/WE
D5.E = CS*/WE
D6.E = CS*/WE
D7.E = CS*/WE

Test circuit and software

After testing everything on a breadboard, I connected the new interrupt controller to my 6502 computer. All of the required signals are exposed via pin headers on the computer; more information about that can be found in previous blog posts.

This is how it looks. It’s not the neatest, but at least I don’t have the whole computer on breadboards anymore.

I then started writing some 6502 assembly code to test this out. I’ve been working on a shell which runs named commands, so I added a command called irqtest. This code references other parts of the ROM, which can be found in the GitHub repository for this project.

This code sets up a timer to trigger an interrupt on the 65C22 VIA, then uses the wai instruction to wait for interrupts. When the interrupt service routine completes, the code resumes. I’ve assigned two addresses (DEBUG_LAST_INTERRUPT_INDEX and DEBUG_INTERRUPT_COUNT) so that we can check that the correct interrupt routines are running.

; Set up a timer to trigger an IRQ
IRQ_CONTROLLER = $8C00

; Some values to help us debug
DEBUG_LAST_INTERRUPT_INDEX = $00
DEBUG_INTERRUPT_COUNT = $01

shell_irqtest_main:
  lda #$ff              ; set interrupt index to dummy value (so we can see if it's not being overridden)
  sta DEBUG_LAST_INTERRUPT_INDEX
  lda #$00              ; reset interrupt counter
  sta DEBUG_INTERRUPT_COUNT
  ; setup for via
  lda #%00000000        ; set ACR. first two bits = 00 is one-shot for T1
  sta VIA_ACR
  lda #%11000000        ; enable VIA interrupt for T1
  sta VIA_IER
  cli                   ; enable IRQ at CPU - normally off in this code
  ; set up a timer at ~65535 clock pulses.
  lda #$ff              ; set T1 low-order counter
  sta VIA_T1C_L
  lda #$ff              ; set T1 high-order counter
  sta VIA_T1C_H
  wai                   ; wait for interrupt
  ; reset for via
  sei                   ; disable IRQ at CPU - normally off in this code
  lda #%01000000        ; disable VIA interrupt for T1
  ; Print out which interrupt was used, should be 02 if irq1_isr ran
  lda DEBUG_LAST_INTERRUPT_INDEX
  jsr hex_print_byte
  jsr shell_newline
  ; print number of times interrupt ran, should be 01 if it only ran once
  lda DEBUG_INTERRUPT_COUNT
  jsr hex_print_byte
  jsr shell_newline
  lda #0
  jmp sys_exit

I set up my interrupt service routine to read from the interrupt controller, then jump to the correct routine.

irq:
  phx                       ; push x for later
  inc DEBUG_INTERRUPT_COUNT ; count how many times this runs..
  ldx IRQ_CONTROLLER        ; read interrupt controller to find highest-priority interrupt to service
  jmp (isr_jump_table, X)   ; jump to matching service routine

irq_return:
  plx                       ; restore x
  rti

The interrupt controller can return 11 possible values, so I made an 11-entry table, with the address for each interrupt handler (2 bytes each). I have connected the VIA to IRQ1, so I set the second entry to a subroutine called irq1_isr.

isr_jump_table:              ; 10 possible interrupt sources
.word nop_isr
.word irq1_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr
.word nop_isr               ; 11th option for when no source is triggering the interrupt (phantom interrupt)

Most of the interrupt routines map to nothing at all. If this were a real OS, an interrupt from an unknown source would need to be a fatal error, since it can’t be individually masked/ignored on this hardware.

nop_isr:                         ; interrupt routine for anything else
  stx DEBUG_LAST_INTERRUPT_INDEX ; store interrupt index for debugging
  jmp irq_return

When IRQ1 is triggered though, this routine will clear the interrupt on the VIA. If we fail to do this, then the CPU will become stuck, processing interrupts forever.

irq1_isr:                        ; interrupt routine for VIA
  stx DEBUG_LAST_INTERRUPT_INDEX ; store interrupt index for debugging
  ldx VIA_T1C_L                  ; clear IFR bit 6 on VIA (side-effect of reading T1 low-order counter)
  jmp irq_return

Once I fixed some bugs, I was able to get the expected output, which is 02 (the index we are using to jump into isr_jump_table), followed by 01 (the number of times the interrupt service routine runs).

While I was researching this, I also found that online 6502 assembly guides often suggest clearing the interrupt flag on I/O chips near the start of the interrupt service routine, apparently to avoid running the routine twice. From this test, I know that the interrupt service routine is only running once, so I am happily disregarding that advice. Thinking about it, I can only see this applying to open-drain IRQ outputs.

Note about KiCad symbols

Since my last blog post about PLD’s, I’ve started to create separate symbols in KiCad for each programmed chip. You can see one of these in the schematic above.

I’m producing these automatically. Galette has a .pin file as one of its outputs. The file for this chip is as follows:

 Pin # | Name     | Pin Type
-----------------------------
   1   | Clock    | Clock/Input
   2   | /IRQ0    | Input
   3   | /IRQ1    | Input
   4   | IRQ2     | Input
   5   | /IRQ3    | Input
   6   | /IRQ4    | Input
   7   | IRQ5     | Input
   8   | /IRQ6    | Input
   9   | /IRQ7    | Input
  10   | /IRQ8    | Input
  11   | /IRQ9    | Input
  12   | GND      | GND
  13   | /CS      | Input
  14   | D7       | Output
  15   | D6       | Output
  16   | D5       | Output
  17   | D4       | Output
  18   | D3       | Output
  19   | D2       | Output
  20   | D1       | Output
  21   | D0       | Output
  22   | /IRQ     | Output
  23   | /WE      | Input
  24   | VCC      | VCC

To produce schematic symbols, I wrote a small Python script to convert this text format to a CSV file, suitable for KiPart.

IRQ_Controller

Pin,Type,Name
1,input,Clock
2,input,~IRQ0
3,input,~IRQ1
4,input,IRQ2
5,input,~IRQ3
6,input,~IRQ4
7,input,IRQ5
8,input,~IRQ6
9,input,~IRQ7
10,input,~IRQ8
11,input,~IRQ9
12,power_in,GND
13,input,~CS
14,output,D7
15,output,D6
16,output,D5
17,output,D4
18,output,D3
19,output,D2
20,output,D1
21,output,D0
22,output,~IRQ
23,input,~WE
24,power_in,VCC
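
The conversion from the .pin format to this CSV only takes a few lines of Python. Something along these lines would do it – this is a sketch rather than my exact script, and the mapping of galette pin types to KiPart types here is an assumption:

import csv
import sys

# Sketch: convert a galette .pin file to a CSV suitable for KiPart.
# The pin type mapping is an assumption for illustration.
TYPE_MAP = {'Input': 'input', 'Output': 'output', 'Clock/Input': 'input',
            'GND': 'power_in', 'VCC': 'power_in'}

def convert(pin_path, part_name):
    writer = csv.writer(sys.stdout)
    writer.writerow([part_name])
    writer.writerow([])
    writer.writerow(['Pin', 'Type', 'Name'])
    for line in open(pin_path):
        fields = [field.strip() for field in line.split('|')]
        if len(fields) != 3 or not fields[0].isdigit():
            continue   # skip the header and separator lines
        number, name, pin_type = fields
        # KiPart uses '~' for active-low where galette uses a leading '/'
        writer.writerow([number, TYPE_MAP.get(pin_type, 'input'),
                         name.replace('/', '~', 1)])

if __name__ == '__main__':
    convert(sys.argv[1], sys.argv[2])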

KiPart ships with profiles for FPGA chips, but the generic input worked fine for the ATF22V10 PLD.

./KiPart/venv/bin/kipart -r generic --overwrite irq_controller.csv

The output is a .lib file, which I can include in any KiCad project.

Wrap-up

This approach works, and introduces far less overhead compared with checking each device in the interrupt service routine. In particular, it will allow me to test interrupts from slow-to-access SPI devices.

I will be disconnecting this for now, but may include it in an expansion board or updated computer design if I ever make one!

Programming PLD’s with open source software

A few weeks ago, I blogged about my setup for programming PLD’s from Linux, which are the simpler ancestors of modern FPGA’s.

The device I’m using is the ATF22V10, and programming it involved running decades-old proprietary software called WinCUPL under WINE. I recently found an opportunity to test-run an open-source alternative, galette, and this blog post is a few notes about how it went.

Re-visiting my assumptions

The Atmel ATF22V10 is pin-compatible with the Lattice GAL22V10, which was discontinued over 10 years ago. There is a lot more information and software available for Lattice GAL’s, which I assume had the mind-share back when these devices were relevant.

There are some slight differences between the Atmel and Lattice devices. Some programming hardware works with one but not the other, and I had assumed that the fuse map (JED file) would be incompatible as well. This turned out to be incorrect.

A few statements online made me look at this again: “You can also write Lattice GAL maps to the Atmel ATF16V8 and 22V10 parts” (Andrew B on retrobrewcomputers.org), and “The most suitable replacement for the GAL22V10 from Atmel is the ATF22V10C/CQ/CQZ. They are pin-to-pin and JEDEC fusemap compatible.” (GAL22V10 replacement on the Microchip knowledgebase).

The open source tool galette can create JED files for the Lattice GAL22V10, so I decided to try to write its output to an ATF22V10 to see for myself.

Detour: Installing a Rust toolchain

Galette is written in Rust, and available only as source code at the time of writing. The instructions for setting up Rust are online here.

apt install curl
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Compiling any Rust project then follows the same process:

$ cargo build

Finding an example JED file

Next, I tried running the tests, which did not pass:

$ ./run_tests.sh

I found that the differences were simply the assembler version string, plus a trailing number (maybe a checksum?).

diff -ru baseline/vcc.jed test_tmp/vcc.jed
--- baseline/vcc.jed    2021-08-19 20:16:20.526847059 +1000
+++ test_tmp/vcc.jed    2021-08-19 20:26:00.367537520 +1000
@@ -1,5 +1,5 @@

-GAL-Assembler:  Galette 0.2.0
+GAL-Assembler:  Galette 0.3.0
 Device:         GAL16V8

 *F0
@@ -25,4 +25,4 @@
 *L2193 0
 *C3c8f
 *
-924f
+9250

I then took one of the test outputs, and tried writing it to a device. As detailed in my earlier blog post, I’m using minipro, with a TL866II+ programmer, which could only program these devices after a firmware update.

$ minipro -p ATF22V10CQZ -w baseline/GAL22V10_combinatorial.jed
Found TL866II+ 04.2.126 (0x27e)
Warning: Firmware is newer than expected.
  Expected  04.2.123 (0x27b)
  Found     04.2.126 (0x27e)

VPP=12V
Warning! JED file doesn't match the selected device!

Declared fuse checksum: 0x87E2 Calculated: 0x87E2 ... OK
Declared file checksum: 0x12CF Calculated: 0x12D0 ... Mismatch!
JED file parsed OK

Use -P to skip write protect

Erasing... 0.33Sec OK
Writing jedec file...  5.04Sec  OK
Reading device...  0.41Sec  OK
Writing lock bit... 0.36Sec OK

With failed tests, checksum errors, firmware version mismatch and device ID mismatch, I was not expecting this to work.

Checking the source for this file, one of the lines is a simple AND expression.

O0 = I0 * I1

I wired up two inputs and one output (pins 2, 3 and 14), to confirm that it behaved as expected.

I used a multimeter to check that it was drawing a reasonably low current (about 7 mA). In previous experiments, these chips had made a lot of heat when programmed incorrectly.

Everything worked, so this open-source replacement is looking good so far.

Update: I repeated the simple test of combinatorial logic in an ATF22LV10C running at 3.3v. At the time of writing, this ‘low voltage’ variant is not listed as a supported chip in minipro, but can be programmed using the exact same procedure as an ATF22V10, and then operated at either 5V or 3.3V. The command I am using to program this chip is:

minipro -p ATF22V10CQZ -w baseline/GAL22V10_combinatorial.jed

Interfacing with a 6502 computer

I have some ideas for using programmable logic to extend my 6502-based computer, so I wanted to test tri-state output.

The previous test only used combinatorial output, where each pin is set high or low according to a logic function. For the ATF22V10, it should also be possible to program an “output enable” function for each pin, in order to use tri-state logic. This would allow me to connect it to the 8-bit data bus on my home-built computer.

The best documentation for this was:

To test this, I programmed an ATF22V10 to output “42” (0010 1010) to the data bus when an output enable pin is low. In the source file, 0 is GND, and 1 is VCC.

Tri-state outputs are suffixed with .T, while outputs suffixed with .E are the “output enable” functions for that pin (the pin is high-impedance when false). I left one output in the combinatorial mode (always enabled), to verify that this is being controlled separately for each pin.

GAL22V10
Test1

NC    NC    I1    OE    NC    NC    NC    NC    NC    NC    NC   GND
NC    NC    O1    D7    D6    D5    D4    D3    D2    D1    D0   VCC

O1 = I1

D0.T = GND
D1.T = VCC
D2.T = GND
D3.T = VCC
D4.T = GND
D5.T = VCC
D6.T = GND
D7.T = GND

D0.E = /OE
D1.E = /OE
D2.E = /OE
D3.E = /OE
D4.E = /OE
D5.E = /OE
D6.E = /OE
D7.E = /OE

DESCRIPTION

Output 42 on the data bus when output is enabled.

I used my local build of Galette to generate a JED file, then wrote the JED file to the chip using the same process as before.

$ ./galette/target/debug/galette test1.pld
$ minipro -p ATF22V10CQZ -w test1.jed

I then built this test circuit, which connects one I/O select line from my computer to the PLD, plus the 8-bit data bus. The combinatorial output is connected to an LED for testing.

The IO2 input maps to address $8800 in my computer’s address decoding scheme. I have other blog posts about mapping new hardware devices into memory if you’re interested to know how that works.

This computer boots to BASIC, and I was able to print “42” by reading this memory address via the PEEK command.

This also confirms that D0 is the least-significant bit in the data bus, which I was not sure of before.

Next steps

This has been an interesting discovery for me, and I’m planning to use Galette instead of WinCUPL for my electronics projects. PLD’s are relatively obscure, and it’s great to have an open-source tool for working with them. The only thing I will miss from WinCUPL is its test/simulation feature. I now need to program a chip and test it in a circuit, which is laborious, and I’m quickly using up the limited number of write cycles that these devices are rated for.

I created an “ATF22V10” symbol in KiCad to draw the schematics for this blog post, which is not particularly clear, since the pin names do not correspond with the ones defined in the .pld file. The next time I’m including one of these in a schematic, I’ll try creating a separate symbol for each programmed chip.

I’m planning to use an ATF22V10 PLD to build an interrupt controller for my computer, look out for a future blog post about that.

Re-creating the world’s worst sound card

I recently built a computer from scratch, and wanted to add some basic sound output. I am not attempting to produce any 8-bit music, but I would like to be able to write programs which use beeps and clicks to provide feedback, instead of only text.

I did a bit of digging into the subject, and found quite possibly the world’s worst sound card, in the Apple II. This is my attempt to re-draw the relevant part of the schematic, which I found in a 1983 publication called The Apple II Circuit Description.

The linked book does a good job of explaining the details, but the important part for this blog post is simply that Z3 is an address decoding output. It is normally high, but is low for half a cycle each time that memory address $C030 is accessed. This is fed to a flip-flop, which will toggle each time this happens, driving a speaker.
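
As a model of the behaviour (not anything taken from the real schematic), the mechanism boils down to this:

# Toy model of the Apple II speaker mechanism described above: every access to
# the magic address toggles the output, so accessing it at a steady rate
# produces a square wave at half that rate.
class ToggleSpeaker:
    def __init__(self):
        self.level = 0

    def access(self):
        self.level ^= 1
        return self.level

speaker = ToggleSpeaker()
assert [speaker.access() for _ in range(4)] == [1, 0, 1, 0]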

This should be easy for me to implement, since I have spare address decoding outputs on my computer.

First attempt

I decided to use a piezo buzzer, which can be driven directly from an IC pin, simplifying things greatly.

I first connected the buzzer directly to one of the spare address decoding outputs of my computer.

This is so simple that it is almost not worth diagramming, but here goes:

The input is IO2, which works out to address $8800 in my computer’s memory map. I wrote a bit about this in an earlier blog post.

From BASIC (yes, this computer runs BASIC), I can read from this address and produce a click.

A=PEEK($8800)

Second attempt

To produce a beep, I wanted a square wave with a 50% duty cycle. The Apple II used a 74LS74 dual D flip flop IC for this, which I don’t have in my inventory. Instead, I used a 74AC163 4-bit counter.

The circuit on the breadboard is wired up like this:

I was then able to produce a beeping sound by reading a memory address in a loop, just as you could on the Apple II.

10 A=PEEK($8800)
20 GOTO 10
RUN

Mission accomplished.

Alternatives and next steps

I’ve already started adding clicks and beeps to assembly code while testing, and my immediate plan is to add a separate startup beep for each of the two boot ROM’s. If I ever do a second revision of this computer board, I’ll most likely add a speaker as an on-board peripheral, using a 74LS74 instead of the counter.

There are two other methods which I considered for adding beeps to this computer. The first was to utilise its 65C22 chip, which can be set to output a square wave on its PB7 pin. I have other plans for that output, which is the reason I haven’t used it for sound.

The second idea I looked into was sampled audio, which would require me to latch 8-bit audio samples a few thousand times per second, and then use some resistors to build a digital-to-analog converter. There are a few ways that I could achieve this with parts which I have on-hand, such as a SN74AC573 latch, or the 65C22.

6502 computer – from breadboard to PCB

I’ve recently been working on building a 6502-based computer on breadboards as a learning project. After making a few revisions, and porting some software to run on it, I was confident that my computer worked well enough to connect up permanently on a circuit board.

This blog post is about the process that I went through to convert my working breadboard prototype to a PCB. As somebody who is not trained in electronics, this involved some fresh challenges for me.

Schematic capture

As a first step, I needed to draw up a proper schematic in a CAD tool. I’ve already been learning to use KiCad’s Eeschema for schematic capture, so I went ahead and drew up the whole thing.

I decided to put all of the components on one crowded page, just because I haven’t learned to manage multiple pages yet.

Mostly, this was just replicating what I’d already built, but I also needed to:

  • create a symbol for the DS-1813, which I could not find in the libraries I’m using
  • decide on a pinout for expansion ports
  • add a jumper block, so that the two interrupt lines (NMI and IRQ) can be disconnected from the on-board chips

I avoided expanding the scope of the board to include new features (storage, audio), and instead added a header with all of the important buses and signals. My goal is to have a stable base system to work with, and I can always do a second board once I’ve figured out how these should work.

I did not use KiCad’s inbuilt BOM plugins, but instead manually wrote a parts list for future reference. The only components which I needed to order were sockets and pin headers, everything else was being lifted from the prototype.

Footprint assignment

Once the Eeschema electrical rules check was passing, I moved on to selecting footprints for everything. I spent quite a bit of time checking the datasheets for my planned components against the KiCad library to choose footprints that would fit.

I could not find anything which matched the footprint of the mini SPDT switches that I’m using, so I made my own.

I had not yet chosen a ZIF socket for the EEPROM, so I selected a footprint for a standard socket instead. This was a risky move, because I had to guess how much space to leave when laying out the board.

PCB design

I had already selected a board manufacturer, so I set up my track widths and via sizes according to their capabilities.

Next, I imported the netlist and placed the components. I attempted to fit everything in 10cm x 10cm, 2-layer board to save on manufacturing costs, but it was very crowded. I was worried about having enough space for traces, fitting an (unmeasured) ZIF socket, and adding some mounting holes, so I spaced it out to fit on a 10cm x 11.6cm board instead.

I also added footprints for M3-sized mounting holes, and edge cuts with rounded corners.

It took me 4 attempts to successfully route all of the traces. The result breaks every PCB layout best practice which I had read about, but I was happy just to have everything connected by that point.

In hindsight, I could have made this task easier by splitting up the expansion header, and building an accurate footprint for a ZIF socket. I also could have placed the decoupling capacitors closer to the IC power pins, and avoided interrupting the ground plane.

The silkscreen setup was more time-consuming than expected. I wanted to make the board as self-documenting as possible, so that I would not need to open the design files on a computer when writing code and adding hardware peripherals. I labelled every switch, button, light and IC, as well as every expansion header pin (74 of them!). I also added some simple assembly hints such as resistor values, and +/- signs to indicate the polarity of the lone electrolytic capacitor.

The 3D render view in KiCad shows how the finished board would look.

Components did not display on the board at first, but there is a menu option to download the 3D models. After setting this up, and setting the solder mask colour to black, I was able to render the board properly. This looks very similar to the assembled board, with just a few parts missing.

Manufacturing

I only recently discovered that low-volume PCB manufacturing is accessible to hobbyists. Ordering the boards was really easy. I exported gerber and drill files according to their instructions, loaded them into a zip file, and uploaded them to a web portal.

I am glad that an online gerber viewer was available, because it allowed me to spot an error which I had not noticed in KiCad, where I had two pins labelled PA1.

The default settings are reasonable, but I chose black solder mask, a lead-free surface finish, and chose the option to exclude the manufacturer’s order number from the board.

The gerber viewer also has a tab which shows checks against their manufacturing rules.

I placed the order, and eagerly checked each day as the boards progressed through the manufacturing steps. From placing the order to getting the boards on my desk, the whole process took 7 days.

Assembly & test

Once all of the sockets arrived in the mail, I started assembling.

The standard advice I’ve read for through-hole assembly is to solder low-profile components first. Instead, I added enough components to light up the power LED.

I know that the power LED dims if anything is drawing too much current (eg. a short circuit). By getting this working first, I will know I’ve made a mistake (and can switch off the board quickly) if it doesn’t light up later.

I don’t think I’ve ever soldered more than a few pins or wires at a time, and this board had 323 pins to solder.

After about 2 hours, I had a fully-populated board. It’s smaller and looks better than the breadboards, but it was time to move all the chips across to see if it actually worked.

Much to my surprise, the computer booted up to EhBASIC on the first attempt.

Wrap-up

This has been quite a fun project. It has proven to me that open source tools are perfectly sufficient for this type of hardware development. I’m also very happy to have reached a point where I can call this project “done”, since I now have a permanent home-built computer.

My plans for learning about low-level computing are not done yet, of course. I’ve kept everything at a low 1.8 MHz, so that I can continue to prototype hardware peripherals on a breadboard. I’ve also left a switch to select which half of ROM to boot from, so that I don’t brick my computer every time I make an error in an assembly language program.

I’ve uploaded the full parts list, design files, and firmware to GitHub at mike42/6502-computer.

Porting BASIC to my 6502 computer

I’ve recently been building my own 6502-based computer. After a lot of work on the hardware side, I decided that it’s time to move past “Hello World” programs, and port a BASIC interpreter.

Background

This computer is architecturally similar to a 1980’s home computer, where BASIC was a widely-used interpreted language. Porting an existing interpreter will be a fast way to get a command-prompt, and it will also add the ability to load arbitrary software, by typing out a program.

I heard about an interpreter called EhBASIC (“Enhanced BASIC”) from on this YouTube video by Chris Bird. It’s source-available (free for non-commercial use only), and seems to be well-regarded by 6502 enthusiasts.

EhBASIC was written by the late Lee Davidson. I based my port on version 2.22, hosted on Klaus Dormann’s GitHub. I used Hans Otten’s mirror of Lee’s website as a reference, plus the EhBASIC section of the 6502.org forum.

Porting process

EhBASIC was a breeze to port.

I started with the ‘patched’ code from Klaus2m5/6502_EhBASIC_V2.22, which I placed in a local git repository, so that I could track the changes.

I first changed some of the syntax so that the code would assemble with ca65, since that is the assembler I’m using for everything else. There are other ca65 ports around, though they do not include the same patches.

The --feature labels_without_colons setting in ca65 was useful here, since the original source does not include colons after label names. The output was 10.5KiB of machine code, which will fit in the 16KiB of ROM space which I have available in my home-built computer.

I also confirmed that the code did not depend on using undocumented 6502 CPU opcodes, which would have been incompatible with my newer-generation 65C02 CPU.

Next, I dropped in the correct routines to read and write characters. I already had something similar from an earlier blog post. I needed to make some small changes to use the carry bit (SEC / CLC opcodes) to indicate whether a character had been received, and to set up the 6551 ACIA on startup. I programmed the code to an EEPROM, and it worked the first time.

All of my changes against the original code, including the Makefile and ca65 config file are in this commit. The memory addresses line up with the decoding scheme described here.

Wrap-up

BASIC is an interesting language, if only for historical reasons. Classic BASIC feels very clunky by modern standards, but the user experience is not that different to the modern Python REPL. Or at least, it is more similar than you might expect given how far computers have advanced.

I’m running EhBASIC on an 8-bit CPU, where the main alternative is plain 6502 assembly language. BASIC has allowed me to hit the ground running, and gives me access to floating point maths and user-loadable programs, on a computer that does not have an operating system or removable storage.

My home-built computer has a switch for selecting between two ROM’s. My plan is to write my own code in assembly language, but keep this port of EhBASIC in the secondary ROM, so that the computer can always boot to something that works.

Upgrades and improvements to my 6502 computer

For the past few weeks, I’ve been building a 6502-based retro computer, based originally on a tutorial and design by Ben Eater. My first major change was to add a serial port, so that I can write programs which accept text as input.

This blog post is a list of things which I’ve changed since, to try to make a computer which is a bit more suited to my intended use. My aim at the moment is to lock in a simple base system, so that I can use it as permanent platform for 6502 assembly and hardware experiments.

Re-visiting address decoding

The computer has a 32 KiB RAM chip, and a 32 KiB ROM chip. Due to the way that the glue logic was set up, only 16 KiB of the RAM can be used, but the full ROM space is available.

I wanted to flip this around, since I’m planning to run most programs from RAM, with only a loader or interpreter in the ROM. The idea would be to add a switch to select between the upper and lower half of ROM, map the full RAM in, and leave a large space available for I/O and other experiments.

I planned out a coarse, 2-chip address decoder to achieve this.

The resulting memory map is:

Address      Maps to
C000-FFFF    ROM – 16 KiB
A000-BFFF    Not decoded – 8 KiB
8000-9FFF    I/O – 8 KiB
0000-7FFF    RAM – 32 KiB
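
The same mapping can be expressed as a quick Python model, derived from the table above. The real decoder is just two logic chips; this is only for checking which range an address falls into:

# Coarse address decode, as per the memory map above.
def decode(address):
    if address & 0x8000 == 0:
        return 'RAM'            # 0000-7FFF
    if address & 0x4000:
        return 'ROM'            # C000-FFFF
    if address & 0x2000 == 0:
        return 'I/O'            # 8000-9FFF
    return 'Not decoded'        # A000-BFFF

assert decode(0x0400) == 'RAM'
assert decode(0x8800) == 'I/O'
assert decode(0xC000) == 'ROM'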

This is partly based on the info in Garth Wilson’s address decoding primer, and also Daryl Rictor’s SBC-2 computer.

I re-wired every chip select, modified my test program to use the new memory map, and also added a switch to select between the two halves of the ROM.

Implementing power-on reset

In Ben Eater’s 6502 computer, the reset line is connected to +5v through a resistor, and grounded when a momentary switch is pressed. At power-on, I needed to press the reset button before the computer would do anything useful.

I removed the pull-up resistor from the reset line, and added a DS1813-5+ supervisory IC instead. This holds the reset line low for 150ms at power-on, or when reset is pressed. This suggestion was from the 6502.org forums.

There was no KiCad symbol for the DS1813-5 in the default library, so I needed to create one.

Interrupt lines

I found that I had made an error when connecting the interrupt lines.

The 6502 has two interrupt inputs: IRQ and NMI. On my 6502 computer, the interrupt output of the 65C22S VIA is connected to the IRQ input of the CPU, while the interrupt output of the 65C51N ACIA is connected to the NMI input of the CPU.

The 65C51N uses an open-drain IRQ output, so this configuration left the CPU’s NMI input floating most of the time, which caused many spurious interrupts. This was easily fixed by connecting the NMI line to +5v through a 3.3k resistor.

I also implemented some tips from Garth Wilson’s 6502 primer, and connected RDY and BE to +5v through 3.3k resistors, where they had previously been connected to +5v directly.

Power

Up until this point, I had been powering my computer with a breadboard power supply. I’m hoping to build something permanent, so I designed a circuit to power it through a DC barrel jack.

This takes a DC voltage, and steps it down to 5v through a voltage regulator. The other components are a power switch, plus a diode to avoid damaging anything if I connect a plug with the wrong polarity.

This computer does not draw much current, so it’s not necessary to add a heat-sink on the voltage regulator.

Getting my wires crossed

My next change was to remove the LCD screen. This had been useful for debugging, but I would prefer to free up breadboard space and I/O connections for other parts of this project.

I did this in four steps, testing after each one. First I moved the LCD off to its own breadboard, then updated the computer’s test program to stop using the LCD, ran the computer without the LCD connected, and finally re-located the UART chip into the newly-vacant space.

After the last step, the computer started producing the wrong text (eg. “H” became “D”). I had mixed up two bits in the data bus, and by some luck the UART setup code was not affected. The orange and red wires on the left of this photo are the ones which were swapped.

I was able to narrow down the source of the problem quickly, since I was breaking everything down into small steps, which were easy to individually verify.

Next steps

For all this effort, not much has changed. I still just have an 8-bit computer which says “Hello” when it starts. The next part of this project will involve porting across some non-trivial software.

On the hardware side, I’ve made a lot of progress learning how to use KiCad. As long as everything works, my plan is to migrate this design from breadboard to a PCB in the near future.

A first look at programmable logic

I’ve recently been learning a bit about how computers work on a low-level by building a 6502-based retro computer from scratch. I’ve noticed that plenty of retro computer designs use simple programmable logic devices, including GAL’s, PAL’s and PLA’s.

I decided to take a closer look at these, since they may be a useful part of my toolkit. The particular part which grabbed my attention was the ATF22V10 programmable logic device (PLD).

This ticks a few important boxes for my current projects:

  • Available in DIP packaging, so it is suitable for use on a breadboard.
  • Operates at 5 volts.
  • Currently in production.
  • Has a relatively high pin count (slightly more than the ATF16V8).
  • Can be programmed using a TL866II+ programmer, which I already use for EEPROM programming.

These devices have been around for many decades, and I found mixed information online about the software and hardware requirements for programming these chips. A lot more has been written about the (discontinued) Lattice 22V10, which has similar capabilities, but a different programming process.

Setting up a programming environment

The logic functions for programming the ATF22V10 are specified using a human-readable hardware description language, which needs to be compiled into a device-specific “fuse map” for programming the device.

The only real choice for the ATF22V10 seems to be an obscure language called CUPL, which can be processed by Atmel/Microchip’s proprietary compiler, WinCUPL. This is distributed as a freeware Windows binary, downloadable from this page on the Microchip website.

I do all of my other software development on Linux, so I installed WinCUPL in a WINE environment. On Ubuntu 20.04, WINE is installed with the following command:

sudo apt-get install wine

For later parts of the process, I found that I also needed winetricks and cabextract:

sudo apt-get install cabextract
wget https://raw.githubusercontent.com/Winetricks/winetricks/master/src/winetricks
chmod +x winetricks

The install was simple:

wine awincupl.exe

I then launched the program with the following command:

$ wine  ~/.wine/drive_c/Wincupl/WinCupl/Wincupl.exe

On first start-up, it prompted for registration info. This is a carry-over from before this tool was freeware, and the download page lists the correct key as 60008009.

I got persistent errors about missing DLL modules.

I initially thought that this was a library registration or 32/64-bit compatibility issue, though I eventually found that I needed to install mfc40 via winetricks:

$ ./winetricks mfc40

After this installer ran, I was able to launch WinCUPL. The editor was using a proportional font, though the font dialog indicated that it was attempting to use Courier New, which is a monospaced font.

I fixed this by installing the Microsoft core fonts via the ttf-mscorefonts-installer package:

sudo apt-get --reinstall install ttf-mscorefonts-installer

At this point I had a working editor. If it was not already clear that this is ancient technology, this screen capture shows the I/O decoding example bundled with WinCUPL, apparently written in 1984.

Writing a test program

WinCUPL ships with a folder of example programs, and some documentation which includes a reference manual and tutorial. Most introductory information online seems to be from old electrical or computer engineering courses.

I read through the reference manual and the tutorial which ship with WinCUPL before attempting anything.

The closest thing to a “Hello World” program for one of these chips is the GATES.PLD example, which shows how to apply some basic logic operations. It was written for the ATF16V8, so I looked up my own device to find its mnemonic, g22v10.

After swapping around the pins, I ended up with this program for testing the ATF22V10:

Name     MyTest;
PartNo   00;
Date     6/07/2021;
Revision 01;
Designer Mike;
Company  None;
Assembly None;
Location None;
Device   g22v10;

Pin 1 = a;
Pin 2 = b;

Pin 14 = inva;
Pin 15 = invb;
Pin 16 = and;
Pin 17 = nand;
Pin 18 = or;
Pin 19 = nor;
Pin 20 = xor;
Pin 21 = xnor;

inva = !a;
invb = !b;
and = a & b;
nand = !(a & b);
or = a # b;
nor = !(a # b);
xor = a $ b;
xnor = !(a $ b);

To get a JED file from this, I navigated to compile options, and disabled the “Simulate” and “Absolute” options. This skips the simulation step, which otherwise blocks compilation.

I did not find the simulator very intuitive, but I did return to it later after watching this recent tutorial. The devices can only be reprogrammed 100 times, so for anything complex, I’ll definitely be checking outputs in the simulator rather than relying on trial-and-error.
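
For the record, a minimal simulation source (.SI) file for the test program above would look something like this. This is a sketch based on the bundled examples and the CSIM section of the manual, rather than something I have run in this exact form: the header has to match the .PLD file, the ORDER statement lists the signals that appear in each test vector, and under VECTORS the inputs are written as 0/1 while the expected outputs are written as L/H.

Name     MyTest;
PartNo   00;
Date     6/07/2021;
Revision 01;
Designer Mike;
Company  None;
Assembly None;
Location None;
Device   g22v10;

ORDER: a, b, and, or, xor;

VECTORS:
0 0 L L L
0 1 L H H
1 0 L H H
1 1 H H L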

Programming the chip

I use a TL866II+ programmer with the open source minipro programming software for other chips, and I found mixed information online about whether this combination would work for the ATF22V10.

In my case, it worked, but only after a few attempts. My first mistake was to read the chip instead of writing it.

$ minipro -p ATF22V10CQZ -r MyTest.jed
Found TL866II+ 04.2.86 (0x256)
Warning: Firmware is out of date.
  Expected  04.2.123 (0x27b)
  Found     04.2.86 (0x256)
Reading device...  0.54Sec  OK

When I plugged this into my test circuit (shown later), the blank chip warmed up, and produced erratic output.

After re-generating the JED file, I programmed the chip again, but got some warnings and verification errors. The chip still produced heat and erratic output in my test circuit.

$ minipro -p ATF22V10CQZ -w MyTest.jed
Found TL866II+ 04.2.86 (0x256)
Warning: Firmware is out of date.
  Expected  04.2.123 (0x27b)
  Found     04.2.86 (0x256)

VPP=12V
Warning! JED file doesn't match the selected device!

Declared fuse checksum: 0x7005 Calculated: 0x7005 ... OK
Declared file checksum: 0x6DDD Calculated: 0x6DDD ... OK
JED file parsed OK

Use -P to skip write protect

Erasing... 0.82Sec OK
Writing jedec file...  4.97Sec  OK
Reading device...  0.54Sec  OK
Writing lock bit... 0.26Sec OK
Verification failed at address 0x0006: File=0x00, Device=0x01

Reading back the device produced a completely different JED file (different length, different contents).

Detour: Firmware update

I started digging into the warnings, and decided to attempt a firmware update. The best instructions I could find for this were from this GitLab issue.

The short version is that once you have a firmware file from the hardware vendor, it can be applied using minipro.

$ minipro -F updateII.dat
Found TL866II+ 04.2.86 (0x256)
Warning: Firmware is out of date.
  Expected  04.2.123 (0x27b)
  Found     04.2.86 (0x256)
updateII.dat contains firmware version 4.2.126 (newer)

Do you want to continue with firmware update? y/n:y
Switching to bootloader... failed!

I got this error, most likely because the device restarts when it switches to bootloader mode. On my setup, the USB device is passed through to a VM, and I needed to click a button to redirect it again before re-running the command.

$ minipro -F updateII.dat 
Found TL866II+ in bootloader mode!
updateII.dat contains firmware version 4.2.126 (newer)

Do you want to continue with firmware update? y/n:y
Erasing... OK
Reflashing... 100%
Resetting device... OK
Reflash... OK

Programming the chip again

With the updated firmware, I made a third attempt to write the chip, using the same JED file as before.

$ minipro -p ATF22V10CQZ -w MyTest.jed
Found TL866II+ 04.2.126 (0x27e)
Warning: Firmware is newer than expected.
  Expected  04.2.123 (0x27b)
  Found     04.2.126 (0x27e)

VPP=12V
Warning! JED file doesn't match the selected device!

Declared fuse checksum: 0x7005 Calculated: 0x7005 ... OK
Declared file checksum: 0x6DDD Calculated: 0x6DDD ... OK
JED file parsed OK

Use -P to skip write protect

Erasing... 0.33Sec OK
Writing jedec file...  5.04Sec  OK
Reading device...  0.41Sec  OK
Writing lock bit... 0.35Sec OK
Verification failed at address 0x16C6: File=0x01, Device=0x00

Reading back the device returned a JED file containing all 0s, and there were still some warnings, but the circuit appeared to work correctly.

Alternatives

The firmware update seemed to fix any problems for me, but I have read plenty of accounts of tricky issues when attempting to program the ATF22V10 with the TL866II+ programmer.

An alternative programmer might be the Wellon VP-290 (mentioned here). On the software side, using the ATF22V10 really seems to require WinCUPL, though it does at least have a command-line compiler.
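
That command-line compiler appears to be cupl.exe, which the installer places under C:\Wincupl\Shared on my system, and in principle it could be scripted from Linux in the same way as the GUI. I haven’t verified this exact invocation, so treat the flags as an assumption on my part: -j should request JEDEC output and -m1 quick minimisation, and the compiler may also need to be pointed at its device library (cupl.dl, in the same folder).

wine ~/.wine/drive_c/Wincupl/Shared/cupl.exe -m1 -j MyTest.pld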

There are plenty of tools designed for the discontinued Lattice 22V10 GAL, though my best guess is that they are not compatible with the Atmel chips I’m using here. These include GALasm, whose license prohibits commercial use (enough for me to steer clear), plus an open source re-implementation called galette.

Wrap-up

It’s been interesting to take a look at the lowest possible level of the software stack, even just to produce a “Hello World” program.

I am not using PLDs in my current project, but it’s good to know that I could program one if I needed to. The main use for me would be as an alternative to discrete logic chips, so that I can keep the parts count, layout space, and propagation delays under control. For prototyping, it will also be helpful to be able to replicate old circuits which use similar components, or to simulate logic chips which aren’t in my inventory.

One thing that I learned is that a fuse is not the same as a gate. The device I used has thousands of fuses (programmable connections) which are capable of configuring the inputs to a few hundred gates. If the temperature of my first two programming attempts is anything to go by, it seems to be perfectly possible to configure the fuses to produce a short circuit.

Adding a serial port to my 6502 computer

In my last blog post, I wrote about the 8-bit computer which I’ve been building, using an existing design by Ben Eater. The I/O capabilities of the original design are rather limited, so one of the first enhancements I’m making is to add a serial port.

Hardware

The main chip I am adding is the WDC 65C51N ACIA, which is a modern version of the MOS 6551. Versions of this chip have been on the market for around 40 years, and lots of classic computer designs use some version of it for serial output. I bought this one new, and the date code indicates that it was manufactured 11 years ago, so it’s a fair guess that they are not selling as fast as they used to.

I also needed a 1.8432 MHz oscillator, which I used to clock both the computer and the UART.

Lastly, I used a USB/UART module to interface with a modern computer. This module hosts an FT232RL chip, and the pins on this one are DTR, RX, TX, VCC, CTS and GND.

Address decoding

The 65C02 CPU in my computer uses memory-mapped I/O, so I needed to fit this new I/O chip into the memory map before I could start using it.

The original design uses a single chip for address decoding, with the select lines for the ROM, RAM and 65C22 VIA derived from three NAND gates.

This leaves an unused space between address 4000 and address 5FFF.

Address      Maps to
8000-FFFF    ROM
6000-7FFF    I/O – 65C22 VIA
4000-5FFF    Not decoded
0000-3FFF    RAM

The 65C51 ACIA has two chip select inputs: one active-high and one active-low, much the same as the 65C22 VIA. All I needed to do was invert A13, and there was an unused NAND gate in the existing design which I could use for it.

This places the 65C51 in the unused address space. It’s not exactly efficient to assign 8KB of address space to a device which needs 4 bytes, but it does work.

Address      Maps to
8000-FFFF    ROM
6000-7FFF    I/O – 65C22 VIA
4000-5FFF    UART – 65C51 ACIA
0000-3FFF    RAM
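
A side effect of decoding only A13-A15 (assuming, as the 8KB comment above suggests, that A2-A12 play no part in selecting the chip) is that the ACIA’s four registers repeat every four bytes throughout 4000-5FFF, since only A0 and A1 reach its register select pins. A quick illustration:

  ; both of these reads return the same ACIA status register,
  ; because A2-A12 are ignored by the address decoding
  lda $4001            ; status register at its usual address
  lda $5FFD            ; ...and at one of its many mirrors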

While editing this blog post, I also re-read Garth Wilson’s address decoding guide for the 6502, which shows some alternative schemes for achieving this.

Wiring

I’m using the following wiring between the 65C51 and the USB/UART module.

Note: The two clock inputs are connected to the 1.8432 MHz oscillator, which is not shown correctly here.

Software

This particular revision of the 6551 has some hardware bugs, though they are well-documented and can be worked around in software. Most of the excellent example code online is aimed at older (less buggy) revisions.

After four attempts, I was able to write an assembly-language program which could produce some output. The information on this 6502.org thread, and this 6502.org comment were the most accurate for my hardware setup.

I’m using this code to set up the ACIA for 8-N-1 communication at 19,200 bits per second, with no interrupts.

ACIA_RX = $4000
ACIA_TX = $4000
ACIA_STATUS = $4001
ACIA_COMMAND = $4002
ACIA_CONTROL = $4003

reset:
    ; ... other stuff
    ; ACIA setup
    lda #$00
    sta ACIA_STATUS    ; writing any value here performs a programmed reset
    lda #$0b
    sta ACIA_COMMAND   ; no parity, no echo, interrupts disabled
    lda #$1f
    sta ACIA_CONTROL   ; 19,200 baud, 8 data bits, 1 stop bit
    ; ... other stuff

To send, I needed to add a delay between bytes, since the hardware bug prevents the transmit bit in the status register from operating correctly. I found some code with nested loops, but it only worked after increasing the delay far beyond what should have been necessary. An alternative work-around is to generate a timed interrupt from the 65C22 VIA, which I’m hoping to try later.
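
For a rough idea of how long the delay needs to be: at 19,200 baud an 8-N-1 frame is 10 bits, and the CPU runs from the same 1.8432 MHz oscillator, so the gap between writes needs to be at least about 960 cycles. Some back-of-the-envelope numbers, using the standard 6502 timings of 2 cycles for dex and 3 for a taken branch:

; 10 bits / 19,200 baud          ~= 521 microseconds per byte
; 521 us x 1.8432 MHz            ~= 960 CPU cycles between bytes
; inner loop: 104 ($68) x 5 cyc  ~= 520 cycles
; outer loop: 6 x 520 cyc        ~= 3,100 cycles (~1.7 ms), a few times the
;                                   minimum, matching the "inflated" comment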

; print A register to ACIA
; Based on http://forum.6502.org/viewtopic.php?f=4&t=2543&start=30#p29795
print_char_acia:
  pha                  ; save the character to send
  lda ACIA_STATUS      ; status read kept from the original code
  pla                  ; restore the character
  sta ACIA_TX          ; write it to the transmit register
  jsr delay_6551       ; wait for it to shift out (see hardware bug above)
  rts

delay_6551:
  phy                  ; preserve X and Y
  phx
delay_loop:
  ldy #6               ; inflated from numbers in original code.
minidly:
  ldx #$68
delay_1:
  dex                  ; inner loop: ~5 cycles per iteration
  bne delay_1
  dey
  bne minidly
  plx
  ply
delay_done:
  rts

I am using this routine to receive characters. It will block until the next character is received, and I will most likely need to replace this with something interrupt-driven once I start to add more complex programs.

; hang until we have a character, return it via A register.
recv_char_acia:
  lda ACIA_STATUS      ; check the receive buffer full flag
  and #$08
  beq recv_char_acia   ; nothing yet, keep polling
  lda ACIA_RX          ; read the received byte
  rts

I ended up with a program which prints “Hello” to both the LCD and serial port when the computer resets, then accepts text input. Any characters received over serial are then printed back to the terminal, and also to the LCD.
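
The echo part of that program is just a loop over the two routines above. Here is a rough sketch, where print_char_lcd is a placeholder name for my LCD output routine (not shown in this post):

echo_loop:
  jsr recv_char_acia   ; block until a character arrives (returned in A)
  jsr print_char_acia  ; echo it back over serial (A is preserved)
  jsr print_char_lcd   ; ...and also write it to the LCD (placeholder name)
  jmp echo_loop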

Mistakes were made

Since this is a learning project, I’m keeping a log of mistakes that I’m making along the way. Today’s lesson is to check everything, because it’s very difficult to debug multiple problems at once. When I ran my first test program, there were four faults.

  • I interfaced the USB/UART module to the 65C51 by matching up pin names, which does not work – RX on one side of the serial connection should go to TX on the other.
  • I incorrectly calculated the memory map, so the test program was writing to address 8000 while the 65C51 was mapped to address 4000.
  • I based my code on examples which do not work on this chip revision, because of the hardware bug noted above.
  • I also miscalculated the baud rate, so even if I had avoided the other faults, my settings for minicom would not have worked (see the note below).
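
For reference, the baud rate arithmetic I should have done, based on the standard 6551 baud rate table (the on-chip generator divides down the 1.8432 MHz clock):

1.8432 MHz / 96 = 19,200 baud

control register value $1F:
  bits 0-3 = %1111  -> 19,200 baud
  bit 4    = 1      -> use the internal baud rate generator
  bits 5-6 = %00    -> 8 data bits
  bit 7    = 0      -> 1 stop bit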

Wrap-up

It’s very straightforward to modify Ben Eater’s 6502 computer design by adding a 65C51 ACIA. This upgrade will allow me to write (or port) software which uses text I/O. I’m planning a few more changes to this design before I port anything too serious though.

This is also the first time I’ve included (pieces of) schematics in a blog post. I’m drawing these with KiCad, and using a slightly modified version of Nicholas Parks Young’s 6502 KiCad library, which has saved me a bit of time.