Implementing the XMODEM protocol for file transfer

I recently built a 6502-based computer from scratch, and I’m using it as a platform for testing 6502 assembly code. The software on this computer is currently very limited, and re-programming an EEPROM chip to test changes is a slow process.

This blog post covers changes which I made to implement the XMODEM protocol, allowing me to upload programs over a serial connection.

The protocol itself

I chose XMODEM because it’s the simplest protocol which is compatible with modern terminal software. All I need right now is the ability to place some data (machine code) into RAM, so that I can execute it.

An important note is that modern file transfer protocols would normally run over TCP/IP, while XMODEM runs over a serial connection. This works for me, since a serial connection is the only interface I’ve got.

For what it’s worth, XMODEM is also era-appropriate technology for this retro computer, since it was first implemented in 1977. The 6502 processor came out in 1975.

Implementation

Implementing XMODEM was the easy part of this process. There are existing 6502 assembly implementations which I could have used, but I decided to work from the description on the Wikipedia article instead. This is partly to make sure I understand it, and partly to keep the software licensing of my ROM as clear as possible.

I have the beginnings of a shell for running commands from ROM, so I implemented two new commands:

  • rx, to receive a file over XMODEM and load it into memory.
  • run, to jump to the program.

I already have routines for reading and writing characters over serial. To start the transfer, I also found it useful to write a routine which reads a character, but gives up after a short time-out.

; Like acia_recv_char, but terminates after a short time if nothing is received
shell_rx_receive_with_timeout:
  ldy #$ff
@y_loop:
  ldx #$ff
@x_loop:
  lda ACIA_STATUS              ; check ACIA status in inner loop
  and #$08                     ; mask rx buffer status flag
  bne @rx_got_char
  dex
  cpx #0
  bne @x_loop
  dey
  cpy #0
  bne @y_loop
  clc                          ; no byte received in time
  rts
@rx_got_char:
  lda ACIA_RX                  ; get byte from ACIA data port
  sec                          ; set carry bit
  rts

The entire receive process is around 60 lines of assembly code. The process involves sending the ASCII NAK character repeatedly until a SOH character is received, which indicates the start of a 128-byte packet. All padding around each packet is discarded, and the payload is written to memory. This process is repeated until the end of transmission (EOT) character is received.

; Built-in command: rx
; receives a file over the XMODEM protocol, and loads it into memory
USER_PROGRAM_START = $0400   ; Address for start of user programs
USER_PROGRAM_WRITE_PTR = $00 ; ZP address for writing user program

shell_rx_main:
  ; Set pointers
  lda #<USER_PROGRAM_START   ; Low byte first
  sta USER_PROGRAM_WRITE_PTR
  lda #>USER_PROGRAM_START   ; High byte next
  sta USER_PROGRAM_WRITE_PTR + 1
  ; Delay so that we can set up file send
  lda #1                     ; wait ~1 second
  jsr shell_rx_sleep_seconds
  ; NAK, ACK once
@shell_block_nak:
  lda #$15                   ; NAK gets started
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  jsr shell_rx_receive_with_timeout  ; Check in loop w/ timeout
  bcc @shell_block_nak       ; Not received yet
  cmp #$01                   ; If we do have char, should be SOH
  bne @shell_rx_fail         ; Terminate transfer if we don't get SOH
@shell_rx_block:
  ; Receive one block
  jsr acia_recv_char         ; Block number
  jsr acia_recv_char         ; Inverse block number
  ldy #0                     ; Start at char 0
@shell_rx_char:
  jsr acia_recv_char
  sta (USER_PROGRAM_WRITE_PTR), Y
  iny
  cpy #128
  bne @shell_rx_char
  jsr acia_recv_char         ; Checksum - TODO verify this and jump to shell_block_nak to repeat if not matching
  lda #$06                   ; ACK the packet
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  jsr acia_recv_char
  cmp #$04                   ; EOT char, no more blocks
  beq @shell_rx_done
  cmp #$01                   ; SOH char, next block on the way
  bne @shell_block_nak       ; Anything else fail transfer
  lda USER_PROGRAM_WRITE_PTR ; This next part moves write pointer along by 128 bytes
  cmp #$00
  beq @block_half_advance
  lda #$00                   ; If low byte != 0, set to 0 and inc high byte
  sta USER_PROGRAM_WRITE_PTR
  inc USER_PROGRAM_WRITE_PTR + 1
  jmp @shell_rx_block
@block_half_advance:         ; If low byte = 0, set it to 128
  lda #$80
  sta USER_PROGRAM_WRITE_PTR
  jmp @shell_rx_block
@shell_rx_done:
  lda #$6                    ; ACK the EOT as well.
  jsr acia_print_char
  lda SPEAKER                ; Click each time we send a NAK or ACK
  lda #1                     ; wait a moment (printing does not work otherwise..)
  jsr shell_rx_sleep_seconds
  jsr shell_rx_print_user_program
  jsr shell_newline
  lda #0
  jmp sys_exit
@shell_rx_fail:
  lda #1
  jmp sys_exit

This code is the first time that I’ve used the “Zero Page Indirect Indexed with Y” address mode. It uses a pointer to the program’s location, which is incremented by 128 each time a packet is accepted.

The index in the current packet is stored in the Y register, allowing each incoming byte to be written to memory with one operation, which is called in a tight loop.

sta (USER_PROGRAM_WRITE_PTR), Y

The other routines mentioned in this source listing are:

  • shell_rx_sleep_seconds – sleeps for the number of seconds in the A register.
  • shell_rx_print_user_program – prints the first 256 bytes of the program for verification.

The checksum is entirely ignored. Everything here is running on my desk, so I’m not too worried about transmission errors.
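
For reference, the original XMODEM checksum is simple: it is the arithmetic sum of the 128 payload bytes, truncated to 8 bits. On the 6502 side, verifying it would just be an ADC accumulation inside the receive loop. A host-side Python sketch (not part of my ROM code) looks like this:

def xmodem_checksum(payload: bytes) -> int:
    """Original XMODEM checksum: sum of the 128 payload bytes, truncated to 8 bits."""
    assert len(payload) == 128
    return sum(payload) & 0xFF

# Example: a short message padded out to 128 bytes with ASCII SUB ($1a)
payload = b"Hello" + b"\x1a" * 123
print(hex(xmodem_checksum(payload)))  # 0x72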

After a few iterations, I was able to upload placeholder text to RAM.

The screen capture shows 256 bytes of memory (2 packets in XMODEM). The ASCII SUB character (1a in hex) fills out unused bytes at the end, since everything sent over XMODEM needs to be a multiple of 128 bytes.

Assembling some programs

I thought I was almost done at this point, but it turns out that writing a “Hello World” test program to run from RAM is easier said than done.

I started by creating a new ld65 linker configuration which links code to run from address $0400 onwards, which is where programs are loaded into RAM. This system does not have a loader for re-locatable code, and absolute addresses are assigned by the linker.

This is the configuration which I used for these tests. It has some flaws, but it’s good enough to place the CODE segment in the right place.

# Test configuration for user programs
MEMORY {
    ZP:     start = $00,    size = $0100, type = rw, file = "";
    RAM:    start = $0100,  size = $0300, type = rw, file = "";
    MAIN:   start = $0400,  size = $1000, type = rw, file = %O;
    ROM:    start = $C000,  size = $4000, type = ro, file = "";
}

SEGMENTS {
    ZEROPAGE: load = ZP,  type = zp;
    BSS:      load = RAM, type = bss;
    CODE:     load = MAIN, type = ro;
    VECTORS:  load = ROM, type = ro,  start = $FFFA, optional=yes;
}

If I wanted my test program to terminate, then I needed to find the address of sys_exit in ROM. This routine hands control back to the shell after a command completes.

To get this, I re-assembled the ROM, but added the flag --debug-info to the ca65 assembler, and --dbgfile boot.dbg to the ld65 linker. This produces a text file, which contains the final address of each symbol.

sym id=45,name="sys_exit",addrsize=absolute,scope=0,def=28,ref=298+192+426+390+157,val=0xC11E,seg=0,type=lab
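
For reference, those flags slot into a build along these lines (the file names here are illustrative, not my actual build):

ca65 --debug-info boot.s -o boot.o
ld65 --dbgfile boot.dbg -C boot.cfg -o boot.bin boot.o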

The simplest test program I could come up with was:

; location in ROM
sys_exit = $c11e

.segment "CODE"
main:
  jmp sys_exit

I was able to assemble this program to 3 bytes of binary and run it. It does absolutely nothing, and simply returns to the shell.

To extend this into a “Hello World” program, I needed to look up the location of two more routines in the ROM.

; locations of some functions in ROM
acia_print_char = $c020
shell_newline   = $c113
sys_exit        = $c11e

.segment "CODE"
main:
  ldx #0
@test_char:
  lda test_string, X
  beq @test_done
  jsr acia_print_char
  inx
  jmp @test_char
@test_done:
  jsr shell_newline
  lda #0
  jmp sys_exit

test_string: .asciiz "Test program"

This animation shows the process of uploading the binary and executing it via minicom.

There is nothing special about the minicom invocation here: it’s just a bit-rate and a device name, which in my case is:

minicom -b 19200 -D /dev/ttyUSB0

Automating the boring stuff

This approach is very fragile, because the code is full of hard-coded memory addresses.

Any time I change the ROM, the addresses for these routines could change, and this program will no longer work.

On any real architecture, applications call into the OS via system calls, or maybe a jump table for 8-bit systems. I don’t need a stable binary interface, and there is no userspace/kernel distinction, so I decided not to go down that path.

Instead, I wrote a Python script to generate an assembly source file, using the same debug symbols which I was manually reading. This runs after each ROM build.

The format is:

...
.export VIA_T2C_H                        = $8009
.export VIA_T2C_L                        = $8008
.export acia_delay                       := $c02b
.export acia_print_char                  := $c014
...

When this file is assembled, it doesn’t generate any code, but it does define all of the constants and labels from the ROM. When applications link against this, they can .import anything which would be available to ROM-based code.
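
My actual script isn’t shown here, but a minimal Python sketch of the same idea follows, assuming the ld65 debug file is named boot.dbg and its sym lines follow the format shown earlier. Labels (type=lab) become := assignments, and everything else becomes a plain = constant:

import re

def make_exports(dbgfile: str = "boot.dbg") -> str:
    """Read an ld65 debug file, and emit ca65 source which exports every symbol."""
    exports = []
    with open(dbgfile) as f:
        for line in f:
            if not line.startswith("sym"):
                continue
            # Parse the comma-separated key=value fields on each sym line
            fields = dict(re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', line))
            if "val" not in fields:
                continue  # skip symbols with no resolved value
            name = fields["name"].strip('"')
            value = int(fields["val"], 16)
            # ':=' defines a label (an address), '=' defines a numeric constant
            op = ":=" if fields.get("type") == "lab" else "="
            exports.append(f".export {name:<40} {op} ${value:04x}")
    return "\n".join(sorted(exports)) + "\n"

if __name__ == "__main__":
    print(make_exports(), end="")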

The start of the earlier test program is then re-written as:

.import acia_print_char, shell_newline, sys_exit

This means that I have source compatibility, just not binary compatibility.

Bringing everything together

With my initial goal achieved, I was still not happy with the development experience. I started to dig into the murky world of minicom scripting to see if I could automate the process of uploading/running programs, and found that it is indeed possible.

The scripting capabilities of minicom do not appear to be widely used, and I needed to read through the source code to find ways to work around the problems I encountered.

For example, the “send” command was sending characters too quickly for the target machine. There are hard-coded delays between commands in the interpreter, but you can’t use multiple “send” commands to slow things down, because it is also hard-coded to append a line ending each time.

The fix was to use !<, which sends the output of a command verbatim. This script uses the echo -n command to send each character individually. This is the exact same procedure as the animation in the previous section. First, the script types “rx”, then sends the file, then types “run”.

# confirm we have a shell prompt
send ""
sleep 0.1
# put into receive mode
expect "# "
sleep 0.1
!< echo -n 'r'
!< echo -n 'x'
send ""
sleep 0.1
# send the program and wait for prompt
! sx -q $TEST_PROGRAM 2> /dev/null
expect "# "
# run program
sleep 0.1
!< echo -n 'r'
!< echo -n 'u'
!< echo -n 'n'
send ""
sleep 0.1
# Terminate minicom when program completes (bit crazy..)
expect "# "
sleep 0.1
! killall -9 minicom

It’s also worth talking about the end of this script. When minicom scripts end, the user is returned to an interactive session. I wanted the minicom process to terminate without clearing the screen, so I run killall -9 minicom as the last line of the script.

This is not the safest thing to include in a script, but it’s certainly effective, and I am quite certain that there is no built-in way to achieve this with the current version of minicom. Realistic alternatives include patching minicom, or switching to different terminal software.

The minicom invocation includes the name of the minicom script to execute, and the script will use the TEST_PROGRAM environment variable to name the binary file to run.

TEST_PROGRAM=test.bin minicom -b 19200 -S minicom_autorun.txt -D /dev/ttyUSB0

When running this command, the program executes on the 6502 computer seamlessly, though minicom leaves the terminal in a broken state when it is killed, and bash prints the “Killed” text to the screen.

To work around this, I wrapped the minicom invocation in a shell script which builds the program, suppresses all errors, and always returns 0. This stops bash from constantly reporting the killall -9 as a failure, but it does also hide any other problems.

#!/bin/bash
set -eu -o pipefail
make $1.bin
(TEST_PROGRAM=$1.bin minicom -b 19200 -S minicom_autorun.txt -D /dev/ttyUSB0 || true) 2> /dev/null

This script is invoked as:

./assemble_and_run.sh test

A few months back, I wrote an IntelliJ plugin for 6502 assembly, and I’m writing all of my code in PyCharm. The last step was to get this running from my IDE.

I added a build configuration which invokes the assemble_and_run.sh script. The script is completely generic, and I can use it to run other programs by changing the argument.

I can now write a program, click run, and watch it execute on real hardware, all without leaving my IDE. It took a lot of effort, but this is a huge improvement from where I started.

Wrap-up

A fast compile/test/run cycle is a huge gain for developer productivity on any system. I can now get fast, accurate feedback on whether my 6502 assembly programs work on real hardware. The first program I’m writing involves interfacing with an SD card, which will be a lot more approachable with this improvement.

I was surprised at how tricky it was to assemble and link a small program to run in RAM instead of ROM, and I can see that my linker configuration will need to be improved as I write programs which use more data. My basic plan is to test features by uploading them over XMODEM, then add them to the ROM when they are working.

The full source code for this project, which includes hardware and software, can be found on GitHub at mike42/6502-computer.

Porting BASIC to my 6502 computer

I’ve recently been building my own 6502-based computer. After a lot of work on the hardware side, I decided that it’s time to move past “Hello World” programs, and port a BASIC interpreter.

Background

This computer is architecturally similar to a 1980’s home computer, where BASIC was a widely-used interpreted language. Porting an existing interpreter will be a fast way to get a command-prompt, and it will also add the ability to load arbitrary software, by typing out a program.

I heard about an interpreter called EhBASIC (“Enhanced BASIC”) from this YouTube video by Chris Bird. It’s source-available (free for non-commercial use only), and seems to be well-regarded by 6502 enthusiasts.

EhBASIC was written by the late Lee Davidson. I based my port on version 2.22, hosted on Klaus Dormann’s GitHub. I used Hans Otten’s mirror of Lee’s website as a reference, plus the EhBASIC section of the 6502.org forum.

Porting process

EhBASIC was a breeze to port.

I started with the ‘patched’ code from Klaus2m5/6502_EhBASIC_V2.22, which I placed in a local git repository, so that I could track the changes.

I first changed some of the syntax so that the code would assemble with ca65, since that is the assembler I’m using for everything else. There are other ca65 ports around, though they do not include the same patches.

The --feature labels_without_colons setting in ca65 was useful here, since the original source does not include colons after label names. The output was 10.5KiB of machine code, which will fit in the 16KiB of ROM space which I have available in my home-built computer.
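
For example, the assembler can then be invoked along these lines (the file name here is illustrative):

ca65 --feature labels_without_colons basic.asm -o basic.o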

I also confirmed that the code did not depend on using undocumented 6502 CPU opcodes, which would have been incompatible with my newer-generation 65C02 CPU.

Next, I dropped in the correct routines to read and write characters. I already had something similar from an earlier blog post. I needed to make some small changes to use the carry bit (SEC / CLC opcodes) to indicate whether a character had been received, and to set up the 6551 ACIA on startup. I programmed the code to an EEPROM, and it worked the first time.

All of my changes against the original code, including the Makefile and ca65 config file are in this commit. The memory addresses line up with the decoding scheme described here.

Wrap-up

BASIC is an interesting language, if only for historical reasons. Classic BASIC feels very clunky by modern standards, but the user experience is not that different to the modern Python REPL. Or at least, it is more similar than you might expect given how far computers have advanced.

I’m running EhBASIC on an 8-bit CPU, where the main alternative is plain 6502 assembly language. BASIC has allowed me to hit the ground running, and gives me access to floating point maths and user-loadable programs, on a computer that does not have an operating system or removable storage.

My home-built computer has a switch for selecting between two ROMs. My plan is to write my own code in assembly language, but keep this port of EhBASIC in the secondary ROM, so that the computer can always boot to something that works.

IntelliJ plugin for 6502 assembly language

I’ve been writing some 6502 assembly code in the past year, and have found the developer experience for this language to be lacking some modern conveniences.

In my last blog post, I described the development tools that I used to implement a simple NES game on Linux. I used a text editor to write the code, and it couldn’t do much more than syntax highlighting.

I write most of my other code in IntelliJ, so I decided to take a look at what would be required to write a plugin for 6502 assembly support. I managed to put together something that mostly works, which I published to the JetBrains Marketplace last weekend.

You can find it by searching “6502 Assembly” in most JetBrains IDEs.

Features

Firstly, it supports syntax highlighting. I limited the scope of the plugin to ca65 assembler syntax only, since it’s the assembler that I know best.

You can navigate to any label with Ctrl+Click. This will not yet work for other types of symbols, such as constants, macros, and imports.

If the plugin sees a jump or branch statement, and can figure out where the jump goes, then it will show a gutter icon which navigates to the target.

You can find the usages of a label.

In ca65, you can use nested scopes. The plugin shows code folding buttons to collapse these scopes.

You can navigate to a symbol by name using the “Go To Symbol” action.

The plugin allows for commenting and un-commenting blocks of code, though there is no formatter, so indentation sometimes still has to be fixed manually.

It also allows you to rename a label and its usages, which is a great time-saver.

Limitations

One limitation is that the plugin does not fully parse expressions, so it will not detect errors from mis-matched brackets.

Secondly, the plugin does not understand the project structure. This means that if you re-use the same label name in different places in your project, the “Rename” and “Find Usages” function will match all of them at once, because it is not smart enough to follow imports and apply scope rules.

Future improvements

A lot of things can be done with an IDE plugin, but I’m planning to use this initial implementation for a while before attempting to add any big-ticket features.

A reasonable goal might be to have “Hello World” project skeletons for common 6502 systems, and to support launching with some common emulators. I would like to be able to set breakpoints and debug 6502 programs in an IDE, but external debugging interfaces are not commonly available for retro emulators, so this is probably not a reasonable goal.

The code is on GitHub under an MIT license. If you are using this plugin and would like to help improve it, then pull requests are welcome.

Building my first NES game: A retrospective

Last year, I spent a fair amount of time learning how to make games for the Nintendo Entertainment System. The homebrew scene for this system is very much alive, and I was quite proud that I was able to make a simple, working video game which runs on a NES emulator.

So, I present to you: 8-bit Table Tennis.

This blog post is just a few notes about the tools and resources that I used for NES development on Linux, plus a few things that I learned along the way.

Development tools

The CPU in the NES is a derivative of the once-ubiquitous MOS 6502, and games for it are mostly written in 6502 assembly. I already do a fair amount of programming, but quickly found that my usual editors, compilers and debuggers were useless for this platform.

The main resources I used were:

To build my code, I used the cc65 toolchain, which includes a cross-assembler. I did most of my testing in the Nestopia emulator, and most of my editing in Gedit. All three of these are available in the official Debian repositories.

Gedit setup

Out of the box, Gedit can’t syntax highlight 6502 assembly. I found a language spec file for it on the 6502.org forums, and edited it slightly before using it.

At the time of writing, new language specs can be installed on Debian like this:

sudo cp asm6502.lang /usr/share/gtksourceview-4/language-specs/asm6502.lang && sudo chmod 0644 /usr/share/gtksourceview-4/language-specs/asm6502.lang

It is possible to add a console and git support to Gedit, but this is still a long way from a full IDE.

Geany

Geany is a programming text editor, and is also available in the official Debian repositories.

I wasn’t able to get it to recognise 6502 assembly, but it does have good x86 assembly (NASM) support, which is similar enough for it to allow navigation through the source code using labels.

Geany can also set up projects with build scripts (such as a Makefile), which could allow for quicker testing.

FCEUX

FCEUX is a NES emulator which has an in-built debugger, and it runs well under WINE. I was also able to build and run it natively on Linux, but the debugger is only present in the Windows build.

I also briefly experimented with using the FCEUX Lua interface to pause/resume the emulator over a TCP socket for debugging, since that interface is available in the Linux build. I could only pause the emulator between frames, so I decided to abandon this.

Graphics

I created the graphics in the GNU Image Manipulation Program, and manually broke the title page down into tiles.

I also wrote a custom program to convert a 4-colour PNG file into the native NES CHR graphics format. I run this conversion as part of the build process, so that the graphics files can be stored in a modern format.

Things I learned

Programming for the NES is quite simple once the project is up and running, but even simple tasks can be a real hassle. I spent more time than I expected on mundane tasks such as collision detection, and certainly did a poor job of implementing game physics. The 6502 CPU has no built-in way to multiply, divide, or perform floating point operations. The version used for the NES additionally lacks any way to perform a binary-to-decimal conversion, which would have been very useful for displaying the player scores! In hindsight, I would have been able to improve the physics by pre-computing some lookup tables.
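
For example, a small build-time Python script along these lines could emit a multiplication table as an assembly include. This is a sketch of the idea, not code from the project:

# Sketch: generate an 8-bit multiplication lookup table as a ca65 include.
# Indexed by (x << 4 | y) for x, y in 0..15, giving 256 bytes of ROM.
print("multiply_table:")
for x in range(16):
    row = ", ".join("${:02x}".format(x * y) for y in range(16))
    print("  .byte " + row)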

On the other hand, I expected it to be difficult to work within 2KiB of RAM, but due to the simplicity of the game, I only used around 25% of it, and had plenty of CHR ROM space leftover as well.

It took me three weekends to make this project, without having ever written a line of 6502 assembly before. I think that development would be a lot faster if I were able to set breakpoints from my code editor, so I will definitely be looking at other debugging emulators before attempting my next 6502 assembly project.

Conclusion

8-bit Table Tennis is available as an iNES ROM on GitHub.

I have only been running it with an emulator, so if any readers of this blog own a flash cartridge for the NES (such as an Everdrive), then please let me know if it works on the real hardware!

Benchmarking PHP code with PhpBench

This blog post is all about measuring the speed of PHP code through micro-benchmarks.

Why micro-benchmarks

I think of micro-benchmarks as a complement to tests. A well-written test will give a developer a definite pass or fail result, while a well-written micro-benchmark will give the developer an indication of the duration of an operation.

To put it another way, tests can help you find out whether the code is behaving correctly, while micro-benchmarks can help you to track whether your changes are making the code faster or slower.

Make sure that your code is doing real processing work before you spend time on this: Algorithms such as sorting, compressing, or parsing are good candidates. My guess is that most of the code in a regular PHP application would not benefit significantly from having a suite of benchmarks, because web apps tend to be I/O bound (eg database, network and disk access).

Introducing PhpBench

I have previously written a standalone test script every time I’ve needed to measure the speed of something in PHP. This works up to a point, but it gets quite hard to maintain as the number of benchmarks increases.

PhpBench is one of the available tools for running micro-benchmarks from PHP, and there are a few reasons why I think it’s worth a try:

  • it’s actively maintained
  • it installs with composer
  • it runs in a similar way to PHPUnit, so the benchmarks are fully specified in PHP code and run with a PHP tool.
  • it implements familiar concepts that you would find in benchmarking tools from other languages (eg. JMH from the Java world).

Installation

Start out with a blank composer project.

$ composer init

Add some auto-loading settings to composer.json so that PHP knows to find ExampleApp classes in the src folder.

{
    "name": "mike42/php-benchmark-examples",
    "description": "Example PHP benchmark project",
    "type": "project",
    "license": "MIT",
    "minimum-stability": "dev",
    "require": {},
    "require-dev": {},
    "autoload": {
        "psr-4": {
            "ExampleApp\\": "src"
        }
    }
}

At this point, install the phpbench dev version.

$ composer require --dev phpbench/phpbench:@dev
./composer.json has been updated
Loading composer repositories with package information
Updating dependencies (including require-dev)
Package operations: 22 installs, 0 updates, 0 removals
  - Installing symfony/process (4.4.x-dev 1a42849): Cloning 1a42849a7f from cache
  - Installing symfony/options-resolver (4.4.x-dev 94cbb72): Cloning 94cbb72bb9 from cache
  - Installing symfony/finder (4.4.x-dev 1b5ec12): Cloning 1b5ec12340 from cache
  - Installing symfony/polyfill-ctype (dev-master 82ebae0): Cloning 82ebae0220 from cache
  - Installing symfony/filesystem (4.4.x-dev 5914824): Cloning 59148241f7 from cache
  - Installing psr/log (dev-master c4421fc): Cloning c4421fcac1 from cache
  - Installing symfony/debug (4.4.x-dev 8278839): Cloning 8278839457 from cache
  - Installing symfony/service-contracts (dev-master 0c81a04): Cloning 0c81a04f68 from cache
  - Installing symfony/polyfill-php73 (dev-master d1fb4ab): Cloning d1fb4abcc0 from cache
  - Installing symfony/polyfill-mbstring (dev-master fe5e94c): Cloning fe5e94c604 from cache
  - Installing symfony/console (4.4.x-dev e2fe100): Cloning e2fe1002fd from cache
  - Installing lstrojny/functional-php (1.9.0): Loading from cache
  - Installing beberlei/assert (v3.x-dev ce139b6): Cloning ce139b6bf8 from cache
  - Installing seld/jsonlint (1.7.1): Loading from cache
  - Installing psr/container (dev-master 014d250): Cloning 014d250dae from cache
  - Installing phpbench/container (1.2): Loading from cache
  - Installing webmozart/assert (1.4.0): Loading from cache
  - Installing webmozart/path-util (dev-master 95a8f7a): Cloning 95a8f7ad15 from cache
  - Installing phpbench/dom (0.2.0): Loading from cache
  - Installing doctrine/lexer (dev-master ee614dd): Cloning ee614dd93a from cache
  - Installing doctrine/annotations (1.7.x-dev 3f35255): Cloning 3f35255290 from cache
  - Installing phpbench/phpbench (dev-master dccc67d): Cloning dccc67dd52 from cache
symfony/service-contracts suggests installing symfony/service-implementation
symfony/console suggests installing symfony/event-dispatcher
symfony/console suggests installing symfony/lock
Writing lock file
Generating autoload files

Writing a benchmark

Add some code to src/ExampleThing.php to measure.

<?php

namespace ExampleApp;

class ExampleThing {
  /**
   * Inefficiently multiply numbers together
   */
  public function multiply(int $x, int $y) : int {
    $ret = 0;
    for($i = 0; $i < $x; $i++) {
      for($j = 0; $j < $y; $j++) {
        $ret++;
      }
    }
    return $ret;
  }
}

Write a benchmark for the code in benchmarks/ExampleThingBenchmark.php

<?php

use ExampleApp\ExampleThing;

/**
 * @BeforeMethods({"init"})
 * @Revs(1000)
 * @Iterations(5)
 */
class ExampleThingBenchmark {

    private static $exampleThing;

    public function init()
    {
        self::$exampleThing = new ExampleThing();
    }

    /**
     * @Subject
     */
    public function doMultiply()
    {
        self::$exampleThing -> multiply(100, 100);
    }
}

Before you run anything, you will also need a configuration file. Add this minimal configuration to phpbench.json.dist.

{
    "php_disable_ini": true,
    "bootstrap": "vendor/autoload.php",
    "path": "benchmark",
    "php_config": {
        "extension": [ "json.so" ]
    },
    "time_unit": "milliseconds"
}

Running the benchmark

The most basic way to run all benchmarks at once is phpbench run, which looks like this:

$ php vendor/bin/phpbench run
PhpBench @git_tag@. Running benchmarks.
Using configuration file: /home/mike/workspace/blog/phpbench/php-benchmarks/phpbench.json.dist

\ExampleThingBenchmark

    doMultiply..............................I4 [μ Mo]/r: 0.232 0.232 (ms) [μSD μRSD]/r: 0.000ms 0.10%

1 subjects, 5 iterations, 1,000 revs, 0 rejects, 0 failures, 0 warnings
(best [mean mode] worst) = 0.232 [0.232 0.232] 0.233 (ms)
⅀T: 1.162ms μSD/r 0.000ms μRSD/r: 0.100%

As you might expect, there are options to run specific benchmarks, or to present the results in different ways. However, there were some PhpBench-specific things which you will need to navigate to use it successfully.

Firstly, there are two different output settings to consider:

  • report options allow you to decide which data to print in a table.
  • output options allow you to decide how to format it for output

I also found it useful to add --progress=none to suppress the text displayed by default, since you will otherwise get broken HTML output.

Lastly, be aware that, confusingly, the report named “default” is not active by default. Specify it to get a table listing each iteration.

$ php vendor/bin/phpbench run --progress=none --report=default
suite: 13415431dc0db41c12decee0fa83c26d1db0f678, date: 2019-05-31, stime: 22:01:34
+-----------------------+------------+-----+------+------+----------+-----------+--------------+----------------+
| benchmark             | subject    | set | revs | iter | mem_peak | time_rev  | comp_z_value | comp_deviation |
+-----------------------+------------+-----+------+------+----------+-----------+--------------+----------------+
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 0    | 915,736b | 236.716μs | +1.68σ       | +1.08%         |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 1    | 915,736b | 231.999μs | -1.45σ       | -0.93%         |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 2    | 915,736b | 233.960μs | -0.15σ       | -0.1%          |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 3    | 915,736b | 233.878μs | -0.2σ        | -0.13%         |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 4    | 915,736b | 234.361μs | +0.12σ       | +0.08%         |
+-----------------------+------------+-----+------+------+----------+-----------+--------------+----------------+

The type of report that I found most useful is aggregate, since it provides a summary of each iteration.

$ php vendor/bin/phpbench run --progress=none --report=aggregate
suite: 1341543c02ebee4031d165665c2724c379bf9c98, date: 2019-05-31, stime: 22:01:18
+-----------------------+------------+-----+------+-----+----------+-----------+-----------+-----------+-----------+---------+--------+-------+
| benchmark             | subject    | set | revs | its | mem_peak | best      | mean      | mode      | worst     | stdev   | rstdev | diff  |
+-----------------------+------------+-----+------+-----+----------+-----------+-----------+-----------+-----------+---------+--------+-------+
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 5   | 915,736b | 231.907μs | 232.614μs | 232.089μs | 233.538μs | 0.713μs | 0.31%  | 1.00x |
+-----------------------+------------+-----+------+-----+----------+-----------+-----------+-----------+-----------+---------+--------+-------+

Both of the examples above use console output, which is the default.

HTML reports

To capture phpbench output from a CI environment, you can generate an HTML report as well. Add an output section to phpbench.json.dist.

{
    ...
    "outputs": {
         "html_file": {
             "extends": "html",
             "file": "benchmarks.html",
             "title": "Example benchmark report"
         }
    }
}

This new html_file output can be used with --output=html_file

$ php vendor/bin/phpbench run \
    --progress=none --report=aggregate \
    --output=console --output=html_file

From Jenkins

This is a Jenkinsfile that I use to run benchmarks. You need to install the “HTML Publisher” and “AnsiColor” Jenkins plugins.

pipeline {
  agent any

  stages {
    stage('Install') {
      steps {
        ansiColor('xterm') {
          sh 'composer install'
        }
      }
    }
    stage('Run benchmarks') {
      steps {
        ansiColor('xterm') {
          sh 'php vendor/bin/phpbench run --progress=none --report aggregate --output=console --output=html_file --ansi'
          publishHTML([
            allowMissing: false,
            alwaysLinkToLastBuild: true,
            keepAll: true,
            reportDir: '',
            reportFiles: 'benchmarks.html',
            reportName:
            'phpbench report',
            reportTitles: ''
          ])       
        }
      }
    }
  }
}

Each build is then benchmarked, and the results are available as a link on the left.

The console output of the build shows the results.

To actually view the HTML report you will need to apply this tweak.

Next steps

I’ve started to add phpbench micro-benchmarks to some PHP projects that I work on, and I’m finding it a lot more useful than standalone test scripts.

The next step would be to integrate micro-benchmarks better into the development process. For example, I can already review code coverage improvements/regressions of a change on GitHub using Coveralls, which posts a comment to each pull request. This saves you from merging changes which don’t meet your project’s standards.

As far as I can tell, you would need to write custom code for more expressive phpbench reports along these lines, such as:

  • the history of a benchmark over time, or
  • the most-changed benchmarks against a baseline

phpbench does provide the building-blocks though, since you can set some options to archive the results of each run to a folder.

How to integrate Gitea and Jenkins

Gitea is a web app for hosting Git repositories. It’s open source, and very simple to get running. With some extra setup, it can also trigger Jenkins builds, and display the Jenkins build status of each commit once it has been built.

Because the documentation for the Jenkins plugin is very minimalist, I decided to write about it for future reference.

About this setup

I installed Jenkins and Gitea on the same Debian 9 server on the LAN. They communicate only over HTTP, so they could just as easily be installed separately.

To make the configuration clear, I’ve used jenkins.example.com in URLs which refer to Jenkins, and gitea.example.com for Gitea.

Gitea installation

This command will install and start the linux-amd64 version of Gitea as the user “git”.

useradd git -r --create-home && \
  mkdir /opt/gitea && chown -R git: /opt/gitea && \
  wget -O /opt/gitea/gitea https://dl.gitea.io/gitea/1.7.0/gitea-1.7.0-linux-amd64 && \
  chmod +x /opt/gitea/gitea && \
  sudo -u git bash -c "cd /opt/gitea && ./gitea web"

Shut it down, and configure some paths at /opt/gitea/custom/conf/app.ini. These will depend on your environment.

SSH_DOMAIN       = gitea.example.com
DOMAIN           = gitea.example.com
HTTP_PORT        = 3000
ROOT_URL         = http://gitea.example.com:3000/

Start it back up as a systemd service at this point, by creating /etc/systemd/system/gitea.service with this content:

[Unit]
Description=gitea
After=network.target

[Service]
ExecStart=/opt/gitea/gitea web
WorkingDirectory=/opt/gitea
User=git
Type=simple

[Install]
WantedBy=multi-user.target

Once this is saved, start the service.

systemctl daemon-reload
systemctl start gitea

Optionally, also configure it to start on boot.

systemctl enable gitea

Jenkins installation

I installed Jenkins from the official Debian repo at jenkins-ci.org, and clicked through the initial install.

Add plugin

Open up Manage Jenkins → Manage Plugins. Navigate to Available and check the Gitea plugin.

Next, install the plugin and restart Jenkins.

The configuration for the plugin is located under Manage Jenkins → Configure System.

At this point you will want to tell Jenkins where to find your Gitea server.

I don’t suggest choosing Manage hooks, because it uses the same account to manage hooks across all repos, which would violate the principle of least privilege.

Set up a project in Gitea

In Gitea, create a project, then a repository under that.

Register an account in Gitea for Jenkins to use for this project.

Log out, log back in as yourself, and add Jenkins as a collaborator to the repo, with Write access.

This is the only permission you need for public repositories. If you plan to lock down your Gitea organization later, then you will also need to give this Jenkins account Read access at the organization level.

Set up a project in Jenkins

Add a new Gitea Organization Jenkins job.

Enter the name of the organization, and the account to log in with.

Add the details for the new account, and make sure it’s selected.

The other options don’t need to be changed at this stage.

When you press ‘save’, Jenkins will immediately attempt to find any repositories in the Gitea organization, and kick off any builds. Unless everything is correct, this is unlikely to work the first time, so pay attention to error logs.

These three places will show what’s happening:

  • Scan Gitea Organization Log, which lists repositories in the organization.
  • Scan Multibranch Pipeline Log for each repository, which shows the discovery of branches.
  • Console output for each build, which will show errors if the build status could not be submitted.

Problems which I’ve found here include:

  • The URL of Gitea in the Jenkins configuration must match the URL to Gitea in its own configuration.
  • The Jenkins user account must have permission to list repositories, clone, and update statuses.
  • Empty repositories, and repositories without a ‘Jenkinsfile’ are ignored.

For that last step, here is an empty Jenkinsfile that you can put in your repository to test this integration:

pipeline {
    agent any

    stages {
        stage('Do nothing') {
            steps {
                sh '/bin/true'
            }
        }
    }
}

Once this is sorted out, you can expect to see your repository in Jenkins.

Every branch with a Jenkinsfile will appear.

And each time a commit is mentioned in Gitea, it will display a small icon to indicate the build status.

Set up a web hook in Gitea

At this point, builds need to be manually triggered. To trigger them each time the repository changes, we need to get a notification out to Jenkins.

Under the repository settings, click Webhooks → Add webhook → Gitea.

The correct values to use are:

  • URL: http://[ your jenkins server ]/gitea-webhook/post
  • POST Content Type: application/json

Once you press Add Webhook, it will appear in the list with a small grey dot, indicating that it hasn’t been run before.

If you click edit, then the Test Delivery button can be used to check that it’s working.

The icon indicates the status. If things aren’t working correctly, then click the delivery UUID to expand the full request information, which should help with debugging.

Final result

With Jenkins and Gitea, you have a simple, self-hosted continuous integration environment.

In Gitea, you can store, update and review your code. Any build and test steps in a Jenkinsfile will be run automatically each time the repository changes.

The detailed output for each build is visible in Jenkins, where you can track build results with a variety of plugins.

Handling I/O errors in PHP

This blog post is all about how to handle errors from the PHP file_get_contents function, and others which work like it.

The file_get_contents function will read the contents of a file into a string. For example:

<?php
$text = file_get_contents("hello.txt");
echo $text;

You can try this out on the command-line like so:

$ echo "hello" > hello.txt
$ php test.php 
hello

This function is widely used, but I’ve observed that error handling around it is often not quite right. I’ve fixed a few bugs involving incorrect I/O error handling recently, so here are my thoughts on how it should be done.

How file_get_contents fails

For legacy reasons, this function does not throw an exception when something goes wrong. Instead, it will both log a warning, and return false.

<?php
$filename = "not-a-real-file.txt";
$text = file_get_contents($filename);
echo $text;

Which looks like this when you run it:

$ php test.php 
PHP Warning:  file_get_contents(not-a-real-file.txt): failed to open stream: No such file or directory in test.php on line 3
PHP Stack trace:
PHP   1. {main}() test.php:0
PHP   2. file_get_contents() test.php:3

Warnings are not very useful on their own, because the code will continue on without the correct data.

Error handling in four steps

If anything goes wrong when you are reading a file, your code should be throwing some type of Exception which describes the problem. This allows developers to put a try {} catch {} around it, and avoids nasty surprises where invalid data is used later.

Step 1: Detect that the file was not read

Any call to file_get_contents should be immediately followed by a check for that false return value. This is how you know that there is a problem.

<?php
$filename = "not-a-real-file.txt";
$text = file_get_contents($filename);
if($text === false) {
  throw new Exception("File was not loaded");
}
echo $text;

This now gives both a warning and an uncaught exception:

$ php test.php 
PHP Warning:  file_get_contents(not-a-real-file.txt): failed to open stream: No such file or directory in test.php on line 3
PHP Stack trace:
PHP   1. {main}() test.php:0
PHP   2. file_get_contents() test.php:3
PHP Fatal error:  Uncaught Exception: File was not loaded in test.php:5
Stack trace:
#0 {main}
  thrown in test.php on line 5

Step 2: Suppress the warning

Warnings are usually harmless, but there are several good reasons to suppress them:

  • It ensures that you are not depending on a global error handler (or the absence of one) for correct behaviour.
  • The warning might appear in the middle of the output, depending on php.ini.
  • Warnings can produce a lot of noise in the logs.

Use @ to silence any warnings from a function call.

<?php
$filename = "not-a-real-file.txt";
$text = @file_get_contents($filename);
if($text === false) {
  throw new Exception("File was not loaded");
}
echo $text;

The output is now only the uncaught Exception:

$ php test.php 
PHP Fatal error:  Uncaught Exception: File was not loaded in test.php:5
Stack trace:
#0 {main}
  thrown in test.php on line 5

Step 3: Get the reason for the failure

Unfortunately, we lost the “No such file or directory” message, which is important information that should go in the Exception. This information can be retrieved from the old-style error_get_last function.

This function might just return empty data, so you should check that everything is set and non-empty before you try to use it.

<?php
$filename = "not-a-real-file.txt";
error_clear_last();
$text = @file_get_contents($filename);
if($text === false) {
  $e = error_get_last();
  $error = (isset($e) && isset($e['message']) && $e['message'] != "") ?
      $e['message'] : "Check that the file exists and can be read.";
  throw new Exception("File '$filename' was not loaded. $error");
}
echo $text;

This now embeds the failure reason directly in the message.

$ php test.php 
PHP Fatal error:  Uncaught Exception: File 'not-a-real-file.txt' was not loaded. file_get_contents(not-a-real-file.txt): failed to open stream: No such file or directory in test.php:9
Stack trace:
#0 {main}
  thrown in test.php on line 9

Step 4: Add a fallback

The last time I introduced error_clear_last()/error_get_last() into a code-base, I found out that HHVM does not have these functions.

Call to undefined function error_clear_last()

The fix for this is to write some wrapper code, to verify that each function exists.

<?php
$filename = 'not-a-real-file';
clearLastError();
$text = @file_get_contents($filename);
if ($text === false) {
    $error = getLastErrorOrDefault("Check that the file exists and can be read.");
    throw new \Exception("Could not retrieve image data from '$filename'. $error");
}
echo $text;

/**
 * Call error_clear_last() if it exists. This is dependent on which PHP runtime is used.
 */
function clearLastError()
{
    if (function_exists('error_clear_last')) {
        error_clear_last();
    }
}
/**
 * Retrieve the message from error_get_last() if possible. This is very useful for debugging, but it will not
 * always exist or return anything useful.
 */
function getLastErrorOrDefault(string $default)
{
    if (function_exists('error_get_last')) {
        $e = error_get_last();
        if (isset($e) && isset($e['message']) && $e['message'] != "") {
            return $e['message'];
        }
    }
    return $default;
}

This does the same thing as before, but without breaking other PHP runtimes.

$ php test.php 
PHP Fatal error:  Uncaught Exception: Could not retrieve image data from 'not-a-real-file'. file_get_contents(not-a-real-file): failed to open stream: No such file or directory in test.php:7
Stack trace:
#0 {main}
  thrown in test.php on line 7

Since HHVM is dropping support for PHP, I expect that this last step will soon become unnecessary.

How not to handle errors

Some applications put a series of checks before each I/O operation, and then simply perform the operation with no error checking. An example of this would be:

<?php
$filename = "not-a-real-file.txt";
// Check everything!
if(!file_exists($filename)) {
  throw new Exception("$filename does not exist");
}
if(!is_file($filename)) {
  throw new Exception("$filename is not a file");
}
if(!is_readable($filename)) {
  throw new Exception("$filename cannot be read");
}
// Assume that nothing can possibly go wrong..
$text = @file_get_contents($filename);
echo $text;

You could probably make a reasonable-sounding argument that checks are a good idea, but I consider them to be misguided:

  • If you skip any actual error handling, then your code is going to fail in more surprising ways when you encounter an I/O problem that could not be detected.
  • If you do perform correct error handling as well, then the extra checks add nothing other than more branches to test.

Lastly, beware of false positives. For example, the above snippet will reject HTTP URLs, which are perfectly valid for file_get_contents.

Conclusion

Most PHP code now uses try/catch/finally blocks to handle problems, but the ecosystem really values backwards compatibility, so existing functions are rarely changed.

The style of error reporting used in these I/O functions is by now a legacy quirk, and should be wrapped to consistently throw a useful Exception.

How to export a Maildir email inbox as a CSV file

I’ve often thought it would be useful to have an “Export as CSV” button on my inbox, and recently had an excuse to implement one.

I needed to parse the Subject of each email coming from an automated process, but the inbox was in the Maildir format, and I needed something more useful for data interchange.

So I wrote a utility, mail2csv, to export the Maildir as a CSV file, which many existing tools can work with. It produces one row per email, and one column per email header:

The basic usage is:

$ mail2csv example/
Date,Subject,From
"Wed, 16 May 2018 20:05:16 +0000",An email,Bob <bob@example.com>
"Wed, 16 May 2018 20:07:52 +0000",Also an email,Alice <alice@example.com>

You can select any header with the --headers option, which accepts globs:

$ mail2csv example/ --headers Message-ID Date Subject
Message-ID,Date,Subject
<89125@example.com>,"Wed, 16 May 2018 20:05:16 +0000",An email
<180c6@example.com>,"Wed, 16 May 2018 20:07:52 +0000",Also an email

If you’re not sure which headers your emails have, then you can make a CSV with all the headers, and just read the first line:

$ mail2csv example/ --headers '*' | head -n1

You can find the Python code for this tool on GitHub here.
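
The core idea is small enough to sketch with nothing but the Python standard library. This version hard-codes the default headers, and skips the glob handling of the real tool:

import csv
import mailbox
import sys

def mail2csv(maildir_path: str, headers=("Date", "Subject", "From")) -> None:
    """Write one CSV row per message, with one column per requested header."""
    writer = csv.writer(sys.stdout)
    writer.writerow(headers)
    for message in mailbox.Maildir(maildir_path):
        writer.writerow([message.get(header, "") for header in headers])

if __name__ == "__main__":
    mail2csv(sys.argv[1])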

Make better CLI progress bars with Unicode block characters

As a programmer, you might add a progress bar so that the user has feedback while they wait for a slow task.

If you are writing a console (CLI) application, then you need to make your progress bars from text. A good command-line progress bar should update in small increments, like this example:

This uses Unicode Block Elements to give the progress bar a higher resolution.

Code    Character
U+2588  █
U+2589  ▉
U+258A  ▊
U+258B  ▋
U+258C  ▌
U+258D  ▍
U+258E  ▎
U+258F  ▏

A lot of applications use plain ASCII in their progress bars. The progress bar in wget, for example, uses ===> characters only, like this:

Progress bars made from ASCII characters like = and # signs are very common, most likely because of the historical portability issues around non-ASCII text. Nowadays, UTF-8 support is ubiquitous, and it’s pointless to adhere to such limitations.

Example: Better progress bars in python

The animation at the top of this blog post was produced by a simple Python script.

import sys
import time

from progress_bar import ProgressBar

"""
Example usage of ProgressBar class
"""
print("Doing work\n")

with ProgressBar(sys.stdout) as progress:
    for i in range(0,800):
        progress.update(i / 800)
        time.sleep(0.05)
print("\nDone.\n");

The script progress_bar.py, written for Python 3, contains a class that allows the progress bar to be created and drawn on different types of terminals.

import abc
import math
import shutil
import sys
import time

from typing import TextIO

"""
Produce progress bar with ANSI code output.
"""
class ProgressBar(object):
    def __init__(self, target: TextIO = sys.stdout):
        self._target = target
        self._text_only = not self._target.isatty()
        self._update_width()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if(exc_type is None):
            # Set to 100% for neatness, if no exception is thrown
            self.update(1.0)
        if not self._text_only:
            # ANSI-output should be rounded off with a newline
            self._target.write('\n')
        self._target.flush()
        pass

    def _update_width(self):
        self._width, _ = shutil.get_terminal_size((80, 20))

    def update(self, progress : float):
        # Update width in case of resize
        self._update_width()
        # Progress bar itself
        if self._width < 12:
            # No label in excessively small terminal
            percent_str = ''
            progress_bar_str = ProgressBar.progress_bar_str(progress, self._width - 2)
        elif self._width < 40:
            # No padding at smaller size
            percent_str = "{:6.2f} %".format(progress * 100)
            progress_bar_str = ProgressBar.progress_bar_str(progress, self._width - 11) + ' '
        else:
            # Standard progress bar with padding and label
            percent_str = "{:6.2f} %".format(progress * 100) + "  "
            progress_bar_str = " " * 5 + ProgressBar.progress_bar_str(progress, self._width - 21)
        # Write output
        if self._text_only:
            self._target.write(progress_bar_str + percent_str + '\n')
            self._target.flush()
        else:
            self._target.write('\033[G' + progress_bar_str + percent_str)
            self._target.flush()

    @staticmethod
    def progress_bar_str(progress : float, width : int):
        # 0 <= progress <= 1
        progress = min(1, max(0, progress))
        whole_width = math.floor(progress * width)
        remainder_width = (progress * width) % 1
        part_width = math.floor(remainder_width * 8)
        part_char = [" ", "▏", "▎", "▍", "▌", "▋", "▊", "▉"][part_width]
        if (width - whole_width - 1) < 0:
          part_char = ""
        line = "[" + "█" * whole_width + part_char + " " * (width - whole_width - 1) + "]"
        return line

Aside from the use of box drawing characters, this script includes a few other things which a good progress bar should implement:

  • Resize the progress bar when you resize the terminal
  • Simplify the progress bar on very small terminals
  • Don’t print ANSI terminal codes if the script is not connected to a terminal
  • Round off to 100% once the use of the progress bar completes without error

After writing this, I discovered that the progress pypi package can also use these box characters, so I haven’t packaged this code up. I haven’t used progress before, but you might like to evaluate it for your own applications.

How to create effective PHP project documentation with Read the Docs

Documentation is one of the ways that software projects can communicate information to their users. Effective documentation is high-quality, meaning that it’s complete, accurate, and up-to-date. At least for open source libraries, it also means that you can find it with a search engine. For many small PHP projects, the reality is very far removed from the ideal.

Read the Docs (readthedocs.io) makes it easy to host an up-to-date copy of your project’s documentation online. There are around 2,000 PHP projects which host their documentation on the site, which makes PHP the third most popular programming language for projects on the site.

This post covers the process that is used to automatically publish the documentation for the gfx-php PHP graphics library, which is one of those 2,000 projects. You should consider using this setup as a template if your project is small enough that it does not have its own infrastructure.

Basic concept

Typically, people are using Read the Docs with a tool called Sphinx. If you are writing in Python, it’s also possible to use the autodoc Sphinx plugin to add API documentation, based on docstrings in the code.

PHP programmers are already spoiled for choice if they want to produce HTML documentation from their code, and several established tools have huge PHP user bases.

These will each output their own HTML, which is only useful if you want to self-host the documentation. I wanted a tool that was more like “autodoc for PHP”, so that I can have my API docs magically appear in Sphinx output that is hosted on Read the Docs.

Doxygen is the most useful tool for this purpose, because it has a stable XML output format and good-enough PHP support. I decided to write a tool to take the Doxygen XML output and generate reStructuredText for Sphinx:

This introduces some extra tools, which can look complex at first. The stack is geared towards running within the Read the Docs build environment, so most developers can treat it as a black box after the initial setup:

This setup is entirely hosted with free cloud services, so you don’t need to run any applications on your own hardware.

Tools to install on local workstation

First, we will set up each of these tools locally, so that we know everything is working before we upload it.

  • Doxygen
  • Sphinx
  • doxyphp2sphinx

Doxygen

Doxygen can read PHP files to extract class names, documentation and method signatures. Linux and Mac users can install it from most package managers (apt-get, dnf or brew) under the name doxygen, while Windows users need to chase down binaries.

In your repo, make a sub-folder called docs/, and create a Doxyfile with some defaults:

mkdir docs/
doxygen -g Doxyfile

You need to edit the configuration to make it suitable for generating XML output for your PHP project. With the version of Doxygen used here (1.8.13), you only need to change a few values to set Doxygen to:

  • Recursively search PHP files in your project’s source folder
  • Generate XML and HTML only
  • Log warnings to a file

For a typical project, these settings are:

PROJECT_NAME           = "Example Project"
INPUT                  = ../src
WARN_LOGFILE           = warnings.log
RECURSIVE              = YES
USE_MDFILE_AS_MAINPAGE = ../README.md
GENERATE_LATEX         = NO
GENERATE_XML           = YES

Once you set these in Doxyfile, you can run Doxygen to generate HTML and XML output.

$ doxygen

Doxygen will pick up most method signatures automatically, and you can add to them via docblocks, which work along the same lines as docstrings in Python. Read Doxygen: Documenting the Code to learn the syntax if you have not used a documentation generator in a curly-bracket language before.

The Doxygen HTML will never be published, but you might need to read it to see how well Doxygen understands your code.

The XML output is much more useful for our purposes. It contains the same information, and we will read it to generate pages of documentation for Sphinx to render.

Sphinx

Sphinx is the tool that we will use to render the final HTML output. If you haven’t used it before, then see the official Getting Started page.

We are using Sphinx because it can be executed by online services like Read the Docs. It uses the reStructuredText format, which is a whole lot more complex than Markdown, but supports cross-references. I’ll only describe these steps briefly, because there are existing how-to guides on making Sphinx work for manually-written PHP documentation elsewhere on the Internet.

Still in the docs folder with your Doxyfile, create and render an empty Sphinx project.

pip install sphinx
sphinx-quickstart --quiet --project example_project --author example_bob
make html

The generated HTML will initially appear with Sphinx’s default theme.

We need to customize this in a way that adds PHP support. The quickest way is to drop this text into requirements.txt:

Sphinx==1.7.4
sphinx-rtd-theme==0.3.0
sphinxcontrib-phpdomain==0.4.1
doxyphp2sphinx>=1.0.1

Then edit these two settings in conf.py:

extensions = [
  "sphinxcontrib.phpdomain"
]
html_theme = 'sphinx_rtd_theme'

Add this to the end of conf.py:

# PHP Syntax
from sphinx.highlighting import lexers
from pygments.lexers.web import PhpLexer
lexers["php"] = PhpLexer(startinline=True, linenos=1)
lexers["php-annotations"] = PhpLexer(startinline=True, linenos=1)

# Set domain
primary_domain = "php"

And drop this content into _templates/breadcrumbs.html, which removes the per-page ‘Edit’ links:

{%- extends "sphinx_rtd_theme/breadcrumbs.html" %}

{% block breadcrumbs_aside %}
{% endblock %}

Then finally re-install dependencies and re-build:

pip install -r requirements.txt
make html

The HTML output under _build will now use the Read the Docs theme.

This setup gives us three things:

  • The documentation looks the same locally as it will on Read the Docs.
  • We can use PHP snippets and class documentation.
  • There are no ‘Edit’ links, which is important because some of the files will be generated in the next steps.

doxyphp2sphinx

The doxyphp2sphinx tool will generate .rst files from the Doxygen XML files. This was installed from your requirements.txt in the last step, but you can also install it standalone via pip:

pip install doxyphp2sphinx

The only thing you need to specify is the name of the namespace that you are documenting, using :: as a separator.

doxyphp2sphinx FooCorp::Example

This command will read the xml/ subdirectory, and will create api.rst. It will fill the api/ directory with documentation for each class in the \FooCorp\Example namespace.

To verify that this has worked, check your class structure:

$ tree ../src
../src
├── Dooverwhacky.php
└── Widget.php

You should have documentation for each of these:

$ tree xml/ -P 'class*'
xml/
├── classFooCorp_1_1Example_1_1Dooverwhacky.xml
└── classFooCorp_1_1Example_1_1Widget.xml

And if you have the correct namespace name, you will have .rst files for each as well:

$ tree api
api
├── dooverwhacky.rst
└── widget.rst

Now, add a reference to api.rst somewhere in index.rst:

.. toctree::
   :maxdepth: 2
   :caption: API Documentation

   Classes <api.rst>

And re-compile:

make html

You can now navigate to your classes in the HTML documentation.

The quality of the generated class documentation can be improved by adding docblocks to your source files.
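
As a rough illustration, here is the kind of docblock that feeds into that output. Widget::getFeatureCount is borrowed from the cross-reference example later in this post; the parameter and method body are invented for this sketch:

<?php
namespace FooCorp\Example;

class Widget
{
    /**
     * Count the features available on this widget.
     *
     * Doxygen carries this description, plus the parameter and
     * return documentation, through to its XML output.
     *
     * @param bool $includeHidden Also count features which are hidden.
     * @return int The number of available features.
     */
    public function getFeatureCount(bool $includeHidden = false): int
    {
        return $includeHidden ? 10 : 7;
    }
}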

Writing documentation

As you add pages to your documentation, you can include PHP snippets and reference the usage of your classes.

.. code-block:: php

   <?php
   echo "Hello World"

Lorem ipsum dolor sit :class:`Dooverwhacky`, foo bar baz :meth:`Widget::getFeatureCount`.

This will create syntax highlighting for your examples and inline links to the generated API docs.

Beyond this, you will need to learn some reStructuredText. I found this reference to be useful.

Local docs build

A full build has these dependencies:

apt-get install doxygen make python-pip
pip install -r docs/requirements.txt

And these steps:

cd docs/
doxygen 
doxyphp2sphinx FooCorp::Example
make html

Cloud tools

Next, we will take this local build and run it on a cloud setup instead, using these services:

  • GitHub
  • Read the Docs

GitHub

I will assume that you know how to use Git and GitHub, and that you are creating code that is intended for re-use.

Add these files to your .gitignore:

docs/_build/
docs/warnings.log
docs/xml/
docs/html/

Upload the remainder of your repository to GitHub. The gfx-php project is an example of a working project with all of the correct files included.

To have the initial two build steps execute on Read the Docs, add this to the end of docs/conf.py. Don’t forget to update the namespace to match the command you were running locally.

# Regenerate API docs via doxygen + doxyphp2sphinx
import subprocess, os
read_the_docs_build = os.environ.get('READTHEDOCS', None) == 'True'
if read_the_docs_build:
    subprocess.call(['doxygen', 'Doxyfile'])
    subprocess.call(['doxyphp2sphinx', 'FooCorp::Example'])

Read the Docs

After all this setup, Read the Docs should be able to build the project. Create the project on Read the Docs by using the Import from GitHub option. There are only two settings which need to be changed:

  • The requirements file location must be docs/requirements.txt.
  • The programming language should be PHP.

After this, you can go ahead and run a build.

As a last step, you will want to ensure that you have a webhook set up on GitHub to trigger the builds automatically in the future.

Conclusion

It is emerging as a best practice for small libraries to host their documentation with Read the Docs, but this is not yet common in the PHP world. Larger PHP projects tend to self-host documentation on the project website, but smaller projects will often have no hosted API documentation.

Once you write your docs, publishing them should be easy! Hopefully this provides a good example of what’s possible.

Acknowledgements

Credit where credit is due: The Breathe project fills this niche for C++ developers using Doxygen, and has been around for some time. Breathe is in the early stages of adding PHP support, but is not yet ready at the time of writing.