Make better CLI progress bars with Unicode block characters

As a programmer, you might add a progress bar so that the user has feedback while they wait for a slow task.

If you are writing a console (CLI) application, then you need to make your progress bars from text. A good command-line progress bar should update in small increments, like this example:

This uses Unicode Block Elements to give the progress bar a higher resolution.

Code    Character
U+2588  █
U+2589  ▉
U+258A  ▊
U+258B  ▋
U+258C  ▌
U+258D  ▍
U+258E  ▎
U+258F  ▏

A lot of applications use plain ASCII in their progress bars. The progress bar in wget, for example, uses ===> characters only, like this:

Progress bars made from ASCII characters like = and # signs are very common, most likely because of the historical portability issues around non-ASCII text. Nowadays, UTF-8 support is ubiquitous, and it’s pointless to adhere to such limitations.

Example: Better progress bars in Python

The animation at the top of this blog post is generated by a simple Python script.

import sys
import time

from progress_bar import ProgressBar

"""
Example usage of ProgressBar class
"""
print("Doing work\n")

with ProgressBar(sys.stdout) as progress:
    for i in range(0,800):
        progress.update(i / 800)
        time.sleep(0.05)
print("\nDone.\n");

The script progress_bar.py, written for Python 3, contains a class that allows the progress bar to be created and drawn on different types of terminals.

import abc
import math
import shutil
import sys
import time

from typing import TextIO

"""
Produce progress bar with ANSI code output.
"""
class ProgressBar(object):
    def __init__(self, target: TextIO = sys.stdout):
        self._target = target
        self._text_only = not self._target.isatty()
        self._update_width()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            # Set to 100% for neatness, if no exception is thrown
            self.update(1.0)
        if not self._text_only:
            # ANSI-output should be rounded off with a newline
            self._target.write('\n')
        self._target.flush()

    def _update_width(self):
        self._width, _ = shutil.get_terminal_size((80, 20))

    def update(self, progress: float):
        # Update width in case of resize
        self._update_width()
        # Progress bar itself
        if self._width < 12:
            # No label in excessively small terminal
            percent_str = ''
            progress_bar_str = ProgressBar.progress_bar_str(progress, self._width - 2)
        elif self._width < 40:
            # No padding at smaller size
            percent_str = "{:6.2f} %".format(progress * 100)
            progress_bar_str = ProgressBar.progress_bar_str(progress, self._width - 11) + ' '
        else:
            # Standard progress bar with padding and label
            percent_str = "{:6.2f} %".format(progress * 100) + "  "
            progress_bar_str = " " * 5 + ProgressBar.progress_bar_str(progress, self._width - 21)
        # Write output
        if self._text_only:
            self._target.write(progress_bar_str + percent_str + '\n')
            self._target.flush()
        else:
            self._target.write('\033[G' + progress_bar_str + percent_str)
            self._target.flush()

    @staticmethod
    def progress_bar_str(progress: float, width: int):
        # 0 <= progress <= 1
        progress = min(1, max(0, progress))
        whole_width = math.floor(progress * width)
        remainder_width = (progress * width) % 1
        part_width = math.floor(remainder_width * 8)
        part_char = [" ", "▏", "▎", "▍", "▌", "▋", "▊", "▉"][part_width]
        if (width - whole_width - 1) < 0:
            part_char = ""
        line = "[" + "█" * whole_width + part_char + " " * (width - whole_width - 1) + "]"
        return line

Aside from the use of block element characters, this script includes a few other things which a good progress bar should implement:

  • Resize the progress bar when you resize the terminal
  • Simplify the progress bar on very small terminals
  • Don’t print ANSI terminal codes if the script is not connected to a terminal
  • Round off to 100% when the progress bar completes without error

After writing this, I discovered that the progress package on PyPI can also use these block characters, so I haven’t packaged this code up. I haven’t used progress before, but you might like to evaluate it for your own applications.

How to create effective PHP project documentation with Read the Docs

Documentation is one of the ways that software projects can communicate information to their users. Effective documentation is high-quality, meaning that it’s complete, accurate, and up-to-date. At least for open source libraries, it also means that you can find it with a search engine. For many small PHP projects, the reality is very far removed from the ideal.

Read the Docs (readthedocs.io) makes it easy to host an up-to-date copy of your project’s documentation online. There are around 2,000 PHP projects which host their documentation there, making PHP the third most popular programming language on the site.

This post covers the process that is used to automatically publish the documentation for the gfx-php PHP graphics library, which is one of those 2,000 projects. You should consider using this setup as a template if your project is small enough that it does not have its own infrastructure.

Basic concept

Typically, people are using Read the Docs with a tool called Sphinx. If you are writing in Python, it’s also possible to use the autodoc Sphinx plugin to add API documentation, based on docstrings in the code.
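
For a Python project, that means enabling sphinx.ext.autodoc in conf.py and adding a directive to any page of the docs; a minimal sketch, with a placeholder module name:

.. automodule:: example_module
   :members: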

PHP programmers are already spoiled for choice if they want to produce HTML documentation from their code, and several established tools have huge PHP user bases.

These will each output their own HTML, which is only useful if you want to self-host the documentation. I wanted a tool that was more like “autodoc for PHP”, so that I can have my API docs magically appear in Sphinx output that is hosted on Read the Docs.

Doxygen is the most useful tool for this purpose, because it has a stable XML output format and good-enough PHP support. I decided to write a tool to take the Doxygen XML info and generate reStructuredText for Sphinx:

This introduces some extra tools, which look complex at first. The stack is geared towards running within the Read the Docs build environment, so most developers can treat it as a black box after the initial setup:

This setup is entirely hosted with free cloud services, so you don’t need to run any applications on your own hardware.

Tools to install on local workstation

First, we will set up each of these tools locally, so that we know everything is working before we upload it.

  • Doxygen
  • Sphinx
  • doxyphp2sphinx

Doxygen

Doxygen can read PHP files to extract class names, documentation and method signatures. Linux and Mac users can install it from most package managers (apt-get, dnf or brew) under the name doxygen, while Windows users need to chase down binaries.

In your repo, make a sub-folder called docs/, and create a Doxyfile with some defaults:

mkdir docs/
cd docs/
doxygen -g Doxyfile

You need to edit the configuration to make it suitable for generating XML output for your PHP project. The version of Doxygen used here is 1.8.13, where you only need to change a few values to set Doxygen to:

  • Recursively search PHP files in your project’s source folder
  • Generate XML and HTML only
  • Log warnings to a file

For a typical project, these settings are:

PROJECT_NAME           = "Example Project"
INPUT                  = ../src
WARN_LOGFILE           = warnings.log
RECURSIVE              = YES
USE_MDFILE_AS_MAINPAGE = ../README.md
GENERATE_LATEX         = NO
GENERATE_XML           = YES

Once you set these in Doxyfile, you can run Doxygen to generate HTML and XML output.

$ doxygen

Doxygen will pick up most method signatures automatically, and you can add to them via docblocks, which work along the same lines as docstrings in Python. Read Doxygen: Documenting the Code to learn the syntax if you have not used a documentation generator in a curly-bracket language before.

The Doxygen HTML will never be published, but you might read it to see how well Doxygen understands your code.

The XML output is much more useful for our purposes. It contains the same information, and we will read it to generate pages of documentation for Sphinx to render.
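
For instance, the XML for each class lands in its own file, with contents along these lines (abridged by hand, using the example namespace from later in this post; the exact structure varies between Doxygen versions):

<compounddef id="classFooCorp_1_1Example_1_1Widget" kind="class">
  <compoundname>FooCorp::Example::Widget</compoundname>
  <sectiondef kind="public-func">
    <memberdef kind="function">
      <name>getFeatureCount</name>
      ...
    </memberdef>
  </sectiondef>
</compounddef>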

Sphinx

Sphinx is the tool that we will use to render the final HTML output. If you haven’t used it before, then see the official Getting Started page.

We are using Sphinx because it can be executed by online services like Read the Docs. It uses the reStructuredText format, which is a whole lot more complex than Markdown, but supports cross-references. I’ll only describe these steps briefly, because there are existing how-to guides on making Sphinx work for manually-written PHP documentation elsewhere on the Internet.

Still in the docs folder with your Doxyfile, create and render an empty Sphinx project.

pip install sphinx
sphinx-quickstart --quiet --project example_project --author example_bob
make html

The generated HTML will initially appear like this:

We need to customize this in a way that adds PHP support. The quickest way is to drop this text into requirements.txt:

Sphinx==1.7.4
sphinx-rtd-theme==0.3.0
sphinxcontrib-phpdomain==0.4.1
doxyphp2sphinx>=1.0.1

Then set these two values in conf.py:

extensions = [
  "sphinxcontrib.phpdomain"
]
html_theme = 'sphinx_rtd_theme'

Add this to the end of conf.py:

# PHP Syntax
from sphinx.highlighting import lexers
from pygments.lexers.web import PhpLexer
lexers["php"] = PhpLexer(startinline=True, linenos=1)
lexers["php-annotations"] = PhpLexer(startinline=True, linenos=1)

# Set domain
primary_domain = "php"

And drop these contents into _templates/breadcrumbs.html:

{%- extends "sphinx_rtd_theme/breadcrumbs.html" %}

{% block breadcrumbs_aside %}
{% endblock %}

Then finally re-install dependencies and re-build:

pip install -r requirements.txt
make html

The HTML output under _build will now appear as:

This setup gives us three things:

  • The documentation looks the same as Read the Docs.
  • We can use PHP snippets and class documentation.
  • There are no ‘Edit’ links, which is important because some of the files will be generated in the next steps.

doxyphp2sphinx

The doxyphp2sphinx tool will generate .rst files from the Doxygen XML files. This was installed from your requirements.txt in the last step, but you can also install it standalone via pip:

pip install doxyphp2sphinx

The only thing you need to specify is the name of the namespace that you are documenting, using :: as a separator.

doxyphp2sphinx FooCorp::Example

This command will read the xml/ subdirectory, and will create api.rst. It will fill the api/ directory with documentation for each class in the \FooCorp\Example namespace.

To verify that this has worked, check your class structure:

$ tree ../src
../src
├── Dooverwhacky.php
└── Widget.php

You should have documentation for each of these:

$ tree xml/ -P 'class*'
xml/
├── classFooCorp_1_1Example_1_1Dooverwhacky.xml
└── classFooCorp_1_1Example_1_1Widget.xml

And if you have the correct namespace name, you will have .rst files for each as well:

$ tree api
api
├── dooverwhacky.rst
└── widget.rst

Now, add a reference to api.rst somewhere in index.rst:

.. toctree::
   :maxdepth: 2
   :caption: API Documentation

   Classes <api.rst>

And re-compile:

make html

You can now navigate to your classes in the HTML documentation.

The quality of the generated class documentation can be improved by adding docblocks. An example of the generated documentation for a method is:
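
As a sketch, the kind of docblock that feeds this output is plain Doxygen markup; the method is the example used elsewhere in this post, and the parameter is invented for illustration:

<?php
class Widget
{
    /**
     * Count the features available on this widget.
     *
     * @param bool $activeOnly Count enabled features only (hypothetical parameter)
     * @return int Number of features
     */
    public function getFeatureCount(bool $activeOnly = false)
    {
        // ...
    }
}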

Writing documentation

As you add pages to your documentation, you can include PHP snippets and reference the usage of your classes.

.. code-block:: php

   <?php
   echo "Hello World";

Lorem ipsum dolor sit :class:`Dooverwhacky`, foo bar baz :meth:`Widget::getFeatureCount`.

This will create syntax highlighting for your examples and inline links to the generated API docs.

Beyond this, you will need to learn some reStructuredText. I found this reference to be useful.

Local docs build

A full build has these dependencies:

apt-get install doxygen make python-pip
pip install -r docs/requirements.txt

And these steps:

cd docs/
doxygen 
doxyphp2sphinx FooCorp::Example
make html

Cloud tools

Next, we will take this local build, and run it on a cloud setup instead, using these services.

  • GitHub
  • Read the docs

GitHub

I will assume that you know how to use Git and GitHub, and that you are creating code that is intended for re-use.

Add these files to your .gitignore:

docs/_build/
docs/warnings.log
docs/xml/
docs/html/

Upload the remainder of your repository to GitHub. The gfx-php project is an example of a working project with all of the correct files included.

To have the initial two build steps execute on Read the Docs, add this to the end of docs/conf.py. Don’t forget to update the namespace to match the command you were running locally.

# Regenerate API docs via doxygen + doxyphp2sphinx
import subprocess, os
read_the_docs_build = os.environ.get('READTHEDOCS', None) == 'True'
if read_the_docs_build:
    subprocess.call(['doxygen', 'Doxyfile'])
    subprocess.call(['doxyphp2sphinx', 'FooCorp::Example'])

Read the Docs

After all this setup, Read the Docs should be able to build the project. Create the project on Read the Docs by using the Import from GitHub option. There are only two settings which need to be set:

The requirements file location must be docs/requirements.txt:

And the programming language should be PHP.

After this, you can go ahead and run a build.

As a last step, you will want to ensure that you have a Webhook set up on GitHub to trigger the builds automatically in future.

Conclusion

It is emerging as a best practice for small libraries to host their documentation with Read the Docs, but this is not yet common in the PHP world. Larger PHP projects tend to self-host documentation on the project website, but smaller projects will often have no hosted API documentation.

Once you write your docs, publishing them should be easy! Hopefully this provides a good example of what’s possible.

Acknowledgements

Credit where credit is due: The Breathe project fills this niche for C++ developers using Doxygen, and has been around for some time. Breathe is in the early stages of adding PHP support, but is not yet ready at the time of writing.

How to get a notification when your site appears on HackerNews

A few weeks ago, somebody submitted this article about GNU parallel to HackerNews, and I got a small wave of new visitors trawling my blog.

I don’t actively monitor referrers to this site, so I was oblivious to this until a few days afterward. Aware of the Slashdot Effect, I thought I should set up some free tools to remind me to log in and check the site’s health if it happens again:

Hacker News RSS feeds

Hacker News does not publish its own RSS feeds, so I had to use a third-party service. I found a URL that would give me a feed of the latest submissions linking to this site, by searching the “url” attribute:

https://hnrss.org/newest?q=mike42.me&search_attrs=url

This URL gives an RSS feed, as you might expect:
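
If you would rather poll this feed from a script than rely on email, a minimal sketch in PHP looks like this (it assumes only that the feed is standard RSS 2.0):

<?php
// List HackerNews submissions which link to this site.
$url = "https://hnrss.org/newest?q=mike42.me&search_attrs=url";
$rss = simplexml_load_file($url);
if ($rss === false) {
    die("Could not load feed\n");
}
foreach ($rss->channel->item as $item) {
    echo $item->pubDate . "  " . $item->title . "\n";
    echo "  " . $item->link . "\n";
}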

IFTTT

To save me installing and checking a local reader, I set up IFTTT to send me an email when new articles are published to this feed.

The “RSS feed to email” applet is perfect for this kind of consumer-grade automation.

I set it up with the URL, and, well, nothing interesting happened: only new articles are emailed, so this is expected.

Example email

Since I also use this IFTTT applet to get notifications for other RSS feeds, I know that it works. Within an hour or two of a new article appearing in the feed, the applet sends you an email from RSS Feed via IFTTT <action@ifttt.com>:

It’s not exactly a real-time notification, but it’s a good start: I now know when my posts are being linked from a specific high-traffic site.

For any site bigger than a personal blog, you might be interested in handling extra traffic rather than just being vaguely aware of it, but I’ll save that for a future post.

Quick statistics about documentation hosted on Read the Docs

Many open source projects host their online documentation with Read the Docs. Since I’m currently developing a tool for Read the Docs users, I recently took a look at the types of projects that are hosted there.

The data in this post is derived from the list of projects available in the public Read the Docs API, and has its limitations. In particular, spam, inactive, or abandoned projects are not filtered.
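
For the curious, the project list can be collected with a paged loop over the API. This sketch assumes the v1 project endpoint and its Tastypie-style meta/objects response; check the current API documentation before relying on it:

<?php
// Page through the public Read the Docs project list.
$projects = [];
$url = "/api/v1/project/?format=json&limit=100";
while ($url !== null) {
    $page = json_decode(file_get_contents("https://readthedocs.org" . $url), true);
    foreach ($page['objects'] as $project) {
        $projects[] = $project;
    }
    // 'next' holds the path of the following page, or null on the last page
    $url = $page['meta']['next'];
}
echo count($projects) . " projects retrieved\n";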

Totals

This data is current as of May 20, 2018. At that time, there were 90,129 total projects on readthedocs.io.

Starting out, I had assumed that the majority of projects had all four of these attributes:

  • Hosted in Git
  • Programmed in Python
  • Documented in English
  • Documentation generated with Sphinx

As it turned out, this particular combination is only used on 35.8% (32,218) of projects, so let’s take a look at how each of these vary.

Programming languages

The two main conclusions I drew by looking at the programming languages are:

  • Python is the largest developer community that is using the Read the Docs platform
  • A lot of projects are hosting documentation that is not tagged with a programming language

#    %       Projects  Programming language
1    39.92%  35978     Not code (“just words”)
2    39.27%  35391     Python
3     9.27%   8354     (No language listed)
4     2.83%   2551     Javascript
5     2.29%   2060     PHP
6     1.15%   1040     C++
7     1.13%   1018     Java
8     0.85%    769     Other
9     0.77%    690     C#
10    0.71%    640     C
11    0.42%    380     CSS
12    0.34%    304     Go
13    0.31%    283     Ruby
14    0.31%    277     Julia
15    0.11%     96     Perl
16    0.06%     55     Objective C
17    0.06%     54     R
18    0.04%     36     TypeScript
19    0.03%     31     Scala
20    0.03%     30     Haskell
21    0.03%     29     Lua
22    0.02%     22     Swift
23    0.02%     15     CoffeeScript
24    0.02%     14     Visual Basic
25    0.01%     12     Groovy

Documentation languages

You might have guessed that English dominates in software documentation, but here is the data:

#    %      Projects  Language
1    93.1%  83888     English (en)
2     2.4%   2178     Indonesian (id)
3     1.4%   1287     Chinese (zh, zh_TW, zh_CN)
4     0.5%    456     Russian (ru)
5     0.5%    425     Spanish (es)
6     0.5%    412     French (fr)
7     0.3%    254     Portuguese (pt)
8     0.3%    233     German (de)
9     0.2%    209     Italian (it)
10    0.2%    208     Japanese (jp)
      0.6%    579     All other languages

In total, documentation on Read the Docs has been published in 54 languages. The “All other languages” item here represents 44 other languages.

Documentation build tool

The majority of projects use the Sphinx documentation generator. The table here counts sphinx, sphinx_htmldir and sphinx_singlehtml together.

#   %      Projects  Documentation tool
1   92.5%  83364     Sphinx
2    6.9%   6229     MkDocs
3    0.6%    536     Auto-select

Version control

Git is the version control system used for the vast majority of projects.

#   %      Projects  Version control
1   96.5%  86988     Git
2    2.1%   1879     Mercurial
3    0.9%    816     SVN
4    0.5%    446     Bazaar

Recovering text from a receipt with escpos-tools

I have written previously about how to generate receipts for printers which understand ESC/POS. Today, I thought I would write about the opposite process.

Unlike PostScript, the ESC/POS binary language is not commonly understood by software. I wrote a few utilities last year to help change that, called escpos-tools.

Today, I’ll step through an example ESC/POS binary file that an escpos-tools user sent to me, and show how we can turn it back into a usable format. The tools we are using are esc2text and escimages from escpos-tools, along with ImageMagick and the Tesseract OCR engine.

You might need this sort of process if you need to email a copy of your receipts, or to archive them for audit use.

Printing the file

Binary print files are generated from drivers. I can feed this one back to my printer like this:

cat receipt.bin > /dev/usb/lp0

My Epson TM-T20 receipt printer understands ESC/POS, and prints this out:

Installing escpos-tools

escpos-tools is not packaged yet, so you need git and composer (from the PHP ecosystem) to use it.

$ git clone https://github.com/receipt-print-hq/escpos-tools
$ cd escpos-tools
$ composer install

Inspecting the file

If there is text in the file, then esc2text will recover it, so that is the first thing to try. It works like this:

$ php esc2text.php receipt.bin

In this case, I got no output, so I switched to -v to show the commands being found.

$ php esc2text.php receipt.bin  -v
[DEBUG] SetRelativeVerticalPrintPositionCmd 
[DEBUG] GraphicsDataCmd 
[DEBUG] GraphicsDataCmd 
[DEBUG] SetRelativeVerticalPrintPositionCmd 
...

This indicates that there is no text being sent to the receipt, only images. We know from the print-out that the images contain text, so we need a few more utilities.

Recovering images from the receipt

To extract the images, use escimages. It runs like this:

$ php escimages.php --file receipt.bin
[ Image 1: 576x56 ]
[ Image 2: 576x56 ]
[ Image 3: 576x56 ]
[ Image 4: 576x56 ]
[ Image 5: 576x56 ]
[ Image 6: 576x56 ]
[ Image 7: 576x56 ]
[ Image 8: 576x52 ]

This gave us 8 narrow images:

Using ImageMagick’s convert command, these can be combined into one image like this:

convert -append receipt-*.png -density 70 -units PixelsPerInch receipt.png

The result is now the same as what our printer would output:

Recovering text from the receipt

Lastly, tesseract is an open source OCR engine which can recover text from images. This image is a lossless copy of what we sent to the printer, which is an “easy” input for OCR.

$ tesseract receipt.png -
Estimating resolution as 279
Test Receipt for USB Printer 1

Mar 17, 2018
10:12 PM



Ticket: 01



Item $0,00

Total $0.00

This output is fairly accurate, considering that the OCR engine is untrained for this font.

Conclusion

The escpos-tools family of utilities gives some visibility into the contents of ESC/POS binary files.

If you have a use case which requires working with this type of file, then I would encourage you to consider contributing code or example files to the project, so that the utilities can be improved over time.

Get the code

View on GitHub →

First impressions of the Rust programming language

I needed to write a small command-line utility recently, and thought that it would be a good chance to finally try out the Rust programming language.

I was hoping to learn the basics and implement this program in a few hours, but I found that it wasn’t trivial to just pick up Rust and start writing programs immediately. The compiler is quite strict about how you use variables, so progress is slow if you don’t really know what you are doing. Still, Rust allows you to produce programs with some valuable properties, so I’m planning to set aside some time to learn it properly.

In the meantime, these are a few of the first impressions that I had of Rust, as somebody completely new to this particular software ecosystem.

Initial installation is easy

I needed to install the rustc compiler and the cargo dependency manager. The packages in Debian testing were up-to-date, so I installed them directly from apt.

sudo apt-get install rustc cargo

From there, it took all of one minute to create a file called hello.rs, drop in the “Hello world” program, and see it run:

fn main() {
  println!("Hello");
}

The rustc compiler has a simple-enough invocation:

$ rustc hello.rs

And the resulting program ran as expected.

$ ./hello 
Hello

There were no surprises here. It seemed to work just like C so far.

Output files are large

The output binary file was much larger than I expected, at 246K.

$ ls -Ahl hello
-rwxr-xr-x 1 mike mike 246K Apr 13 21:55 hello

For comparison, I wrote out a similar program in C and it was 8K.

#include <stdio.h>

int main() {
  printf("Hello\n");
  return 0;
}
$ gcc hello.c -o hello
$ ./hello 
Hello
$ ls -Ahl hello
-rwxr-xr-x 1 mike mike 8.3K Apr 13 21:57 hello

The FAQ explained a few specific reasons why this was the case.

Compile is slow

Compilation took around a quarter of a second. This does not seem like much, but it is certainly slower than I expected for such trivial code.

$ time rustc hello.rs
real    0m0.298s

Compared with the C program:

$ time gcc hello.c -o hello
real    0m0.044s

Again, the FAQ points out that zero-cost abstractions incur a compile-time penalty.

Since compilation is a notoriously time-consuming activity with existing languages, I’m hoping that the compile times for Rust are acceptable for small projects.

Legendary compile errors

I’ve read about rustc being very strict, but printing good errors. So, I deleted a ! character and tried to re-compile.

fn main() {
  println("Hello");
}
$ rustc hello.rs 
error[E0423]: expected function, found macro `println`
 --> hello.rs:2:3
  |
2 |   println("Hello");
  |   ^^^^^^^ did you mean `println!(...)`?

error: aborting due to previous error

This error message suggested the correct code to use. This is better than what you get from most compilers, so I’m pretty happy with that.

Editor support is good

Next, I needed a better Rust editor. Eclipse support for Rust was present, but had a patchy maintenance status.

So I tried adding the Rust plugin for IntelliJ IDEA Community Edition, which is still a fully open source stack. This was point-and-click; the basic steps follow:

Open IntelliJ and click Configure:

Then Plugins:

Search for “rust”, and click Search in Repositories:

Then select Rust, click Install, and wait:

Once it installs, restart the IDE.

The New Project dialog now shows Rust as an option.

I needed the rust-src package to fill out these options:

$ sudo apt-get install rust-src
$ apt-file list rust-src
/usr/src/rustc-1.24.1/src/ ..

In the IntelliJ Rust environment, most editor features worked, but the debugger was not supported.

So, IDE support is mostly there, but it’s not as easy to start hacking as in something like Java.

Dependency management

I noticed that Cargo produces .lock files which contain checksums and exact versions of dependencies.

As a composer and yarn user, this was familiar.
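
As a sketch, a hypothetical Cargo.toml declaring one loosely-versioned dependency looks like this:

[package]
name = "hello"
version = "0.1.0"
authors = ["A. Developer <dev@example.com>"]

[dependencies]
rand = "0.4"

Running cargo build resolves rand to one exact version and records it, with a checksum, in Cargo.lock, so later builds use exactly the same code.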

“Clippy”

I couldn’t use the clippy linter to check my code, because it does not work on the stable Rust releases. The ‘nightly’ Rust releases are not available in the Debian archive.

So, for installing rustc, I should have used rustup, not apt-get. Lesson learned!
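
For reference, rustup itself is installed from a shell script, after which a nightly toolchain is one command away:

curl https://sh.rustup.rs -sSf | sh
rustup install nightly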

Documentation

I followed the “Guessing game tutorial” from the official Rust book in the IDE, which taught basic I/O, variable scope and control structures in Rust.

The go-to source reference for Rust is ‘The Rust Book’, which is also collaboratively built and freely licensed.

There is also an immense amount of high-quality discussion taking place between Rust users, which I found to be surprisingly accessible as a novice.

Having never used the language before, I didn’t have to ask any questions to get these basics working, which suggests that the available resources are pretty good.

How to print the characters in an ESC/POS printer code page

I’ve been working on software that interacts with ESC/POS receipt printers for some time, and a constant source of trouble is the archaic character encoding scheme used on these printers.

Most commonly, non-ASCII characters are accessed by swapping the extended range to a different 128-character code page. The main open source drivers (escpos-php and python-escpos) are both capable of auto-selecting an encoding, but they need a good database of known encodings to power this feature for each individual printer.

Today, I’ll share a small utility that can print out the contents of a code page, like this:

A printer’s documentation vaguely labeled this encoding as “[1] Katakana”. By printing it out, I can see that if I map single-byte half-width Katakana from Code Page 932, it will appear correctly in this code page. That’s the type of information you need when you’re asked about it on an issue tracker!

Usage

You will generally find a list of code pages with a corresponding number for each one (0-255) in an ESC/POS printer’s documentation.

This command-line tool then takes a list of code pages to inspect, and will output raw binary that generates a table like the one above when sent to the printer:

php escpos-character-tables.php NUMBER ...

So to print the code pages 1, 2 and 3 to a binary file, the command would be:

php escpos-character-tables.php 1 2 3 > code-page-1.bin

Next, you need to know how to do raw printing. Raw USB printing on Linux typically works like this:

cat code-page-1.bin > /dev/usb/lp0

For other platforms, it will be different! You will need to do a bit of research on raw printing for your platform if you haven’t tried it before.

The code: escpos-character-tables.php

<?php
/**
 * This standalone script can be used to print the contents of a code page
 * for troubleshooting.
 *
 * Usage: php escpos-character-tables.php NUMBER ...
 *
 * Code pages are numbered 0-255.
 *
 * The ESC/POS binary will be sent to stdout, and should be redirected to a
 * file or printer:
 *
 *   php escpos-character-tables.php 20 > /dev/usb/lp0
 *
 * @author Michael Billington < michael.billington@gmail.com >
 * @license MIT
 */

/* Sanity check */
if(php_sapi_name() !== "cli") {
    die("This is a command-line script, invoke via php.exen");
}
if(count($argv) < 2) {
    die("At least one code page number must be specifiedn");
}
array_shift($argv);
foreach($argv as $codePage) {
    if(!is_numeric($codePage) || $codePage < 0 || $codePage > 255) {
        die("Code pages must be numbered 0-255");
    }
}

/* Reset */
$str = "x1b@";
foreach($argv as $codePage) {
    /* Print header, switch code page */
    $str .= "x1bt" . chr($codePage);
    $str .= "x1bEx01Code page $codePagex1bEx00n";
    $str .= "x1bEx01  0123456789ABCDEF0123456789ABCDEFx1bEx00n";
    $chars = str_repeat(' ', 128);
    for ($i = 0; $i < 128; $i++) {
        $chars[$i] = chr($i + 128);
    }
    for ($y = 0; $y < 4; $y++) {
        $rowHeader = "\x1bE\x01" . strtoupper(dechex($y + 8)) . "\x1bE\x00";
        $row = substr($chars, $y * 32, 32);
        $str .= "$rowHeader $row\n";
    }
}

/* Cut */
$str .= "x1dVx41x03";

/* Output to STDOUT */
file_put_contents("php://stdout", $str);

Optimization: How I made my PHP code run 100 times faster

I’ve had a PHP wikitext parser as a dependency of some of my projects since 2012. It has always been a minor performance bottleneck, and I recently decided to do something about it.

I prepared an update to the code over the course of a month, and achieved a speedup of 142 times over the original code.

Before: 20.65 seconds, After: 0.145 seconds

A lot of the information that I could find online about PHP performance was very outdated, so I decided to write a bit about what I’ve learned. This post walks through the process I used, and the things which were slowing down my code.

This is a long read — I’ve included examples which show the sorts of things that were slowing down my code, and what I replaced them with. If you’re a serious PHP programmer, then read on!

Lesson 1: Know when to optimize

Conventional wisdom seems to dictate that making your code faster is a waste of developer time.

I think it makes you a better programmer to occasionally optimize something in a language that you normally work with. Having a well-calibrated intuition about how your code will run is part of becoming proficient in a language, and you will tend to create fewer performance problems if you’ve got that intuition.

But you do need to be selective about it. This code has survived in the wild for over five years, and I think I will still be using it in another five. This code is also a good candidate because it does not access external resources, so there is only one component to examine.

Lesson 2: Write tests

In the spirit of “make it work, make it right, make it fast”, I started by checking my test suite so that I could refactor the code with confidence.

In my case, I haven’t got good unit tests, but I have input files that I can feed through the parser to compare with known-good HTML output, which serves the same purpose:

php example.php > out.txt
diff good.txt out.txt

I ran this after every change to the code, so that I could be sure that the changes were not affecting the output.

Lesson 3: Profile your code & Question your assumptions

Code profiling allows you see how each part of your program is contributing to its overall run-time. This helps you to target your optimization efforts.

The two main debuggers for PHP are Zend and Xdebug, which can both profile your code. I have xdebug installed, which is the free debugger, and I use the Eclipse IDE, which is the free IDE. Unfortunately, the built-in profiling tool in Eclipse seems to only support the Zend debugger, so I have to profile my scripts on the command-line.

The best source of information for this is the Xdebug profiling documentation itself.

On Debian or Ubuntu, xdebug is installed via apt-get:

sudo apt-get install php-cli php-xdebug

On Fedora, the package is called php-pecl-xdebug, and is installed as:

sudo dnf install php-pecl-xdebug

Next, I executed a slow-running example script with profiling enabled:

php -dxdebug.profiler_enable=1 -dxdebug.profiler_output_dir=. example.php

This produces a profile file, which you can inspect with any valgrind-compatible tool. I used kcachegrind, which is installed on Debian or Ubuntu like so:

sudo apt-get install kcachegrind

And on Fedora:

sudo dnf install kcachegrind

You can locate and open the profile on the command-line like so:

ls
kcachegrind cachegrind.out.13385

Before profiling, I had guessed that the best way to speed up the code would be to reduce the amount of string concatenation. I have lots of tight loops which append characters one-at-a-time:

$buffer .= "$c"

Xdebug showed me that my guess was wrong, and I would have wasted a lot of time if I tried to remove string concatenation.

kcachegrind screen capture

Instead, it was clear that I was

  • Calculating the same thing hundreds of times over.
  • Doing it inefficiently.

Lesson 4: Avoid multibyte string functions

I had used functions from the mbstring extension (mb_strlen, mb_substr) to replace strlen and substr throughout my code. This is the simplest way to add UTF-8 support when iterating over strings; it is commonly suggested, and it is a bad idea.

What people do

If you have an ASCII string in PHP and want to iterate over each byte, the idiomatic way to do it is with a for loop which indexes into the string, something like this:

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$len = strlen($testString);
for($i = 0; $i < $len; $i++) {
  $c = substr($testString, $i, 1);
  // Do work on $c
  // ...
}

I’ve used substr here so that I can show that it has the same usage as mb_substr, which generally operates on UTF-8 characters. The idiomatic PHP for iterating over a multi-byte string one character at a time would be:

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$len = mb_strlen($testString);
for($i = 0; $i < $len; $i++) {
  $c = mb_substr($testString, $i, 1);
  // Do work on $c
  // ...
}

Since mb_substr needs to parse UTF-8 from the start of the string each time it is called, the second snippet runs in quadratic time: n calls, each scanning up to n characters. The snippet that calls substr in a loop is linear.

With a few kilobytes of input, this makes mb_substr unacceptably slow.

substr: 0.03 seconds, mb_substr: 4.23 seconds

Averaging over 10 runs, the mb_substr snippet takes 4.23 seconds, while the snippet using substr takes 0.03 seconds.

What people should do

Split your strings into bytes or characters before you iterate, and write methods which operate on arrays rather than strings.

You can use str_split to iterate over bytes:

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$testArray = str_split($testString);
$len = count($testArray);
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  // Do work on $c
  // ...
}

And for unicode strings, use preg_split. I learned about this trick from StackOverflow, but it might not be the fastest way. Please leave a comment if you have an alternative!

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$len = count($testArray);
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  // Do work on $c
  // ...
}

By converting the string to an array, you can pay the penalty of decoding the UTF-8 up-front. This is a few milliseconds at the start of the script, rather than a few milliseconds each time you need to read a character.

str_split: 0.0097s, preg_split: 0.0160s

After discovering this faster alternative to mb_substr, I systematically removed every mb_substr and mb_strlen from the code I was working on.

Lesson 5: Optimize for the most common case

Around 50% of the remaining runtime was spent in a method which expanded templates.

To parse wikitext, you first need to expand templates, which involves detecting tags like {{ template-name | foo=bar }} and <noinclude></noinclude>.

My 40 kilobyte test file had fewer than 100 instances of the characters <, =, | and {, so I added a short-circuit to the code to skip most of the processing, for most of the characters.

<?php
self::$preprocessorChars = [
    '<' => true,
    '=' => true,
    '|' => true,
    '{' => true
];

// ...
for ($i = 0; $i < $len; $i++) {
    $c = $textChars[$i];
    if (!isset(self::$preprocessorChars[$c])) {
        /* Fast exit for characters that do not start a tag. */
        $parsed .= $c;
        continue;
    }
   // ... Slower processing 
}

The slower processing is now avoided 99.75% of the time.

Checking for the presence of a key in a map is very fast. To illustrate, here are two examples which each branch on the characters <, =, | and {.

This one uses a map to check each character:

<?php
// Make a test string
$testString = str_repeat('a', 600000);
$chars = [
    '<' => true,
    '=' => true,
    '|' => true,
    '{' => true
];
// Loop through test string
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$len = count($testArray);
$parsed = "";
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  if(!isset($chars[$c])) {
    $parsed .= $c;
    continue;
  }
  // Never executed
}

While this one uses no map, and has four !== checks instead:

<?php
// Make a test string
$testString = str_repeat('a', 600000);
// Loop through test string
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$len = count($testArray);
$parsed = "";
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  if($c !== "<" && $c !== "=" && $c !== "|" && $c !== "{") {
    $parsed .= $c;
    continue;
  }
  // Never executed
}

Even though the run time of each script includes the generation of a 600kB test string, the difference is still pronounced:

0.29 seconds with map, 0.37 seconds without map

Averaging over 10 runs, the code took 0.29 seconds with the map, and 0.37 seconds with the four !== checks.

I was a little surprised by this result, but I’ll let the data speak for itself rather than try to explain why this is the case.

Lesson 6: Share data between functions

The next item to appear in the profiler was the copious use of array_slice.
My code uses recursive descent, and was constantly slicing up the input to pass around information. The array slicing had replaced earlier string slicing, which was even slower.

I refactored the code to pass around the entire string with indexes rather than actually cutting it up.

As a contrived example, these scripts each use a (very unnecessary) recursive-descent parser to take words from the dictionary and transform them like this:

example --> (example!)

The first example slices up the input array at each recursion step:

<?php
function handleWord(string $word) {
  return "($word!)\n";
}

/**
 * Parse a word up to the next newline.
 */
function parseWord(array $textChars) {
  $parsed = "";
  $len = count($textChars);
  for($i = 0; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Word is finished because we hit a newline
      $start = $i + 1; // Past the newline
      $remainderChars = array_slice($textChars, $start , $len - $start);
      return array('parsed' => handleWord($parsed), 'remainderChars' => $remainderChars);
    }
    $parsed .= $c;
  }
  // Word is finished because we hit the end of the input
  return array('parsed' => handleWord($parsed), 'remainderChars' => []);
}

/**
 * Accept newline-delimited dictionary
 */
function parseDictionary(array $textChars) {
  $parsed = "";
  $len = count($textChars);
  for($i = 0; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Not a word...
      continue;
    }
    // This is part of a word
    $start = $i;
    $remainderChars = array_slice($textChars, $start, $len - $start);
    $result = parseWord($remainderChars);
    $textChars = $result['remainderChars'];
    $len = count($textChars);
    $i = -1;
    $parsed .= $result['parsed'];
  }
  return array('parsed' => $parsed, 'remainderChars' => []);
}

// Load file, split into characters, parse, print result
$testString = file_get_contents("words");
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$ret = parseDictionary($testArray);
file_put_contents("words2", $ret['parsed']);

While the second one always takes an index into the input array:

<?php
function handleWord(string $word) {
  return "($word!)\n";
}

/**
 * Parse a word up to the next newline.
 */
function parseWord(array $textChars, int $idxFrom = 0) {
  $parsed = "";
  $len = count($textChars);
  for($i = $idxFrom; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Word is finished because we hit a newline
      $start = $i + 1; // Past the newline
      return array('parsed' => handleWord($parsed), 'remainderIdx' => $start);
    }
    $parsed .= $c;
  }
  // Word is finished because we hit the end of the input
  return array('parsed' => handleWord($parsed), 'remainderIdx' => $i);
}

/**
 * Accept newline-delimited dictionary
 */
function parseDictionary(array $textChars, int $idxFrom = 0) {
  $parsed = "";
  $len = count($textChars);
  for($i = $idxFrom; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Not a word...
      continue;
    }
    // This is part of a word
    $start = $i;
    $result = parseWord($textChars, $start);
    $i = $result['remainderIdx'] - 1;
    $parsed .= $result['parsed'];
  }
  return array('parsed' => $parsed, 'remainderChars' => []);
}

// Load file, split into characters, parse, print result
$testString = file_get_contents("words");
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$ret = parseDictionary($testArray);
file_put_contents("words2", $ret['parsed']);

The run-time difference between these examples is again very pronounced:

3.04s with slicing, 0.0302s with no slicing

Averaging over 10 runs, the code snippet which extracts sub-arrays took 3.04 seconds, while the code that passes around the entire array ran in 0.0302 seconds.

It’s amazing that such an obvious inefficiency in my code had been hiding behind larger problems before.

Lesson 7: Use scalar type hinting

Scalar type hinting looks like this:

function foo(int $bar, string $baz) {
  // ...
}

This is the secret PHP performance feature that they don’t tell you about. It does not actually change the speed of your code, but it does ensure that the code can’t be run on the slower PHP releases before 7.0.

PHP 7.0 was released in 2015, and it’s been established that it is twice as fast as PHP 5.6 in many cases.

I think it’s reasonable to have a dependency on a supported version of your platform, and performance improvements like this are a good reason to update your minimum supported version of PHP.

By breaking compatibility with scalar type hints, you ensure that your software does not appear to “work” in a degraded state.

Simplifying the compatibility landscape will also make performance a more tractable problem.

Lesson 8: Ignore advice that isn’t backed up by (relevant) data

While I was updating this code, I found a lot of out-dated claims about PHP performance, which did not hold true for the versions that I support.

To call out a few myths that still seem to be alive:

  • Style of quotes impacts performance.
  • Use of by-val is slower than by-ref for variable passing.
  • String concatenation is bad for performance.

I attempted to implement each of these, and wasted a lot of time. Thankfully, I was measuring the run-time and using version control, so it was easy for me to identify and discard changes which had a negligible or negative performance impact.

If somebody makes a claim about something being fast or slow in PHP, best assume that it doesn’t apply to you, unless you see some example code with timings on a recent PHP version.
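
If you want to check a claim yourself, a minimal wall-clock harness is enough. In this sketch, the inner loop is a placeholder for whatever code is under test:

<?php
// Average the wall-clock run time of a snippet over several runs.
$runs = 10;
$start = microtime(true);
for ($n = 0; $n < $runs; $n++) {
    // Placeholder: the code being measured goes here.
    $out = "";
    for ($i = 0; $i < 100000; $i++) {
        $out .= "a";
    }
}
printf("%.4f seconds per run\n", (microtime(true) - $start) / $runs);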

Conclusion

If you’ve read this far, then I hope you’ve seen that modern PHP is not an intrinsically slow language. A significant speed-up is probably achievable on any real-world code-base with such endemic performance issues.

Before: 20.65 seconds, After: 0.145 seconds

To repeat the graphic from the introduction: the test file could be parsed in 20.65 seconds on the old code, and 0.145 seconds on the improved code (averaging over 10 runs, as before).

At this point I declared my efforts “good enough” and moved on. Another pass could no doubt speed it up further, but this code is no longer slow enough to justify the effort.

Since I’ve closed the book on PHP 5, I would be interested in knowing whether there is a faster way to parse UTF-8 with the new IntlChar functions, but I’ll have to save that for the next project.

Now that you’ve seen some very inefficient ways to write PHP, I also hope that you will be able to avoid introducing similar problems in your own projects!

Using a receipt printer with the Amazon Echo

Today, I’m going to share this write-up by fellow developer Chris, who used the escpos-php thermal printing library as part of a setup which printed shopping lists via voice commands, using the Alexa Voice Service API to send the lists back to a Raspberry Pi.

Naturally, the easiest solution […] is to print it in thermal paper… Now, combine this with a voice interface, such as ALEXA, and you made yourself a voice controlled list printer.

I found out about this one through a blog comment, and it’s a recommended read for anybody who is interested in how all of these parts integrate.

When I first uploaded this printing library four years ago, the Amazon Echo did not exist yet, and I was solving a very specific problem. For an old technology, it’s interesting to see that new applications for thermal printers are still appearing, and I’m certainly glad to see my software popping up in cool projects like this.

Raster to vector conversion tips

I have recently been working with some low-resolution bitmap fonts for a few projects, which needed to be re-sized for different uses.

I’ll share here a few tricks that I use to get the detail out of each letter as a vector, so that it can be rendered at a higher resolution.

Example

A good example might be this picture of a hieroglyph from the WikiHiero MediaWiki extension, which is 29 pixels wide:

Scaled up, it looks like this:

So small images become very pixellated when you resize them. The good news is that even from a small image, there is quite a lot of detail which we can use. If we’re smart about it, the glyph can be rendered like this:

You can still see some artifacts because of the low resolution of the input, but it’s clearly an improvement.

You will need

  • ImageMagick for raster operations.
  • potrace to trace the image.
  • Inkscape to produce a high-quality raster.

Steps

Prepare

The tracing program will convert the image to pure black & white as its first step. These transformations make sure that the detail is preserved for tracing.

  • Pad by 10px on every side to reduce distortion around the edges
  • Scale by 10x with interpolation
  • Convert transparency to white

convert hiero_A1.png -bordercolor white -border 10x10 \
    -resize 1000% -flatten hiero_A1.pnm

The 29×38 grey+alpha input becomes a blurry 490×580 greymap surrounded by whitespace.

This preparation is important, because a large blurry greymap will retain a lot more detail than the original image when a threshold is applied to convert it to pure black and white:

Trace

The potrace program will threshold and trace the input image. Here, we will produce an SVG so that we can make it transparent.

The k value affects the threshold operation. It can be increased for a bolder, darker glyph, or reduced for a finer one.

potrace hiero_A1.pnm -k 0.30 --svg

This gives you an SVG with padding:

If you only want a vector, then you can stop here. The next steps will reproduce a smaller PNG with transparency and the correct padding.

I couldn’t find a reliable way to programmatically crop the image back to its original padding as an SVG, but in my case I needed to convert it back to a bitmap anyway, so I cropped it later.

Render

Use Inkscape to convert the SVG back to a large PNG. The output size here is twice as large as the file we traced, just to leave plenty of pixels to work with.

inkscape -z -e hiero_A1_big.png hiero_A1.svg -w 980

Like the SVG, there is still a lot of whitespace here. The image is now a PNG with a transparent background, and unlike the file we traced, the edges of the curves are now anti-aliased.

Crop

Everything is 20x its original size. To get the image, we need to drop 200px of padding from the left and top, then read 580×760 pixels (20 times the 29×38 start).

convert hiero_A1_big.png -crop 580x760+200+200 hiero_A1_cropped.png

This produces a 580×760 image in the same aspect ratio as the original input file.

Scale down

In my case, I only needed to double the resolution of the input file, so I scaled this file down from there.

convert hiero_A1_cropped.png -resize 58x76 hiero_A1_outp1.png

Success!

As a script

I got these steps from a script that I wrote for doubling the size of a PNG image so that it can be re-used on newer displays.

Usage:

./tracepng.sh foo.png

Where tracepng.sh is:

#!/bin/bash
# A script to upscale small bitmaps in PNG format.
# Pad, upscale, trace, render, crop then downscale.
if [ $# != 1 ]; then
  echo "Usage $0 input.png"
  exit
fi
set -exu
# Names of all the files we will produce
INP_FILE=$1
SVG_FILE="${INP_FILE%.*}.svg"
PNM_FILE="${INP_FILE%.*}.pnm"
LARGE_FILE="${INP_FILE%.*}_big.png"
LARGE_FILE_CROPPED="${INP_FILE%.*}_cropped.png"
OUTP_FILE="${INP_FILE%.*}_outp1.png"

# Width originally
# https://stackoverflow.com/questions/4670013/fast-way-to-get-image-dimensions-not-filesize
IMG_WIDTH=$(identify -format "%w" "$INP_FILE")
IMG_HEIGHT=$(identify -format "%h" "$INP_FILE")
TARGET_WIDTH=$((IMG_WIDTH * 2))
TARGET_HEIGHT=$((IMG_HEIGHT * 2))

# Make huge raster w/ border (whitespace is your friend for black/white interpolation and tracing), then convert to SVG
convert ${INP_FILE} -bordercolor white -border 10x10 -resize 1000% -flatten ${PNM_FILE}

# https://en.wikipedia.org/wiki/Potrace
potrace ${PNM_FILE} -k 0.30 --svg -o ${SVG_FILE}

# Target width for intermediate file
EXPANDED_WIDTH=$(((IMG_WIDTH + 20) * 20))
EXPANDED_HEIGHT=$(((IMG_HEIGHT + 20) * 20))
INNER_WIDTH=$((IMG_WIDTH * 20))
INNER_HEIGHT=$((IMG_HEIGHT * 20))
# https://stackoverflow.com/questions/9853325/how-to-convert-a-svg-to-a-png-with-image-magick
inkscape -z -e ${LARGE_FILE} ${SVG_FILE} -w ${EXPANDED_WIDTH}

# Cut new edges off
# http://www.imagemagick.org/Usage/crop/
convert ${LARGE_FILE} -crop ${INNER_WIDTH}x${INNER_HEIGHT}+200+200 ${LARGE_FILE_CROPPED}
convert ${LARGE_FILE_CROPPED} -resize ${TARGET_WIDTH}x${TARGET_HEIGHT} ${OUTP_FILE}

Acknowledgment

The images here are from WikiHiero, and can be remixed under the GNU General Public License 2.0.