Benchmarking PHP code with PhpBench

This blog post is all about measuring the speed of PHP code through micro-benchmarks.

Why micro-benchmarks

I think of micro-benchmarks as a complement to tests. A well-written test would give a developer a definite pass or fail result, while well-written micro-benchmark will give the developer an indication of the duration of an operation.

To put it another way, tests can help you find out whether the code is behaving correctly, while micro-benchmarks can help you to track whether your changes are making the code faster or slower.

Make sure that your code is doing real processing work before you spend time on this: Algorithms such as sorting, compressing, or parsing are good candidates. My guess is that most of the code in a regular PHP application would not benefit significantly from having a suite of benchmarks, because web apps tend to be I/O bound (eg database, network and disk access).

Introducing PhpBench

I have previously written a standalone test script every time I’ve needed to measure the speed of something in PHP. This works up to a point, but it gets quite hard to maintain as the number of benchmarks increases.

PhpBench is one of the available tools for running micro-benchmarks from PHP, and there are a few reasons why I think it’s worth a try:

  • it’s actively maintained
  • it installs with composer
  • it runs in a similar way to PHPUnit, so the benchmarks are fully specified in PHP code and run with a PHP tool.
  • it implements familiar concepts that you would find in benchmarking tools from other languages (eg. JMH from the Java world).

Installation

Start out with a blank composer project.

$ composer init

Add some auto-loading settings to composer.yml that PHP knows to find ExampleApp classes in the src folder.

{
    "name": "mike42/php-benchmark-examples",
    "description": "Example PHP benchmark project",
    "type": "project",
    "license": "MIT",
    "minimum-stability": "dev",
    "require": {},
    "require-dev": {},
    "autoload": {
        "psr-4": {
            "ExampleApp\\": "src"
        }
    }
}

At this point, install the phpbench dev version.

$ composer require --dev phpbench/phpbench:@dev
./composer.json has been updated
Loading composer repositories with package information
Updating dependencies (including require-dev)
Package operations: 22 installs, 0 updates, 0 removals
  - Installing symfony/process (4.4.x-dev 1a42849): Cloning 1a42849a7f from cache
  - Installing symfony/options-resolver (4.4.x-dev 94cbb72): Cloning 94cbb72bb9 from cache
  - Installing symfony/finder (4.4.x-dev 1b5ec12): Cloning 1b5ec12340 from cache
  - Installing symfony/polyfill-ctype (dev-master 82ebae0): Cloning 82ebae0220 from cache
  - Installing symfony/filesystem (4.4.x-dev 5914824): Cloning 59148241f7 from cache
  - Installing psr/log (dev-master c4421fc): Cloning c4421fcac1 from cache
  - Installing symfony/debug (4.4.x-dev 8278839): Cloning 8278839457 from cache
  - Installing symfony/service-contracts (dev-master 0c81a04): Cloning 0c81a04f68 from cache
  - Installing symfony/polyfill-php73 (dev-master d1fb4ab): Cloning d1fb4abcc0 from cache
  - Installing symfony/polyfill-mbstring (dev-master fe5e94c): Cloning fe5e94c604 from cache
  - Installing symfony/console (4.4.x-dev e2fe100): Cloning e2fe1002fd from cache
  - Installing lstrojny/functional-php (1.9.0): Loading from cache
  - Installing beberlei/assert (v3.x-dev ce139b6): Cloning ce139b6bf8 from cache
  - Installing seld/jsonlint (1.7.1): Loading from cache
  - Installing psr/container (dev-master 014d250): Cloning 014d250dae from cache
  - Installing phpbench/container (1.2): Loading from cache
  - Installing webmozart/assert (1.4.0): Loading from cache
  - Installing webmozart/path-util (dev-master 95a8f7a): Cloning 95a8f7ad15 from cache
  - Installing phpbench/dom (0.2.0): Loading from cache
  - Installing doctrine/lexer (dev-master ee614dd): Cloning ee614dd93a from cache
  - Installing doctrine/annotations (1.7.x-dev 3f35255): Cloning 3f35255290 from cache
  - Installing phpbench/phpbench (dev-master dccc67d): Cloning dccc67dd52 from cache
symfony/service-contracts suggests installing symfony/service-implementation
symfony/console suggests installing symfony/event-dispatcher
symfony/console suggests installing symfony/lock
Writing lock file
Generating autoload files

Writing a benchmark

Add some code to src/ExampleThing.php to measure.

<?php

namespace ExampleApp;

class ExampleThing {
  /**
   * Inefficiently multiply numbers together
   */
  public function multiply(int $x, int $y) : int {
    $ret = 0;
    for($i = 0; $i < $x; $i++) {
      for($j = 0; $j < $y; $j++) {
        $ret++;
      }
    }
    return $ret;
  }
}

Write a benchmark for the code in benchmarks/ExampleThingBenchmark.php

<?php

use ExampleApp\ExampleThing;

/**
 * @BeforeMethods({"init"})
 * @Revs(1000)
 * @Iterations(5)
 */
class ExampleThingBenchmark {

    private static $exampleThing;

    public function init()
    {
        self::$exampleThing = new ExampleThing();
    }

    /**
     * @Subject
     */
    public function doMultiply()
    {
        self::$exampleThing -> multiply(100, 100);
    }
}

Before you run anything, you will also need a configuration file. Add this minimal configuration to phpbench.json.dist.

{
    "php_disable_ini": true,
    "bootstrap": "vendor/autoload.php",
    "path": "benchmark",
    "php_config": {
        "extension": [ "json.so" ]
    },
    "time_unit": "milliseconds"
}

Running the benchmark

The most basic way to run all benchmarks at once is phpbench run, which looks like this:

$ php vendor/bin/phpbench run
PhpBench @git_tag@. Running benchmarks.
Using configuration file: /home/mike/workspace/blog/phpbench/php-benchmarks/phpbench.json.dist

\ExampleThingBenchmark

    doMultiply..............................I4 [μ Mo]/r: 0.232 0.232 (ms) [μSD μRSD]/r: 0.000ms 0.10%

1 subjects, 5 iterations, 1,000 revs, 0 rejects, 0 failures, 0 warnings
(best [mean mode] worst) = 0.232 [0.232 0.232] 0.233 (ms)
⅀T: 1.162ms μSD/r 0.000ms μRSD/r: 0.100%

As you might expect, there are options to run specific benchmarks, or to present the results in different ways. However, there were some PhpBench-specific things which you will need to navigate to use it successfully.

Firstly, there are two different output settings to consider:

  • report options allow you to decide which data to print in a table.
  • output options allow you to decide how to format it for output

I also found it useful to add --progress=none to suppress the text displayed by default, since you will otherwise get broken HTML output.

Lastly, be aware that confusingly, the default report is not active by default. Specify it to get a table listing out each iteration.

$ php vendor/bin/phpbench run --progress=none --report=default
suite: 13415431dc0db41c12decee0fa83c26d1db0f678, date: 2019-05-31, stime: 22:01:34
+-----------------------+------------+-----+------+------+----------+-----------+--------------+----------------+
| benchmark             | subject    | set | revs | iter | mem_peak | time_rev  | comp_z_value | comp_deviation |
+-----------------------+------------+-----+------+------+----------+-----------+--------------+----------------+
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 0    | 915,736b | 236.716μs | +1.68σ       | +1.08%         |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 1    | 915,736b | 231.999μs | -1.45σ       | -0.93%         |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 2    | 915,736b | 233.960μs | -0.15σ       | -0.1%          |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 3    | 915,736b | 233.878μs | -0.2σ        | -0.13%         |
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 4    | 915,736b | 234.361μs | +0.12σ       | +0.08%         |
+-----------------------+------------+-----+------+------+----------+-----------+--------------+----------------+

The type of report that I found most useful is aggregate, since it provides a summary of each iteration.

$ php vendor/bin/phpbench run --progress=none --report=aggregate
suite: 1341543c02ebee4031d165665c2724c379bf9c98, date: 2019-05-31, stime: 22:01:18
+-----------------------+------------+-----+------+-----+----------+-----------+-----------+-----------+-----------+---------+--------+-------+
| benchmark             | subject    | set | revs | its | mem_peak | best      | mean      | mode      | worst     | stdev   | rstdev | diff  |
+-----------------------+------------+-----+------+-----+----------+-----------+-----------+-----------+-----------+---------+--------+-------+
| ExampleThingBenchmark | doMultiply | 0   | 1000 | 5   | 915,736b | 231.907μs | 232.614μs | 232.089μs | 233.538μs | 0.713μs | 0.31%  | 1.00x |
+-----------------------+------------+-----+------+-----+----------+-----------+-----------+-----------+-----------+---------+--------+-------+

Both of the example above use console output, which is the default.

HTML reports

To capture phpbench output from a CI environment, you can generate a HTML report as well. Add an output section to phpbench.json.dist.

{
    ...
    "outputs": {
         "html_file": {
             "extends": "html",
             "file": "benchmarks.html",
             "title": "Example benchmark report"
         }
    }
}

This new html_file output can be used with --output=html_file

$ php vendor/bin/phpbench run \
    --progress=none --report=aggregate \
    --output=console --output=html_file

From Jenkins

This is a Jenkinsfile that I use to run benchmarks. You need to install the “HTML Publisher” and “AnsiColor” Jenkins plugins.

pipeline {
  agent any

  stages {
    stage('Install') {
      steps {
        ansiColor('xterm') {
          sh 'composer install'
        }
      }
    }
    stage('Run benchmarks') {
      steps {
        ansiColor('xterm') {
          sh 'php vendor/bin/phpbench run --progress=none --report aggregate --output=console --output=html_file --ansi'
          publishHTML([
            allowMissing: false,
            alwaysLinkToLastBuild: true,
            keepAll: true,
            reportDir: '',
            reportFiles: 'benchmarks.html',
            reportName:
            'phpbench report',
            reportTitles: ''
          ])       
        }
      }
    }
  }
}

Each build is then benchmarked, and the results are available as a link on the left.

The console output of the build shows the results.

To actually view the HTML report you will need to apply this tweak.

Next steps

I’ve started to add phpbench micro-benchmarks to some PHP projects that I work on, and I’m finding it a lot more useful than standalone test scripts.

The next step would be integrate micro-benchmarks better into the development process. For example, I can already review code coverage improvements/regressions of a change on GitHub using Coveralls, which posts a comment to each pull request. This saves you from merging changes which don’t meet your project’s standards.

As far as I can tell, you would need to write custom code for more expressive phpbench reports along these lines, such as:

  • the history of a benchmark over time, or
  • the most-changed benchmarks against a baseline

phpbench does provide the building-blocks though, since you can set some options to archive the results of each run to a folder.

How to integrate Gitea and Jenkins

Gitea is a web app for hosting Git repositories. It’s open source, and very simple to get running. With some extra setup, it can also trigger Jenkins builds, and display the Jenkins build status of each commit once it has been built.

Because the documentation for the Jenkins plugin is very minimalist, I decided to write about it for future reference.

About this setup

I installed Jenkins and Gitea on the same Debian 9 server on the LAN. They communicate only over HTTP, so they could just as easily be installed separately.

To make the configuration clear, I’ve used jenkins.example.com in URLs which refer to Jenkins, and gitea.example.com for the Gitea.

Gitea installation

This command will install and start the linux-amd64 version of Gitea as the user “git”.

useradd git -r --create-home && \
  mkdir /opt/gitea && chown -R git: /opt/gitea && \
  wget -O /opt/gitea/gitea https://dl.gitea.io/gitea/1.7.0/gitea-1.7.0-linux-amd64 && \
  chmod +x /opt/gitea/gitea && \
  sudo -u git bash -c "cd /opt/gitea && ./gitea web"

Shut it down, and configure some paths at /opt/gitea/custom/conf/app.ini. These will depend on your environment.

SSH_DOMAIN       = gitea.example.com
DOMAIN           = gitea.example.com
HTTP_PORT        = 3000
ROOT_URL         = http://gitea.example.com:3000/

Start it back up as a systemd service at this point, by creating /etc/systemd/system/gitea.service with this content:

[Unit]
Description=gitea
After=network.target

[Service]
ExecStart=/opt/gitea/gitea web
WorkingDirectory=/opt/gitea
User=git
Type=simple

[Install]
WantedBy=multi-user.target

Once this is saved, start the service.

systemctl daemon-reload
systemctl start gitea

Optionally, also configure it to start on boot.

systemctl enable gitea

Jenkins installation

I installed Jenkins from the official Debian repo at jenkins-ci.org, and clicked through the initial install.

Add plugin

Open up Manage JenkinsManage Plugins. Navigate to Available and check the Gitea plugin.

Next, install the plugin and restart Jenkins.

The configuration for the plugin is located under Manage JenkinsConfigure System.

At this point you will want to tell Jenkins where to find your Gitea server.

I don’t suggest choosing Manage hooks, because it uses the same account to manage hooks across all repos, which would violate the principle of least privilege.

Set up a project in Gitea

In Gitea, create a project, then a repository under that.

Register an account in Gitea for Jenkins to use for this project.

Log out, log back in as yourself, and add Jenkins as a collaborator to the repo, with Write access.

This is the only permission you need for public repositories. If you plan to lock down your Gitea organization later, then you will also need to give this Jenkins account Read access at the organization level.

Set up a project in Jenkins

Add a new Gitea Organization Jenkins job.

Enter the name of the organization, and the account to log in with.

Add the details for the new account, and make sure it’s selected.

The other options don’t need to be changed at this stage.

When you press ‘save’, Jenkins will immediately attempt to find any repositories in the Gitea organization, and kick off any builds. Unless everything is correct, this is unlikely to work the first time, so pay attention to error logs.

These three places will show what’s happening:

  • Scan Gitea Organization Log, which lists repositories in the organization.
  • Scan Multibranch Pipeline Log for each repository, which shows the discovery of branches.
  • Console output for each build, which will show errors if the build status could not be submitted.

Problems which I’ve found here include:

  • The URL of Gitea in the Jenkins configuration must match the URL to Gitea in its own configuration.
  • The Jenkins user account must have permission to list repositories, clone, and update statuses.
  • Empty repositories, and repositories without a ‘Jenkinsfile’ are ignored.

For that last step, here is an empty Jenkinsfile that you can put in your repository to test this integration:

pipeline {
    agent any

    stages {
        stage('Do nothing') {
            steps {
                sh '/bin/true'
            }
        }
    }
}

Once this is sorted out, you can expect to see your repository in Jenkins.

Every branch with a Jenkinsfile will appear.

And each time a commit is mentioned in Gitea, it will display a small icon to indicate the build status.

Set up a web hook in Gitea

At this point, builds need to be manually triggered. To trigger them each time the repository changes, we need to get a notification out to Jenkins.

Under the repository settings, click WebhooksAdd webhookGitea.

The correct values to use are:

  • URL: http://[ your jenkins server ]/gitea-webhook/post
  • POST Content Type: application/json

Once you press Add Webhook, the path will appear with a small grey dot, indicating that it hasn’t been run before.

If you click edit, then the Test Delivery button can be used to check that it’s working.

The icon indicates the status. If things aren’t working correctly, then click the delivery UUID to expand the full request information, which should help with debugging.

Final result

With Jenkins and Gitea, you have a simple self-hosted a continuous integration environment.

In Gitea, you can store, update and review your code. Any build and test steps in a Jenkinsfile will be run automatically each time the repository changes.

The detailed output for each build is visible in Jenkins, where you can track build results with a variety of plugins.

Handling I/O errors in PHP

This blog post is all about how to handle errors from the PHP file_get_contents function, and others which work like it.

The file_get_contents function will read the contents of a file into a string. For example:

<?php
$text = file_get_contents("hello.txt");
echo $text;

You can try this out on the command-line like so:

$ echo "hello" > hello.txt
$ php test.php 
hello

This function is widely used, but I’ve observed that error handling around it is often not quite right. I’ve fixed a few bugs involving incorrect I/O error handling recently, so here are my thoughts on how it should be done.

How file_get_contents fails

For legacy reasons, this function does not throw an exception when something goes wrong. Instead, it will both log a warning, and return false.

<?php
$filename = "not-a-real-file.txt";
$text = file_get_contents($filename);
echo $text;

Which looks like this when you run it:

$ php test.php 
PHP Warning:  file_get_contents(not-a-real-file.txt): failed to open stream: No such file or directory in test.php on line 3
PHP Stack trace:
PHP   1. {main}() test.php:0
PHP   2. file_get_contents() test.php:3

Warnings are not very useful on their own, because the code will continue on without the correct data.

Error handling in four steps

If anything goes wrong when you are reading a file, your code should be throwing some type of Exception which describes the problem. This allows developers to put a try {} catch {} around it, and avoids nasty surprises where invalid data is used later.

Step 1: Detect that the file was not read

Any call to file_get_contents should be immediately followed by a check for that false return value. This is how you know that there is a problem.

<?php
$filename = "not-a-real-file.txt";
$text = file_get_contents($filename);
if($text === false) {
  throw new Exception("File was not loaded");
}
echo $text;

This now gives both a warning and an uncaught exception:

$ php test.php 
PHP Warning:  file_get_contents(not-a-real-file.txt): failed to open stream: No such file or directory in test.php on line 3
PHP Stack trace:
PHP   1. {main}() test.php:0
PHP   2. file_get_contents() test.php:3
PHP Fatal error:  Uncaught Exception: File was not loaded in test.php:5
Stack trace:
#0 {main}
  thrown in test.php on line 5

Step 2: Suppress the warning

Warnings are usually harmless, but there are several good reasons to suppress them:

  • It ensures that you are not depending on a global error handler (or the absence of one) for correct behaviour.
  • The warning might appear in the middle of the output, depending on php.ini.
  • Warnings can produce a lot of noise in the logs

Use @ to silence any warnings from a function call.

<?php
$filename = "not-a-real-file.txt";
$text = @file_get_contents($filename);
if($text === false) {
  throw new Exception("File was not loaded");
}
echo $text;

The output is now only the uncaught Exception:

$ php test.php 
PHP Fatal error:  Uncaught Exception: File was not loaded in test.php:5
Stack trace:
#0 {main}
  thrown in test.php on line 5

Step 3: Get the reason for the failure

Unfortunately, we lost the “No such file or directory” message, which is pretty important information, which should go in the Exception. This information is retrieved from the old-style error_get_last method.

This function might just return empty data, so you should check that everything is set and non-empty before you try to use it.

<?php
$filename = "not-a-real-file.txt";
error_clear_last();
$text = @file_get_contents($filename);
if($text === false) {
  $e = error_get_last();
  $error = (isset($e) && isset($e['message']) && $e['message'] != "") ?
      $e['message'] : "Check that the file exists and can be read.";
  throw new Exception("File '$filename' was not loaded. $error");
}
echo $text;

This now embeds the failure reason directly in the message.

$ php test.php 
PHP Fatal error:  Uncaught Exception: File 'not-a-real-file.txt' was not loaded. file_get_contents(not-a-real-file.txt): failed to open stream: No such file or directory in test.php:9
Stack trace:
#0 {main}
  thrown in test.php on line 9

Step 4: Add a fallback

The last time I introduced error_clear_last()/get_last_error() into a code-base, I learned out that HHVM does not have these functions.

Call to undefined function error_clear_last()

The fix for this is to write some wrapper code, to verify that each function exists.

<?php
$filename = 'not-a-real-file';
clearLastError();
$text = @file_get_contents($filename);
if ($text === false) {
    $error = getLastErrorOrDefault("Check that the file exists and can be read.");
    throw new \Exception("Could not retrieve image data from '$filename'. $error");
}
echo $text;

/**
 * Call error_clear_last() if it exists. This is dependent on which PHP runtime is used.
 */
function clearLastError()
{
    if (function_exists('error_clear_last')) {
        error_clear_last();
    }
}
/**
 * Retrieve the message from error_get_last() if possible. This is very useful for debugging, but it will not
 * always exist or return anything useful.
 */
function getLastErrorOrDefault(string $default)
{
    if (function_exists('error_clear_last')) {
        $e = error_get_last();
        if (isset($e) && isset($e['message']) && $e['message'] != "") {
            return $e['message'];
        }
    }
    return $default;
}

This does the same thing as before, but without breaking other PHP runtimes.

$ php test.php 
PHP Fatal error:  Uncaught Exception: Could not retrieve image data from 'not-a-real-file'. file_get_contents(not-a-real-file): failed to open stream: No such file or directory in test.php:7
Stack trace:
#0 {main}
  thrown in test.php on line 7

Since HHVM is dropping support for PHP, I expect that this last step will soon become unnecessary.

How not to handle errors

Some applications put a series of checks before each I/O operation, and then simply perform the operation with no error checking. An example of this would be:

<?php
$filename = "not-a-real-file.txt";
// Check everything!
if(!file_exists($filename)) {
  throw new Exception("$filename does not exist");
}
if(!is_file($filename)) {
  throw new Exception("$filename is not a file");
}
if(!is_readable($filename)) {
  throw new Exception("$filename cannot be read");
}
// Assume that nothing can possibly go wrong..
$text = @file_get_contents($filename);
echo $text;

You could probably make a reasonable-sounding argument that checks are a good idea, but I consider them to be misguided:

  • If you skip any actual error handling, then your code is going to fail in more surprising ways when you encounter an I/O problem that could not be detected.
  • If you do perform correct error handling as well, then the extra checks add nothing other than more branches to test.

Lastly, beware of false positives. For example, the above snippet will reject HTTP URL’s, which are perfectly valid for file_get_contents.

Conclusion

Most PHP code now uses try/catch/finally blocks to handle problems, but the ecosystem really values backwards compatibility, so existing functions are rarely changed.

The style of error reporting used in these I/O functions is by now a legacy quirk, and should be wrapped to consistently throw a useful Exception.

How to get a notification when your site appears on HackerNews

A few weeks ago, somebody listed this article about GNU parallel to HackerNews, and I got a small wave of new visitors trawling my blog.

I don’t actively monitor referrers to this site, so I was oblivious to this until a few days afterward. Aware of the Slashdot Effect, I thought I should set up some free tools to remind me to log in and check the site’s health if it happens again:

Hacker News RSS feeds

Hacker news does not publish it’s own RSS feeds, so I had to use a third-party service. I found a URL that would feed me the feed to the latest articles off this site, by searching the “url” attribute:

https://hnrss.org/newest?q=mike42.me&search_attrs=url

This URL gives an RSS feed, as you might expect:

IFTTT

To save me installing and checking a local reader, I set up IFTTT to send me an email when new articles are published to this feed.

The “RSS feed to email” applet is perfect for this kind of consumer-grade automation.

I set it up with the URL, and well, nothing interesting happens. Only new articles are emailed, so this is expected.

Example email

Since I also use this IFTTT applet to get notifications for other RSS feeds, I do know that it works. Within an hour or two of a new article appearing in the feed, the applet gives you an email from RSS Feed via IFTTT <action@ifttt.com>:

It’s not exactly a real-time notification, but it’s a good start. At this point, I know when my posts are being linked from a specific high-traffic site, which is a good start.

For any site bigger than a personal blog, you might be interested in handling extra traffic rather than just be vaguely aware of it, but I’ll save that for a future post.

Recovering text from a receipt with escpos-tools

I have written previously about how to generate receipts for printers which understand ESC/POS. Today, I thought I would write about the opposite process.

Unlike PostScript, the ESC/POS binary language is not commonly understood by software. I wrote a few utilities last year to help change that, called escpos-tools.

In this post, I’ll step through an example ESC/POS binary file that an escpos-tools user sent to me, and show how we can turn it back into a usable format. The tools we are using are:

You might need this sort of process if you need to email a copy of your receipts, or to archive them for audit use.

Printing the file

Binary print files are generated from drivers. I can feed this one back to my printer like this:

cat receipt.bin > /dev/usb/lp0

My Epson TM-T20 receipt printer understands ESC/POS, and prints this out:

Installing escpos-tools

escpos-tools is not packaged yet, so you need git and composer (from the PHP eco-system) to use it.

$ git clone https://github.com/receipt-print-hq/escpos-tools
$ cd escpos-tools
$ composer install

Inspecting the file

There is text in the file, so the first thing you should try to do is esc2text. In this case, which works like this:

$ php esc2text.php receipt.bin

In this case, I got no output, so I switch to -v to show the commands being found.

$ php esc2text.php receipt.bin  -v
[DEBUG] SetRelativeVerticalPrintPositionCmd 
[DEBUG] GraphicsDataCmd 
[DEBUG] GraphicsDataCmd 
[DEBUG] SetRelativeVerticalPrintPositionCmd 
...

This indicates that there is no text being sent to the receipt, only images. We know from the print-out that the images contain text, so we need a few more utilities.

Recovering images from the receipt

To extract the images, use escimages. It runs like this:

$ php escimages.php --file receipt.bin
[ Image 1: 576x56 ]
[ Image 2: 576x56 ]
[ Image 3: 576x56 ]
[ Image 4: 576x56 ]
[ Image 5: 576x56 ]
[ Image 6: 576x56 ]
[ Image 7: 576x56 ]
[ Image 8: 576x52 ]

This gave us 8 narrow images:

Using ImageMagick’s convert command, these can be combined into one image like this:

convert -append receipt-*.png -density 70 -units PixelsPerInch receipt.png

The result is now the same as what our printer would output:

Recovering text from the receipt

Lastly, tesseract is an open source OCR engine which can recover text from images. This image is a lossless copy of what we sent to the printer, which is an “easy” input for OCR.

$ tesseract receipt.png -
Estimating resolution as 279
Test Receipt for USB Printer 1

Mar 17, 2018
10:12 PM



Ticket: 01



Item $0,00

Total $0.00

This quality of output is fairly accurate for an untrained reader.

Conclusion

The escpos-tools family of utilities gives some visibility into the contents of ESC/POS binary files.

If you have a use case which requires working with this type of file, then I would encourage you to consider contributing code or example files to the project, so that the utilities can be improved over time.

Get the code

View on GitHub →

Optimization: How I made my PHP code run 100 times faster

I’ve had a PHP wikitext parser as a dependency of some of my projects since 2012. It has always been a minor performance bottleneck, and I recently decided to do something about it.

I prepared an update to the code over the course of a month, and achieved a speedup of 142 times over the original code.

Before: 20.65 seconds, After: 0.145 seconds

A lot of the information that I could find online about PHP performance was very outdated, so I decided to write a bit about what I’ve learned. This post walks through the process I used, and the things which were were slowing down my code.

This is a long read — I’ve included examples which show slow code, and what I replaced them with. If you’re a serious PHP programmer, then read on!

Lesson 1: Know when to optimize

Conventional wisdom seems to dictate that making your code faster is a waste of developer time.

I think it makes you a better programmer to occasionally optimize something in a language that you normally work with. Having a well-calibrated intuition about how your code will run is part of becoming proficient in a language, and you will tend to create fewer performance problems if you’ve got that intuition.

But you do need to be selective about it. This code has survived in the wild for over five years, and I think I will still be using it in another five. This code is also a good candidate because it does not access external resources, so there is only one component to examine.

Lesson 2: Write tests

In the spirit of MakeItWorkMakeItRightMakeItFast, I started by checking my test suite so that I could refactor the code with confidence.

In my case, I haven’t got good unit tests, but I have input files that I can feed through the parser to compare with known-good HTML output, which serves the same purpose:

php example.php > out.txt
diff good.txt out.txt

I ran this after every change to the code, so that I could be sure that the changes were not affecting the output.

Lesson 3: Profile your code & Question your assumptions

Code profiling allows you see how each part of your program is contributing to its overall run-time. This helps you to target your optimization efforts.

The two main debuggers for PHP are Zend and Xdebug, which can both profile your code. I have xdebug installed, which is the free debugger, and I use the Eclipse IDE, which is the free IDE. Unfortunately, the built-in profiling tool in Eclipse seems to only support the Zend debugger, so I have to profile my scripts on the command-line.

The best sources of information for this are:

On Debian or Ubuntu, xdebug is installed via apt-get:

sudo apt-get install php-cli php-xdebug

On Fedora, the package is called php-pecl-xdebug, and is installed as:

sudo dnf install php-pecl-xdebug

Next, I executed a slow-running example script with profiling enabled:

php -dxdebug.profiler_enable=1 -dxdebug.profiler_output_dir=. example.php

This produces a profile file, which you can use any valgrind-compatible tools to inspect. I used kcachegrind

sudo apt-get install kcachegrind

And for fedora:

sudo dnf install kcachegrind

You can locate and open the profile on the command-line like so:

ls
kcachegrind cachegrind.out.13385

Before profiling, I had guessed that the best way to speed up the code would be to reduce the amount of string concatenation. I have lots of tight loops which append characters one-at-a-time:

$buffer .= "$c"

Xdebug showed me that my guess was wrong, and I would have wasted a lot of time if I tried to remove string concatenation.

kcachegrind screen capture

Instead, it was clear that I was

  • Calculating the same thing hundreds of times over.
  • Doing it inefficiently.

Lesson 4: Avoid multibyte string functions

I had used functions from the mbstring extension (mb_strlen, mb_substr) to replace strlen and substr throughout my code. This is the simplest way to add UTF-8 support when iterating strings, is commonly suggested, and is a bad idea.

What people do

If you have an ASCII string in PHP and want to iterate over each byte, the idiomatic way to do it is with a for loop which indexes into the string, something like this:

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$len = strlen($testString);
for($i = 0; $i < $len; $i++) {
  $c = substr($testString, $i, 1);
  // Do work on $c
  // ...
}

I’ve used substr here so that I can show that it has the same usage as mb_substr, which generally operates on UTF-8 characters. The idiomatic PHP for iterating over a multi-byte string one character at a time would be:

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$len = mb_strlen($testString);
for($i = 0; $i < $len; $i++) {
  $c = mb_substr($testString, $i, 1);
  // Do work on $c
  // ...
}

Since mb_substr needs to parse UTF-8 from the start of the string each time it is called, the second snippet runs in polynomial time, where the snippet that calls substr in a loop is linear.

With a few kilobytes of input, this makes mb_substr unacceptably slow.

substr: 0.03 seconds, mb_substr: 4.23 seconds

Averaging over 10 runs, the mb_substr snippet takes 4.23 seconds, while the snippet using substr takes 0.03 seconds.

What people should do

Split your strings into bytes or characters before you iterate, and write methods which operate on arrays rather than strings.

You can use str_split to iterate over bytes:

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$testArray = str_split($testString);
$len = count($testArray);
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  // Do work on $c
  // ...
}

And for unicode strings, use preg_split. I learned about this trick from StackOverflow, but it might not be the fastest way. Please leave a comment if you have an alternative!

<?php
// Make a test string
$testString = str_repeat('a', 60000);
// Loop through test string
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$len = count($testArray);
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  // Do work on $c
  // ...
}

By converting the string to an array, you can pay the penalty of decoding the UTF-8 up-front. This is a few milliseconds at the start of the script, rather than a few milliseconds each time you need to read a character.

str_split: 0.0097s, preg_split: 0.0160s

After discovering this faster alternative to mb_substr, I systematically removed every mb_substr and mb_strlen from the code I was working on.

Lesson 5: Optimize for the most common case

Around 50% of the remaining runtime was spent in a method which expanded templates.

To parse wikitext, you first need to expand templates, which involves detecting tags like {{ template-name | foo=bar }} and <noinclude></noinclude>.

My 40 kilobyte test file had fewer than 100 instances of { and <, | and =, so I added a short-circuit to the code to skip most of the processing, for most of the characters.

<?php
self::$preprocessorChars = [
    '<' => true,
    '=' => true,
    '|' => true,
    '{' => true
];

// ...
for ($i = 0; $i < $len; $i++) {
    $c = $textChars[$i];
    if (!isset(self::$preprocessorChars[$c])) {
        /* Fast exit for characters that do not start a tag. */
        $parsed .= $c;
        continue;
    }
   // ... Slower processing 
}

The slower processing is now avoided 99.75% of the time.

Checking for the presence of a key in a map is very fast. To illustrate, here are two examples which each branch on { and <, | and =.

This one uses a map to check each character:

<?php
// Make a test string
$testString = str_repeat('a', 600000);
$chars = [
    '<' => true,
    '=' => true,
    '|' => true,
    '{' => true
];
// Loop through test string
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$len = count($testArray);
$parsed = "";
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  if(!isset($chars[$c])) {
    $parsed .= $c;
    continue;
  }
  // Never executed
}

While one uses no map, and has four !== checks instead:

<?php
// Make a test string
$testString = str_repeat('a', 600000);
// Loop through test string
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$len = count($testArray);
$parsed = "";
for($i = 0; $i < $len; $i++) {
  $c = $testArray[$i];
  if($c !== "<" && $c !== "=" && $c !== "|" && $c !== "{") {
    $parsed .= $c;
    continue;
  }
  // Never executed
}

Even though the run time of each script includes the generation of a 600kB test string, the difference is still visible:

0.29 seconds with map, 0.37 seconds without map

Averaging over 10 runs, the code took 0.29 seconds when using a map, while it took 0.37 seconds to run the example with used four !== statements.

I was a little surprised by this result, but I’ll let the data speak for itself rather than try to explain why this is the case.

Lesson 6: Share data between functions

The next item to appear in the profiler was the copious use of array_slice.
My code uses recursive descent, and was constantly slicing up the input to pass around information. The array slicing had replaced earlier string slicing, which was even slower.

I refactored the code to pass around the entire string with indices rather than actually cutting it up.

As a contrived example, these scripts each use a (very unnecessary) recursive-descent parser to take words from the dictionary and transform them like this:

example --> (example!)

The first example slices up the input array at each recursion step:

<?php
function handleWord(string $word) {
  return "($word!)\n";
}

/**
 * Parse a word up to the next newline.
 */
function parseWord(array $textChars) {
  $parsed = "";
  $len = count($textChars);
  for($i = 0; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Word is finished because we hit a newline
      $start = $i + 1; // Past the newline
      $remainderChars = array_slice($textChars, $start , $len - $start);
      return array('parsed' => handleWord($parsed), 'remainderChars' => $remainderChars);
    }
    $parsed .= $c;
  }
  // Word is finished because we hit the end of the input
  return array('parsed' => handleWord($parsed), 'remainderChars' => []);
}

/**
 * Accept newline-delimited dictionary
 */
function parseDictionary(array $textChars) {
  $parsed = "";
  $len = count($textChars);
  for($i = 0; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Not a word...
      continue;
    }
    // This is part of a word
    $start = $i;
    $remainderChars = array_slice($textChars, $start, $len - $start);
    $result = parseWord($remainderChars);
    $textChars = $result['remainderChars'];
    $len = count($textChars);
    $i = -1;
    $parsed .= $result['parsed'];
  }
  return array('parsed' => $parsed, 'remainderChars' => []);
}

// Load file, split into characters, parse, print result
$testString = file_get_contents("words");
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$ret = parseDictionary($testArray);
file_put_contents("words2", $ret['parsed']);

While the second one always takes an index into the input array:

<?php
function handleWord(string $word) {
  return "($word!)\n";
}

/**
 * Parse a word up to the next newline.
 */
function parseWord(array $textChars, int $idxFrom = 0) {
  $parsed = "";
  $len = count($textChars);
  for($i = $idxFrom; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Word is finished because we hit a newline
      $start = $i + 1; // Past the newline
      return array('parsed' => handleWord($parsed), 'remainderIdx' => $start);
    }
    $parsed .= $c;
  }
  // Word is finished because we hit the end of the input
  return array('parsed' => handleWord($parsed), $i);
}

/**
 * Accept newline-delimited dictionary
 */
function parseDictionary(array $textChars, int $idxFrom = 0) {
  $parsed = "";
  $len = count($textChars);
  for($i = $idxFrom; $i < $len; $i++) {
    $c = $textChars[$i];
    if($c === "\n") {
      // Not a word...
      continue;
    }
    // This is part of a word
    $start = $i;
    $result = parseWord($textChars, $start);
    $i = $result['remainderIdx'] - 1;
    $parsed .= $result['parsed'];
  }
  return array('parsed' => $parsed, 'remainderChars' => []);
}

// Load file, split into characters, parse, print result
$testString = file_get_contents("words");
$testArray = preg_split('//u', $testString, -1, PREG_SPLIT_NO_EMPTY);
$ret = parseDictionary($testArray);
file_put_contents("words2", $ret['parsed']);

The run-time difference between these examples is again very pronounced:

3.04s with slicing, 0.0302s with no slicing

Averaging over 10 runs, the code snippet which extracts sub-arrays took 3.04 seconds, while the code that passes around the entire array ran in 0.0302 seconds.

It’s amazing that such an obvious inefficiency in my code had been hiding behind larger problems before.

Lesson 7: Use scalar type hinting

Scalar type hinting looks like this:

function foo(int bar, string $baz) {
...
}

This is the secret PHP performance feature that they don’t tell you about. It does not actually change the speed of your code, but does ensure that it can’t be run on the slower PHP releases before 7.0.

PHP 7.0 was released in 2015, and it’s been established that it is twice as fast as PHP 5.6 in many cases.

I think it’s reasonable to have a dependency on a supported version of your platform, and performance improvements like this are a good reason to update your minimum supported version of PHP.

By breaking compatibility with scalar type hints, you ensure that your software does not appear to “work” in a degraded state.

Simplifying the compatibility landscape will also make performance a more tractable problem.

Lesson 8: Ignore advice that isn’t backed up by (relevant) data

While I was updating this code, I found a lot of out-dated claims about PHP performance, which did not hold true for the versions that I support.

To call out a few myths that still seem to be alive:

  • Style of quotes impacts performance.
  • Use of by-val is slower than by-ref for variable passing.
  • String concatenation is bad for performance.

I attempted to implement each of these, and wasted a lot of time. Thankfully, I was measuring the run-time and using version control, so it was easy for me to identify and discard changes which had a negligible or negative performance impact.

If somebody makes a claim about something being fast or slow in PHP, best assume that it doesn’t apply to you, unless you see some example code with timings on a recent PHP version.

Conclusion

If you’ve read this far, then I hope you’ve seen that modern PHP is not an intrinsically slow language. A significant speed-up is probably achievable on any real-world code-base with such endemic performance issues.

Before: 20.65 seconds, After: 0.145 seconds

To repeat the graphic from the introduction: the test file could be parsed in 20.65 seconds on the old code, and 0.145 seconds on the improved code (averaging over 10 runs, as before).

At this point I declared my efforts “good enough” and moved on. The job is done: although another pass could speed it up further, this code is no longer slow enough to justify the effort.

Since I’ve closed the book on PHP 5, I would be interested in knowing whether there is a faster way to parse UTF-8 with the new IntlChar functions, but I’ll have to save that for the next project.

Now that you’ve seen some very inefficient ways to write PHP, I also hope that you will be able to avoid introducing similar problems in your own projects!

Raster to vector conversion tips

I have recently been working with some low-resolution bitmap fonts for a few projects, which needed to be re-sized for different uses.

I’ll share here a few tricks that I use to get the detail out of each letter as a vector, so that it can be rendered at a higher resolution.

Example

A good example might be this picture of a hieroglyph from the WikiHiero MediaWiki extension, which is 28 pixels wide:

Scaled up, it looks like this:

So small images become very pixellated when you resize them. The good news is that even from a small image, there is quite a lot of detail which we can use. If we’re smart about it, the glyph can be rendered like this:

You can still see some artifacts because of the low resolution of the input, but it’s clearly an improvement.

You will need

  • ImageMagick for raster operations.
  • potrace to trace the image.
  • Inkscape to produce a high-quality raster.

Steps

Prepare

The tracing program will convert the image to pure black & white as its first step. These transformations make sure that the detail is preserved for tracing.

  • Padding by 10px on every side to reduce distortion around the edges.
  • Scaling by 10x with interpolation
  • Convert transparency to white
convert hiero_A1.png -bordercolor white -border 10x10 \
    -resize 1000% -flatten hiero_A1.pnm

The 29×38 grey+alpha input becomes a blurry 490 x 580 greymap surrounded by whitespace.

This preparation is important, because a large blurry graymap will retain a lot more detail than the original image when a threshold is applied to convert it pure black and white:

Trace

The potrace program will threshold and trace the input image. Here, we will produce an SVG so that we can make it transparent.

The k value affects the threshold operation. It can be increased for a bolder, darker glyph, or reduced for a finer one.

potrace hiero_A1.pnm -k 0.30 --svg

This gives you an SVG with padding:

If you only want a vector, then you can stop here. The next steps will reproduce a smaller PNG with transparency and the correct padding.

I couldn’t find a reliable way to programmatically crop the image back to its original padding as an SVG, but in my case I needed to convert it back to a bitmap anyway, so I cropped it later.

Render

Use Inkscape to convert the SVG back to a large PNG. The output size here is twice as large as the file we traced, just to leave plenty of pixels to work with.

inkscape -z -e hiero_A1_big.png hiero_A1.svg -w 980

Like the SVG, there is still a lot of whitespace here. The image is now a PNG with a transparent background, and unlike the file we traced, the edges of the curves are now anti-aliased.

Crop

Everything is 20x its original size, to get the image, we need to drop 200px of padding from the left and top, then read 580×760 pixels (20 times the 29×38 start).

convert hiero_A1_big.png -crop 580x760+200+200 hiero_A1_cropped.png

This produces a 580×760 image in the same aspect ratio as the original input file.

Scale down

In my case, I only needed to double the resolution of the input file, so I scaled this file down from there.

hiero_A1_cropped.png -resize 58x76 hiero_A1_outp1.png

Success!

As a script

I got these steps from a script that I wrote for doubling the size of a PNG image so that it can be re-used on newer displays.

Usage:

./tracepng.sh foo.png

Where tracepng.sh is:

#!/bin/bash
# A script to upscale small bitmaps in PNG format.
# Pad, upscale, trace, render, crop then downscale.
if [ $# != 1 ]; then
  echo "Usage $0 input.png"
  exit
fi
set -exu
# Names of all the files we will produce
INP_FILE=$1
SVG_FILE="${INP_FILE%.*}.svg"
PNM_FILE="${INP_FILE%.*}.pnm"
LARGE_FILE="${INP_FILE%.*}_big.png"
LARGE_FILE_CROPPED="${INP_FILE%.*}_cropped.png"
OUTP_FILE="${INP_FILE%.*}_outp1.png"
COMPARISON_FILE="${INP_FILE%.*}_outp2.png"

# Width originally
# https://stackoverflow.com/questions/4670013/fast-way-to-get-image-dimensions-not-filesize
IMG_WIDTH=$(identify -format "%w" "$INP_FILE")
IMG_HEIGHT=$(identify -format "%h" "$INP_FILE")
TARGET_WIDTH=$((IMG_WIDTH * 2))
TARGET_HEIGHT=$((IMG_HEIGHT * 2))

# Make huge raster w/ border (whitespace is your friend for black/white interpolation and tracing), then convert to SVG
convert ${INP_FILE} -bordercolor white -border 10x10 -resize 1000% -flatten ${PNM_FILE}

# https://en.wikipedia.org/wiki/Potrace
potrace ${PNM_FILE} -k 0.30 --svg > ${SVG_FILE}

# Target width for intermediate file
EXPANDED_WIDTH=$(((IMG_WIDTH + 20) * 20))
EXPANDED_HEIGHT=$(((IMG_HEIGHT + 20) * 20))
INNER_WIDTH=$((IMG_WIDTH * 20))
INNER_HEIGHT=$((IMG_HEIGHT * 20))
# https://stackoverflow.com/questions/9853325/how-to-convert-a-svg-to-a-png-with-image-magick
inkscape -z -e ${LARGE_FILE} ${SVG_FILE} -w ${EXPANDED_WIDTH}

# Cut new edges off
# http://www.imagemagick.org/Usage/crop/
convert ${LARGE_FILE} -crop ${INNER_WIDTH}x${INNER_HEIGHT}+200+200 ${LARGE_FILE_CROPPED}
convert ${LARGE_FILE_CROPPED} -resize ${TARGET_WIDTH}x${TARGET_HEIGHT} ${OUTP_FILE}

Acknowledgment

The images here are from WikiHiero, and can be remixed under the GNU General Public License 2.0.

How to communicate with USB and networked devices from in-browser Javascript

I recently combined a few tools on Linux to create a local Websocket listener, which could forward raw data to a USB printer, so that it could be accessed using Javascript in a web browser.

Why would you want this? I have point of sale applications (POS) in mind, which need to send raw data to a printer. For these applications, the browser and operating system print systems are not appropriate, since they prompt, spool, and badly render pages by converting them to low-fidelity raster images.

Web interfaces are becoming common for point-of-sale applications. The web page could be served from somewhere outside your local network, which is why we need to get the client-side Javascript involved.

The tools

To run on the client computer:

And to generate the print data on the webserver:

We will use these tools to provide some plumbing, so that we can retrieve the print data, and send it off to the printer from client-side Javascript.

Client computer

The client computer was a Linux desktop system. Both of the tools we need are available in the Debian repositories:

sudo apt-get install websockify socat

Listen for websocket connections on port 5555 and pass them to localhost:7000:

websockify 5555 localhost:7000

Listen for TCP connections on localhost port 7000 and pass them to the USB device (more advanced version of this previous post):

socat -u TCP-LISTEN:7000,fork,reuseaddr,bind=127.0.0.1 OPEN:/dev/usb/lp0

Web page

I made a self-contained web-page to provide a button which requested a print file from the network and passed it to the local websocket.

This is slightly modified from a similar example that I used for a previous project.

<html>
<head>
    <meta charset="UTF-8">
    <title>Web-based raw printing example</title>
</head>
<body>
<h1>Web-based raw printing example</h1>

<p>This snippet forwards raw data to a local websocket.</p>

<form>
  <input type="button" onclick="directPrintBytes(printSocket, [0x1b, 0x40, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a, 0x1d, 0x56, 0x41, 0x03]);" value="Print test string"/>
  <input type="button" onclick="directPrintFile(printSocket, 'receipt-with-logo.bin');" value="Load and print 'receipt-with-logo'" />
</form>

<script type="text/javascript">
/**
 * Retrieve binary data via XMLHttpRequest and print it.
 */
function directPrintFile(socket, path) {
  // Get binary data
  var req = new XMLHttpRequest();
  req.open("GET", path, true);
  req.responseType = "arraybuffer";
  console.log("directPrintFile(): Making request for binary file");
  req.onload = function (oEvent) {
    console.log("directPrintFile(): Response received");
    var arrayBuffer = req.response; // Note: not req.responseText
    if (arrayBuffer) {
      var result = directPrint(socket, arrayBuffer);
      if(!result) {
        alert('Failed, check the console for more info.');
      }
    }
  };
  req.send(null);
}

/**
 * Extract binary data from a byte array print it.
 */
function directPrintBytes(socket, bytes) {
  var result = directPrint(socket, new Uint8Array(bytes).buffer);
  if(!result) {
    alert('Failed, check the console for more info.');
  }
}

/**
 * Send ArrayBuffer of binary data.
 */
function directPrint(socket, printData) {
  // Type check
  if (!(printData instanceof ArrayBuffer)) {
    console.log("directPrint(): Argument type must be ArrayBuffer.")
    return false;
  }
  if(printSocket.readyState !== printSocket.OPEN) {
    console.log("directPrint(): Socket is not open!");
    return false;
  }
  // Serialise, send.
  console.log("Sending " + printData.byteLength + " bytes of print data.");
  printSocket.send(printData);
  return true;
}

/**
 * Connect to print server on startup.
 */
var printSocket = new WebSocket("ws://localhost:5555", ["binary"]);
printSocket.binaryType = 'arraybuffer';
printSocket.onopen = function (event) {
  console.log("Socket is connected.");
}
printSocket.onerror = function(event) {
  console.log('Socket error', event);
};
printSocket.onclose = function(event) {
  console.log('Socket is closed');
}
</script>
</body>
</html>

Webserver

On a Apache HTTP webserver, I uploaded the above webpage, and a file with some raw print data, called receipt-with-logo.bin. This file was generated with escpos-php and is available in the repository:

For reference, the test file receipt-with-logo.bin contains this content:

Test

I opened up the web page on the client computer with socat, websockify and an Epson TM-T20II connected. After clicking the “Print” button, the file was sent to my printer. Success!

Because I wasn’t closing the websocket connection, only one browser window could access the printer at a time. Still, it’s a good demo of the basic idea.

To take this from an example to something you might deploy, you would basically just need to keep socat and websockify running in the background as a service (via systemd), close the socket when it’s not being used, and integrate it into a real app.

Different printers, different forwarding

The socat tool can connect to USB, Serial, or Ethernet printers fairly easily.

USB

Forward TCP connections from port 7000 to the receipt printer at /dev/usb/lp0:

socat TCP4-LISTEN:7000,fork /dev/usb/lp0

You can also access the device files directly under /sys/bus/usb/devices/

Serial

Forward TCP connections from port 7000 to the receipt printer at /dev/usb/ttyS0:

socat TCP4-LISTEN:7000,fork /dev/usb/ttyS0

Network

Forward TCP connections from port 7000 to the receipt printer at 10.1.2.3:9100:

socat -u TCP-LISTEN:7000,fork,reuseaddr,bind=127.0.0.1 TCP4-CONNECT:10.1.2.3:9100

You can forward websocket connections directly to an Ethernet printer with websockify:

socat -u TCP-LISTEN:7000,fork,reuseaddr,bind=127.0.0.1 localhost:7000

Other types of printer

If you have another type of printer, such as one accessible only via smbclient or lpr, then you will need to write a helper script.

Direct printing is faster, so I don’t use this method. Check the socat EXEC documentation or man socat if you want to try this.

Future

I’ve had a lot of questions on the escpos-php bug tracker from users who are attempting to print from cloud-hosted apps, which is why I tried this setup.

The browser is a moving target. I have previously written receipt-print-hq/chrome-raw-print, a dedicated app for forwarding WebSocket connections to USB, but that will stop working in a few months when Chrome apps are discontinued. Some time later, WebUSB should become available to make this type of printer available in the browser, which should be infinitely useful for connecting to accessories in point-of-sale setups.

The available tools for generating ESC/POS (receipt printer) binary from the browser are a long way off reaching feature parity with the likes of escpos-php and python-escpos. If you are looking for a side-project, then this a good choice.

Lastly, the socat -u flag makes this all unidirectional, but many types of devices (not just printers) can respond to commands. I couldn’t the end-to-end path to work without this flag, so don’t expect to be able to read from the printer without doing some extra work.

Useful links

Some links that I found while setting this up-

Get the code

View on GitHub →

How to use HiDPI displays on Debian 9

I recently added a 4K monitor to my Debian box, and had to set a few things to make it display things at a good size. These high-density moniotors that are becoming common on laptops and desktops are known as “HiDPI” displays.

Currently I get the best results with:

  • Window scaling factor of 2
  • Font scaling 0.90 to make text slightly smaller

Note that “window scaling” is not “upscaling” (stretching an image). In this version of Gnome, it means “single/double/triple DPI”. The implementations are in the process of changing: Soon you should be able to set any scaling factor.

This post assumes a Gnome version around 3.26, which is what you would get as a default if you installed Debian 9 today.

Apply to one user

Under Settings → Devices → Displays, set the Scale to 200%.

Under Tweaks → Fonts, set the Scaling Factor to 0.90.

Next, add these variables to ~/bashrc to apply similar scaling to QT apps.

QT_AUTO_SCREEN_SCALE_FACTOR=0
QT_SCALE_FACTOR=2

Log out and back in to ensure that the settings have applied everywhere.

Apply to any user

If you have a shared system (eg. domain accounts), or want to style the login box as well, then you can set the same settings as below.

These steps are based on answers to the Ask Ubuntu question: Adjust text scaling factor for all users.

nano /usr/share/glib-2.0/schemas/org.gnome.desktop.interface.gschema.xml

Set the text-scaling-factor to 0.9, and the scaling-factor to 2.

<key name="text-scaling-factor" type="d">
  <range min="0.5" max="3.0"/>
  <default>0.9</default>
  <summary>Text scaling factor</summary>
  <description>
    Factor used to enlarge or reduce text display, without changing font si$
  </description>
</key>
<key name="scaling-factor" type="u">
  <default>2</default>
  <summary>Window scaling factor</summary>
  <description>
    Integer factor used to scale windows by. For use on high-dpi screens.
    0 means pick automatically based on monitor.
  </description>
</key>

Re-compile the schemas:

glib-compile-schemas /usr/share/glib-2.0/schemas

Next drop some similar environent variables for QT apps in /etc/profile.d/hidpi.sh to apply it to all users:

export QT_AUTO_SCREEN_SCALE_FACTOR=0
export QT_SCALE_FACTOR=2

After this, reboot. If the setting has applied, then the gdm3 login box will be scaled as well.

How to boot Debian in 4 seconds

This blog post is a throwback to “Booting Debian in 14 seconds” from debian-administration.org, where the author went through some fairly advanced steps to get his low-spec Debian laptop to boot quickly. Debian was version 4.0 at the time, and I recall it taking around 40 seconds to boot on a default desktop install.

In a rare exception to Wirth’s law, waiting for a computer to boot is no longer “a thing”. A default desktop install of Debian includes systemd, and uses a multi-core CPU and SSD quite efficiently. Also, sleep/wake works more reliably than it used to, so boot speed is not as important as it used to be.

On a modern desktop PC, booting Debian 9 (default desktop install) takes me 14 seconds with no extra configuration, so that’s our new low water mark.

Mainly to illustrate how far open source operating systems have come, I’m going to step through a boot process speed-up, the way it looks in 2018.

Summary

Out

You will read about some of these older tricks if you search for Linux Boot speed, and they are all quite irrelevant in 2018, in my humble opinion-

  • Swapping the /bin/sh shell to dash (already the default, also, init scripts are no longer used).
  • Using readahead (gains are tiny unless you have a HDD).
  • noatime” setting on mounts (“relatime” is a default mount option since Linux 2.6).
In

New things that you wont find in pre-systemd guides:

  • systemd-analyze to instrument the boot
  • systemctl to exclude processes from boot
Still relevant
  • bootchart is still useful for drawing pretty graphs
  • Configure GRUB & UEFI not to prompt for input
  • Don’t enable services you don’t need

Process

Remove bootloader delay

Between UEFI and the OS, you will get the bootloader, which will wait for 5 seconds by default to see if you want to select a different item. Start by switching the grub timeout from 5 seconds to 0.

sudo nano /etc/default/grub

Set GRUB_TIMEOUT=0.

Run:

sudo update-grub2
Look at systemd

Use the tool systemd-analyze to draw a picture:

systemd-analyze plot > plot.svg

In my case, it was clear that 9 seconds of the boot was an optional “waiting for network” step.

So, (thank you askubuntu), I disabled that service and rebooted:

$ sudo systemctl disable NetworkManager-wait-online.service
Removed /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.
systemd-analyze plot > plot2.svg

The boot was still taking 4.4 seconds, so, more analysis was in order:

The systemd-timesyncd service was holding things up.

This service runs early in the boot process, reads an old time from a file, and tries to update time over the network. Since I have a working RTC, this is all unnecessary for me, so I removed it and replaced it with chronyd, which is happy to operate in the background.

sudo systemctl disable systemd-timesyncd.service
sudo apt-get install chronyd
sudo systemctl enable chrony

After another reboot:

systemd-analyze plot > plot4.svg

There we go, down to 4.096 seconds with a few minutes of effort. I think that’s acceptable.

The systemd developers are quite certain that you can boot in under 2 seconds, but I wasn’t willing to customise my system to that extent.