I develop a printer driver for ESC/POS receipt printers, and we regularly get feature requests for encoding text in the Chinese, Japanese and Korean languages (“CJK”).
I have recently been looking for a way to add support for these on receipt printers that have no native ability to render them, and thought I would write a bit about some progress so far.
I previously wrote a bit about printing individual bitmaps for each character, whereas here I am aiming to print entire scripts.
Background
Programmers usually deal with text in UTF-8, but receipt printers don’t. Instead, they still use a series of legacy code pages to represent non-ASCII text. Mapping arbitrary text to something understood by these printers is a huge challenge.
The escpos-php driver will automatically map a lot of western scripts to these code pages. However, if you currently send an example string like “日本語” to escpos-php, the driver will substitute it with “???”, since it doesn’t know how to convert these characters to ESC/POS.
On some printers, there are native commands to print Japanese, but for a driver project, we need something with broad compatibility. So, I decided to try to get this working on an Epson TM-T20 variant which has no CJK fonts.
I started by making a new standalone test script, which converts text input into ESC/POS using a cut-down version of the escpos-php printer driver.
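// Read the text to print from standard input.
$text = file_get_contents("php://stdin");
// Write the generated ESC/POS commands to standard output.
$connector = new FilePrintConnector("php://stdout");
$printer = new Printer($connector);
$printer -> text($text);
$printer -> cut();
$printer -> close();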
I then modified this to print arbitrary UTF-8 text with a local bitmap font. These next sections go through some of the things I had to write to get it all working.
Character representation
I decided to start with the GNU Unifont project, because it ships fixed-width binary fonts in a text format that can be parsed without a font library, is freely licensed, and has excellent coverage.
The first issue to solve was font size:
- Unifont glyphs are 8 or 16 pixels wide and 16 pixels tall, and cover the entire Unicode Basic Multilingual Plane (BMP).
- ESC/POS supports a fixed 12×24 or a smaller 9×17 font.
- ESC/POS fonts are submitted in a 24 pixel tall format regardless of print area.
Since the characters would be surrounded by too much whitespace in the “Font A” (12×24) representation, I settled on printing in “Font B” (9×17), leaving a one-pixel space underneath and to the right of each character. These pictures show how the glyphs (grey) are laid out in the available print area (unused print area in white) and in the available memory (unused memory in red).
Note that wider characters have a two-pixel dead-zone on the right. The non-printable 7 pixels at the bottom of the images are ignored by the printer.
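For reference, Unifont's .hex format stores one glyph per line: a hexadecimal code point, a colon, then the bitmap as hex digits in row-major order, with 32 digits for an 8-pixel-wide glyph and 64 digits for a 16-pixel-wide one. The following is a minimal sketch of parsing one such line. The function name `parseUnifontLine` and the shape of the returned array are illustrative assumptions, not taken from the linked script.

```php
<?php
/**
 * Parse one line of a Unifont .hex file (hypothetical helper).
 *
 * @param string $line e.g. "0041:0000000018242442427E424242420000"
 * @return array ['codePoint' => int, 'width' => int, 'rows' => int[]]
 */
function parseUnifontLine(string $line): array
{
    $parts = explode(':', trim($line), 2);
    if (count($parts) !== 2 || !ctype_xdigit($parts[0]) || !ctype_xdigit($parts[1])) {
        throw new InvalidArgumentException('Not a valid .hex line');
    }
    list($codeHex, $dataHex) = $parts;

    // 16 rows per glyph: 32 hex digits means 8 pixels wide, 64 means 16 pixels wide.
    if (strlen($dataHex) === 32) {
        $width = 8;
    } elseif (strlen($dataHex) === 64) {
        $width = 16;
    } else {
        throw new InvalidArgumentException('Unexpected glyph data length');
    }

    // Each row is $width bits, i.e. $width / 4 hex digits.
    $rows = [];
    foreach (str_split($dataHex, intdiv($width, 4)) as $rowHex) {
        $rows[] = hexdec($rowHex);
    }

    return ['codePoint' => hexdec($codeHex), 'width' => $width, 'rows' => $rows];
}

// Usage: a sample line describing a narrow glyph for U+0041.
$glyph = parseUnifontLine('0041:0000000018242442427E424242420000');
printf("U+%04X is %d pixels wide, %d rows\n", $glyph['codePoint'], $glyph['width'], count($glyph['rows']));
// prints "U+0041 is 8 pixels wide, 16 rows"
```

Each entry in `rows` is an integer whose most significant used bit is the leftmost pixel, which keeps the later bit manipulation simple.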
The format on the printer for each character stores bits in a column-major format, while most raster formats are row-major, so I wrote a quick converter to rotate the bits. The converter code is not very concise, so I’ll just share a screen capture here. The full code is linked at the end of this post.
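To give a rough idea of what such a conversion involves, here is a minimal sketch of the rotation. It is not the converter from the screen capture. It assumes each column is stored as 3 bytes (24 pixels) from top to bottom, with the most significant bit of each byte as the uppermost pixel; the function name `rowsToColumns` and its parameters are hypothetical.

```php
<?php
/**
 * Rotate a row-major glyph into column-major bytes (illustrative sketch).
 *
 * @param int[] $rows           One integer per pixel row; bit ($width - 1) is the leftmost pixel.
 * @param int   $width          Glyph width in pixels.
 * @param int   $bytesPerColumn Bytes per column; 3 bytes = 24 pixels (assumed layout).
 * @return string Binary string of $width * $bytesPerColumn bytes.
 */
function rowsToColumns(array $rows, int $width, int $bytesPerColumn = 3): string
{
    $rows = array_values($rows);
    $maxRows = $bytesPerColumn * 8;
    $out = '';
    for ($x = 0; $x < $width; $x++) {
        // Mask selecting pixel $x, counting from the left edge of the glyph.
        $mask = 1 << ($width - 1 - $x);
        $column = array_fill(0, $bytesPerColumn, 0);
        foreach ($rows as $y => $row) {
            if ($y >= $maxRows) {
                break; // Rows that do not fit in the column are dropped.
            }
            if ($row & $mask) {
                // Row $y lands in byte floor($y / 8), counting bits from the top.
                $column[intdiv($y, 8)] |= 0x80 >> ($y % 8);
            }
        }
        $out .= pack('C*', ...$column);
    }
    return $out;
}

// Usage: a 2x2 diagonal (top-left and bottom-right pixels set).
echo bin2hex(rowsToColumns([0b10, 0b01], 2)), "\n";
// prints "800000400000"
```

A 16-row Unifont glyph only fills the first two bytes of each column, so the third byte stays zero.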
Lastly, the output size on paper was tiny, so I set the printer to double the size, which results in text that is around 50% larger than the default output.
Storage of fonts
There is only space for 95 single-width characters in an ESC/POS font, but the scripts are much larger than this.
I treated the font as a queue in this implementation. During the print-out, new characters are added to the font as necessary, and the font is re-written from the front as space runs out. This is also known as a FIFO cache eviction policy.
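The queue behaviour described above could be sketched roughly as follows. This is a simplified illustration rather than the code from the linked script: it assumes every glyph occupies exactly one slot (wide glyphs are not handled), and the class name `FifoGlyphSlots` and its `request` method are hypothetical.

```php
<?php
/**
 * FIFO allocation of custom-font slots to Unicode code points (illustrative sketch).
 */
class FifoGlyphSlots
{
    private $capacity;
    private $slotOf = [];    // code point => slot index
    private $occupant = [];  // slot index => code point
    private $next = 0;       // next slot to (over)write

    public function __construct(int $capacity = 95)
    {
        $this->capacity = $capacity;
    }

    /**
     * @return array [int $slot, bool $needsUpload]; $needsUpload is true when
     *               the glyph must be (re)written to the printer's font.
     */
    public function request(int $codePoint): array
    {
        if (isset($this->slotOf[$codePoint])) {
            return [$this->slotOf[$codePoint], false];
        }
        // Take the next slot in order, wrapping back to the front when full.
        $slot = $this->next;
        $this->next = ($this->next + 1) % $this->capacity;
        if (isset($this->occupant[$slot])) {
            // Evict whichever glyph was written to this slot earliest.
            unset($this->slotOf[$this->occupant[$slot]]);
        }
        $this->occupant[$slot] = $codePoint;
        $this->slotOf[$codePoint] = $slot;
        return [$slot, true];
    }
}

// Usage with a tiny two-slot font to make the eviction visible.
$slots = new FifoGlyphSlots(2);
var_dump($slots->request(0x65E5)); // slot 0, needs upload
var_dump($slots->request(0x65E5)); // slot 0 again, already in the font
```

With 95 slots, one natural mapping is to the printable ASCII range, so slot 0 would correspond to the space character and slot 94 to the tilde. FIFO is simple to implement, but it can evict a frequently used glyph that a least-recently-used policy would have kept.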
Input
I converted the string input to an array of Unicode code points to avoid canonicalisation issues.
// Split the UTF-8 string into characters, then map each one to its code point.
$chrArray = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
$codePoints = array_map("IntlChar::ord", $chrArray);
foreach($codePoints as $char) {
    $this -> writeChar($char);
}
The IntlChar class is provided by the intl extension, which is very useful but not widespread, and this limits the portability of the code.
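If the dependency on IntlChar is a problem, one possible alternative is to decode the UTF-8 bytes by hand. The sketch below is a swapped-in technique, not what the linked script does, and the function name `utf8CodePoints` is hypothetical. It relies only on PCRE's `/u` modifier, which also rejects invalid UTF-8.

```php
<?php
/**
 * Convert a UTF-8 string to an array of Unicode code points without the
 * intl extension (illustrative sketch).
 *
 * @return int[]
 */
function utf8CodePoints(string $text): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $codePoints = [];
    foreach ($chars as $chr) {
        $bytes = array_values(unpack('C*', $chr));
        $n = count($bytes);
        if ($n === 1) {
            // Single-byte (ASCII) character.
            $cp = $bytes[0];
        } else {
            // The lead byte carries (7 - $n) payload bits after its length marker.
            $cp = $bytes[0] & (0xFF >> ($n + 1));
            for ($i = 1; $i < $n; $i++) {
                // Each continuation byte contributes its low 6 bits.
                $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
            }
        }
        $codePoints[] = $cp;
    }
    return $codePoints;
}

// Usage: the example string from earlier, written with escapes.
foreach (utf8CodePoints("\u{65E5}\u{672C}\u{8A9E}") as $cp) {
    printf("U+%04X\n", $cp);
}
```

Where the mbstring extension is available, `mb_ord()` (PHP 7.2 and later) would be a shorter option, at the cost of a different dependency.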
Result
I got the list of languages from the sidebar of a Wikipedia article to use as a test string, since it contains short strings in a large number of scripts.
cat test.txt | php unifont-example.php > /dev/usb/lp0
The output contains a large number of correctly rendered scripts, including the CJK output, which was not previously possible on my printer.
Success!
Advantages
Previously, I have tried generating small images from system fonts to send text to the printer. This is quite costly in terms of processing and data transfer, and the printer is unable to format or wrap the text for you.
Storing glyphs in the custom font area involves transferring less raster data, and allows most text formatting commands to be used.
Limits
These characters are a different size to the native printer fonts, so we can’t mix them on the same line. This means that we can’t use this code to implement an automatic fallback in escpos-php. However, it may appear in a future version as an alternative “PrintBuffer”, which can be explicitly enabled by developers who are not interested in using the native fonts.
The esc2html utility is not able to emulate custom fonts, so the output cannot currently be rendered without an Epson printer.
Also, we simply printed a stream of characters, which is not really how text works. To implement Unicode, we need to be able to join and compose characters, and respect bi-directional text. Unicode text layout is not trivial at all.
Get the code
The full script is available in the escpos-snippets repo on GitHub, where I store prototypes of new functionality that is not yet ready for prime-time.