Site icon Mike's Software Blog

Recovering text from a receipt with escpos-tools

I have written previously about how to generate receipts for printers which understand ESC/POS. Today, I thought I would write about the opposite process.

Unlike PostScript, the ESC/POS binary language is not commonly understood by software. I wrote a few utilities last year to help change that, called escpos-tools.

In this post, I’ll step through an example ESC/POS binary file that an escpos-tools user sent to me, and show how we can turn it back into a usable format. The tools we are using are:

You might need this sort of process if you need to email a copy of your receipts, or to archive them for audit use.

Printing the file

Binary print files are generated from drivers. I can feed this one back to my printer like this:

cat receipt.bin > /dev/usb/lp0

My Epson TM-T20 receipt printer understands ESC/POS, and prints this out:

Installing escpos-tools

escpos-tools is not packaged yet, so you need git and composer (from the PHP eco-system) to use it.

$ git clone https://github.com/receipt-print-hq/escpos-tools
$ cd escpos-tools
$ composer install

Inspecting the file

There is text in the file, so the first thing you should try to do is esc2text. In this case, which works like this:

$ php esc2text.php receipt.bin

In this case, I got no output, so I switch to -v to show the commands being found.

$ php esc2text.php receipt.bin  -v
[DEBUG] SetRelativeVerticalPrintPositionCmd 
[DEBUG] GraphicsDataCmd 
[DEBUG] GraphicsDataCmd 
[DEBUG] SetRelativeVerticalPrintPositionCmd 
...

This indicates that there is no text being sent to the receipt, only images. We know from the print-out that the images contain text, so we need a few more utilities.

Recovering images from the receipt

To extract the images, use escimages. It runs like this:

$ php escimages.php --file receipt.bin
[ Image 1: 576x56 ]
[ Image 2: 576x56 ]
[ Image 3: 576x56 ]
[ Image 4: 576x56 ]
[ Image 5: 576x56 ]
[ Image 6: 576x56 ]
[ Image 7: 576x56 ]
[ Image 8: 576x52 ]

This gave us 8 narrow images:

Using ImageMagick’s convert command, these can be combined into one image like this:

convert -append receipt-*.png -density 70 -units PixelsPerInch receipt.png

The result is now the same as what our printer would output:

Recovering text from the receipt

Lastly, tesseract is an open source OCR engine which can recover text from images. This image is a lossless copy of what we sent to the printer, which is an “easy” input for OCR.

$ tesseract receipt.png -
Estimating resolution as 279
Test Receipt for USB Printer 1

Mar 17, 2018
10:12 PM



Ticket: 01



Item $0,00

Total $0.00

This quality of output is fairly accurate for an untrained reader.

Conclusion

The escpos-tools family of utilities gives some visibility into the contents of ESC/POS binary files.

If you have a use case which requires working with this type of file, then I would encourage you to consider contributing code or example files to the project, so that the utilities can be improved over time.

Get the code

View on GitHub →

Exit mobile version