How to generate professional-quality PDF files from PHP

There are a few ways to go about making PDF files from your PHP web app. Your options are basically-

  1. Put all of your text in a 210mm column and get the user to save it as PDF.
  2. Learn a purpose-built library, such as FPDF (free) or pdflib (proprietary).
  3. Use PHP for generating markup which can be saved to PDF. This is of course LaTeX

This article assumes an intermediate knowledge of both PHP and LaTeX, and that your server is not running Windows.

The software mix

PHP is an open-source server package which generates HTML pages, usually based on some sort of dynamic data. It is equally good at (but less well known for) generating other types of markup.

LaTeX is an open source document typesetting system, which will take a markup file in .tex format, and output a printable document, such as a PDF. The engine I will use here is XeLaTeX, because it supports modern trimmings such as Unicode and OpenType fonts.

Naturally, this post will use PHP to populate a .tex file, and then xelatex to create a PDF for the user.

This sounds straightforward enough, but it may not work with all shared hosts. Check your setup before you read on:

  1. Your server needs PHP, with safe mode disabled, so that it can run commands.
  2. This server needs xelatex, or a suitable substitute such as pdflatex.

A bit about markup

We will be working with .tex templates, which will be valid LaTeX files. The basic rules are:

  1. Define a \newcommand for every variable, so that you can compile the document without PHP.
  2. Drop PHP code in comments, which will print out code to override those variables.

So you will end up with code like this:

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

Look messy? A multi-line block of PHP is a little easier to follow. This example is from the body of a table, see if you can figure out the syntax:

%<?php                                                                      /*
% */ foreach($data['invoiceItem'] as $invoiceItem) {                        /*
% */    echo "\n" . LatexTemplate::escape($invoiceItem['item']) . " & " .   /*
% */        LatexTemplate::escape($invoiceItem['qty']) . " & " .            /*
% */        LatexTemplate::escape($invoiceItem['price']) . " & " .          /*
% */        LatexTemplate::escape($invoiceItem['total']) . "\\\\\n";        /*
% */ } ?>

So what about this LatexTemplate::escape() business? In LaTeX, just about every symbol seems to be part of the syntax, so it is sadly not very simple to escape.

I have settled on the following series of str_replace() calls to sanitise information for display. It is crude but effective. Generating LaTex is much like generating SQL, HTML or LDIF from your website: it is quite important to make a habit of wrapping every piece of data with a function to prevent users from writing (‘injecting’) arbitrary code into your document:

/**
 * Series of substitutions to sanitise text for use in LaTeX.
 *
 * http://stackoverflow.com/questions/2627135/how-do-i-sanitize-latex-input
 * Target document should \usepackage{textcomp}
 */
public static function escape($text) {
	// Prepare backslash/newline handling
	$text = str_replace("\n", "\\\\", $text); // Rescue newlines
	$text = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $text); // Strip all non-printables
	$text = str_replace("\\\\", "\n", $text); // Re-insert newlines and clear \\
	$text = str_replace("\\", "\\\\", $text); // Use double-backslash to signal a backslash in the input (escaped in the final step).

	// Symbols which are used in LaTeX syntax
	$text = str_replace("{", "\\{", $text);
	$text = str_replace("}", "\\}", $text);
	$text = str_replace("$", "\\$", $text);
	$text = str_replace("&", "\\&", $text);
	$text = str_replace("#", "\\#", $text);
	$text = str_replace("^", "\\textasciicircum{}", $text);
	$text = str_replace("_", "\\_", $text);
	$text = str_replace("~", "\\textasciitilde{}", $text);
	$text = str_replace("%", "\\%", $text);

	// Brackets & pipes
	$text = str_replace("<", "\\textless{}", $text);
	$text = str_replace(">", "\\textgreater{}", $text);
	$text = str_replace("|", "\\textbar{}", $text);

	// Quotes
	$text = str_replace("\"", "\\textquotedbl{}", $text);
	$text = str_replace("'", "\\textquotesingle{}", $text);
	$text = str_replace("`", "\\textasciigrave{}", $text);

	// Clean up backslashes from before
	$text = str_replace("\\\\", "\\textbackslash{}", $text); // Substitute backslashes from first step.
	$text = str_replace("\n", "\\\\", trim($text)); // Replace newlines (trim is in case of leading \\)
	return $text;
}

We then have a template which we can include() from PHP, or run xelatex over. Below is minimal.tex, a minimal example of a PHP-latex template in this form:

% This file is a valid PHP file and also a valid LaTeX file
% When processed with LaTeX, it will generate a blank template
% Loading with PHP will fill it with details

\documentclass{article}
% Required for proper escaping
\usepackage{textcomp} % Symbols
\usepackage[T1]{fontenc} % Input format

% Because Unicode etc.
\usepackage{fontspec} % For loading fonts
\setmainfont{Liberation Serif} % Has a lot more symbols than Computer Modern

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

\begin{document}
	\section{Data From PHP}
	\test{}
\end{document}

Generate a PDF on the server

Here is where the fun begins. There is no plugin for compiling a LaTeX document, so we need to directly execute the command on a file.

Looks like we need to save the output somewhere then. You would generate your filled-in LaTeX code in a temporary file by doing something like this:

/**
 * Generate a PDF file using xelatex and pass it to the user
 */
public static function download($data, $template_file, $outp_file) {
	// Pre-flight checks
	if(!file_exists($template_file)) {
		throw new Exception("Could not open template");
	}
	if(($f = tempnam(sys_get_temp_dir(), 'tex-')) === false) {
		throw new Exception("Failed to create temporary file");
	}

	$tex_f = $f . ".tex";
	$aux_f = $f . ".aux";
	$log_f = $f . ".log";
	$pdf_f = $f . ".pdf";

	// Perform substitution of variables
	ob_start();
	include($template_file);
	file_put_contents($tex_f, ob_get_clean());

The next step is to execute your engine of choice on the output files:

	// Run xelatex (Used because of native unicode and TTF font support)
	$cmd = sprintf("xelatex -interaction nonstopmode -halt-on-error %s",
			escapeshellarg($tex_f));
	chdir(sys_get_temp_dir());
	exec($cmd, $foo, $ret);

Once this is done, you can delete a lot of the extra LaTeX files, and check if a .pdf appeared as expected:

	// No need for these files anymore
	@unlink($tex_f);
	@unlink($aux_f);
	@unlink($log_f);

	// Test here
	if(!file_exists($pdf_f)) {
		@unlink($f);
		throw new Exception("Output was not generated and latex returned: $ret.");
	}

And of course, send the completed file back via HTTP:

	// Send through output
	$fp = fopen($pdf_f, 'rb');
	header('Content-Type: application/pdf');
	header('Content-Disposition: attachment; filename="' . $outp_file . '"' );
	header('Content-Length: ' . filesize($pdf_f));
	fpassthru($fp);

	// Final cleanup
	@unlink($pdf_f);
	@unlink($f);
}

The static functions escape($text) and download($data, $template_file, $outp_file) are together placed into a class called LatexTemplate for the remainder of the example (complete file on GitHub).

Gluing it all together

With the library and template, it is quite easy to set up a PHP script which triggers the above code:

<?php
require_once('../LatexTemplate.php');

$test = "";
if(isset($_GET['t'])) {
	// Make the LaTeX file and send it through
	$test = $_GET['t'];
	if($test =="") {
		// Test pattern to show symbol handling
		for($i = 0; $i < 256; $i++) {
			$test .= chr($i) . " . ";
		}
	}

	try {
		LatexTemplate::download(array('test' => $test), 'minimal.tex', 'foobar.pdf');
	} catch(Exception $e) {
		echo $e -> getMessage();
	}

}
?>
<html>
<head>
<title>LaTeX test (minimal)</title>
</head>
</html>
<body>
	<p>Enter some text to be placed on the output:</p>
	<form>
		<input type="text" name="t" /><input type="submit" value="Generate" />
	</form>
</body>
</html>

The above code will show a form, which asks for input. When it gets some text, it will generate a PDF containing the text. If no text is given, it will output an ASCII table, simply to show that it can handle the symbols.

Once the template code is hidden away, this powerful technique is easily applied.

Results

This is only a minimal example. In any real application, your template would be more extensive.

Compiling the template directly creates this PDF:

From the web, a form is presented to fill this single field:

Which results in a PDF containing the user data:

Tips

  1. The text after \end{document} is not even parsed in latex. Use this area to write <?php ?> with
    fewer constraints.
  2. Consult the github repository for this code to see the complete example.
  3. Comment out the line @unlink($tex_f); of you want to preserve (for debugging, etc) the generated markup.

How to query Microsoft SQL Server from PHP

This post is for anybody who runs a GNU/Linux server and needs to query a MSSQL database. This setup will work on Debian and its relatives. As it’s a dense mix of technologies, so I’ve included all of the details which worked for me.

An obvious note: Microsoft SQL is not an ideal choice of database to pair with a GNU/Linux server, but may be acceptable if you are writing something which needs to import some data from external application which has a better reason to be using it.

A command-line alternative to this setup would be sqsh, which will let you running scheduled queries without PHP, if that’s what you’re after.

Prerequisites

Once you have PHP, the required libraries can be fetched with:

sudo apt-get install unixodbc php5-odbc tdsodbc

MSSQL is accessed with the FreeTDS driver. Once the above packages are installed, you need to tell ODBC where to find this driver, by adding the following block to /etc/odbcinst.ini:

[FreeTDS]
Description=MSSQL DB
Driver=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
UsageCount=1

The path is different on platforms other than amd64. Check the file list for the tdsodbc package on your architecture if you lose track of the path.

The next step requires that you know the database server address, version, and database name. Add a block for your database to the end of /etc/odbc.ini:

[foodb]
Driver = FreeTDS
Description = Foo Database
Trace = Yes
TraceFile = /tmp/sql.log
ForceTrace = yes
Server = 10.x.x.x
Port = 1433
Database = FooDB
TDS_Version = 8.0

Experiment with TDS_Version values if you have issues connecting. Different versions of MSSQL require different values. The name of the data source (‘foodb’), the Database, Description and Server are all bogus values which you will need to fill.

An example

For new PHP scripts, database grunt-work is invariably done via PHP Data Objects (PDO). The good news is, it is easy to use it with MSSQL from here.

The below file takes a query on standard input, throws it at the database, and returns the result as comma-separated values.

Save this as query.php and fill in your data source (‘odbc:foodb’ here), username, and password.

#!/usr/bin/env php
<?php
$query = file_get_contents("php://stdin");
$user = 'baz;
$pass = 'super secret password here';

$dbh = new PDO('odbc:foodb', $user, $pass);
$dbh -> setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sth = $dbh -> prepare($query);
$sth -> execute();
$results = $sth -> fetchAll(PDO::FETCH_ASSOC);

/* Quick exit if there are no rows */
if(count($results) == 0) {
	return 0;
}
$f = fopen("php://stdout", "w");

/* Output header */
$a = $results[0];
$header = array();
foreach($a as $key => $val) {
	$header[] = $key;
}
fputcsv($f, $header);

/* Output rows */
foreach($results as $result) {
	fputcsv($f, $result);
}

fclose($f);

To test the new script, first make it executable:

chmod +x query.php

To run a simple test query:

echo "SELECT Name from sys.tables ORDER BY Name;" | ./query.php

Refining the setup

The above script has some cool features: It’s short, actually useful, and it sets PDO::ERRMODE_EXCEPTION. This means that if something breaks, it will fail loudly and tell you why.

Hopefully, if your setup has issues, you can track down the cause with the error, and solve it by scrolling through this how-to again.

If you encounter a MSSQL datablase with an unknown schema, then you may want to list al rows and columns. This is achieved with:

SELECT tables.name AS tbl, columns.name AS col FROM sys.columns JOIN sys.tables ON columns.object_id = tables.object_id ORDER BY tbl, col;

The catch

I’ve run into some bizarre limitations using this. Be sure to run it on a server which you can update at the drop of a hat.

A mini-list of issues I’ve seen with this combination of software (no sources as I never tracked down the causes):

  • An old version of the driver would segfault PHP, apparently when non-ASCII content appeared in a text field.
  • Substituting non-text values fails in the version I am using, although Google suggests that updating the ODBC driver fixes this.

Winning 2048 game with key-mashing?

This new, simple, addictive game is out, called 2048. You need to slide two numbers together, resulting in a bigger number, in ever-increasing powers of two. You get 2048, and you win.

2014-03-2048-1

I noticed that somebody already wrote neat AI for it, although it does run quite slowly. But then I also noticed a friend mashing keys in a simple pattern, and thought I should test whether this was more effective. The answer: it kinda is.

2014-03-2048-2

At least in the first half of the game, a simple key-mashing pattern is a much faster way to build high numbers. The PHP script below will usually get to 512 without much trouble, but rarely to 1024. I would suggest running it for a while, and then taking over with some strategy.

The script

This script spits out commands which can be piped to xte for some automatic key-mashing on GNU / Linux. Save as 2048.php

#!/usr/bin/env php
mouseclick 1
<?php
for($i = 0; $i < 10; $i++) {
	move("Left", 1);
	move("Right", 1);
}

while(1) {
	move("Down", 1);
	move("Left", 1);
	move("Down", 1);
	move("Right", 1);
}

function move($dir, $count) {
	for($i = 0; $i < $count; $i++) {
		echo "key $dir\nsleep 0.025\n";
		usleep(25000);
	}
}

And then in a terminal, run this then click over to a 2048 game:

sleep 3; php 2048.php | xte

Good luck!

How to liberate your myki data

myki logo

myki is the public transport ticketing system in Melbourne. If you register your myki, you can view the usage history online. Unfortunately, you are limited to paging through HTML, or downloading a PDF.

This post will show you how to get your myki history into a CSV file on a GNU/Linux computer, so that you can analyse it with your favourite spreadsheet/database program.

Get your data as PDFs

Firstly, you need to register your myki, log in, and export your history. The web interface seemed to give you the right data if you chose blocks of 1 month.

Export myki data for each month

Once you do this, organise these into a folder filled with statements.

A folder filled with myki statements

You need the pdftotext utility to go on. In debian, this is in the poppler-utils package.

The manual steps below run you through how to extract the data, and at the bottom of the screen there are some scripts I’ve put together to do this automatically.

Manual steps to extract your data

These steps are basically a crash course in "scraping" PDF files.

To convert all of the PDF’s to text, run:

for i in *.pdf; do pdftotext -layout -nopgbrk $i; done

This preserves the line-based layout. The next step is to filter out the lines which don’t contain data. Each line we’re interested in begins with a date, followed by the word “Touch On”, “Touch Off”, or “Top Up”

18/08/2013 13:41:20   T...

We can filter all of the text files using grep, and a regex to match this:

cat *.txt | grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T"

The output looks like:
Filtered output, showing data

So what are we looking at?

  1. One row per line
  2. Fields delimited by multiple spaces

To collapse every double-space into a tab, we use unexpand. Then, to collapse duplicate tabs, we use tr:

cat filtered-data.txt | unexpand -t 2 | tr -s '\t'

Finally, some fields need to be quoted, and tabs need to be converted to CSV. The PHP script below will do that step.

Scripts to get your data

myki2csv.sh is a script which performs the above manual steps:

#!/bin/bash
# Convert myki history from PDF to CSV
#	(c) Michael Billington < michael.billington@gmail.com >
#	MIT Licence
hash pdftotext || exit 1
hash unexpand || exit 1
pdftotext -layout -nopgbrk $1 - | \
	grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T" | \
	unexpand -t2 | \
	tr -s '\t' | \
	./tab2csv.php > ${1%.pdf}.csv

tab2csv.php is called at the end of the above script, to turn the result into a well-formed CSV file:

#!/usr/bin/env php
<?php
/* Generate well-formed CSV from dodgy tab-delimitted data
	(c) Michael Billington < michael.billington@gmail.com >
	MIT Licence */
$in = fopen("php://stdin", "r");
$out = fopen("php://stdout", "w");
while($line = fgets($in)) {
	$a = explode("\t", $line);
	foreach($a as $key => $value) {
		$a[$key]=trim($value);
		/* Quote out ",", and escape "" */
		if(!(strpos($value, "\"") === false &&
				strpos($value, ",") === false)) {
			$a[$key] = "\"".str_replace("\"", "\"\"", $a[$key])."\"";
		}
	}
	$line = implode(",", $a) . "\r\n";
	fwrite($out, $line);
}

Invocation

Call script on a single foo.pdf to get foo.csv:

./myki2csv.sh foo.pdf

Convert all PDF’s to CSV and then join them:

for i in *.pdf; do ./myki2csv.sh $i; done
tac *.csv > my-myki-data.csv

Importing into LibreOffice

The first field must be marked as a DD/MM/YYYY date, and the “zones” need to be marked as text (so that “1/2” isn’t treated as a fraction!)

These are my import settings:

Options to import the myki data into LibreOffice

Happy data analysis!

Update 2013-09-18: The -nopgbrk option was added to the above instructions, to prevent page break characters causing grep to skip one valid line per page

Update 2014-05-04: The code for the above, as well as this follow-up post are now available on github.

Making an XKCD-style password generator in C++

I’m learning C++ at the moment, and I don’t find long tutorials or studying the standard template library particularly fun.

Making this type of password-generator is not new, but it is a nice practical exercise to start out in any language.

1. Get a list of common English words

Googling “common English words” yielded this list, purporting to contain 5,000 words. Unfortunately it contains almost 1,000 duplicates and numerous non-words! Wiktionary has a much higher-quality list of words compiled from Project Gutenberg, but the markup looks a bit like this:

==== 1 - 1000 ====
===== 1 - 100 =====
[[the]] = 56271872
[[of]] = 33950064
[[and]] = 29944184
[[to]] = 25956096
[[in]] = 17420636
[[I]] = 11764797  

Noting the wikilinks surrounding each word, I put together this PHP script to extract the link destinations and called it get-wikilinks.php:

#!/usr/bin/php
<?php
/* Return list of wikilinked words from input text */
$text = explode("[[", file_get_contents("php://stdin"));
foreach($text as $link) {
	$rbrace = strpos($link, "]]");
	if(!$rbrace === false) {
		/* Also escape on [[foo|bar]] links */
		$pipe = strpos($link, "|");
		if(!$pipe === false && $pipe < $rbrace) {
			$rbrace = $pipe;
		}
		$word = trim(substr($link, 0, $rbrace))."n";
		if(strpos($word, "'") === false && !is_numeric(substr($word, 0, 1))) {
			/* Leave out words with apostrophes or starting with numbers */
			echo $word;
		}
	}
}

The output of this script is much more workable:

$ chmod +x get-wikilinks.php
$ cat wikt.txt | ./get-wikilinks.php
the
of
and
to
in
I

Using sort and uniq makes a top-notch list of common words, ready for an app to digest:

$ cat wikt.txt | ./get-wikilinks.php | sort | uniq > wordlist.txt

2. Write some C++

There are two problems being solved here:

  • Reading a file into memory
    • An ifstream is used to access the file, and getline() will return false when EOF has been reached
    • Each line is loaded into a vector (roughly the same type of container as an ArrayList in Java), which is resized dynamically and accessed like an array.
  • Choosing random numbers
    • These are seeded from a random_device, being more cross-platform than reading from a file like /dev/urandom.
    • Note that random is new to C++11.
pw.cpp
#include <fstream>
#include <vector>
#include <string>
#include <iostream>
#include <random>
#include <cstdlib>

using namespace std;

int main(int argc, char* argv[]) {
    const char* fname = "wordlist.txt";

    /* Parse command-line arguments */
    int max = 1;
    if(argc == 2) {
        max = atoi(argv[1]);
    }

    /* Open word list file */
    ifstream input;
    input.open(fname);
    if(input.fail()) {
        cerr << "ERROR: Failed to open " << fname << endl;
    }

    /* Read to end and load words */
    vector<string> wordList;
    string line;
    while(getline(input, line)) {
        wordList.push_back(line);
    }

    /* Seed from random device */
    random_device rd;
    default_random_engine gen;
    gen.seed(rd());
    uniform_int_distribution<int> dist(0, wordList.size() - 1);

    /* Output as many passwords as required */
    const int pwLen = 4;
    int wordId, i, j;
    for(i = 0; i < max; i++) {
        for(j = 0; j < pwLen; j++) {
            cout << wordList[dist(gen)] << ((j != pwLen - 1) ? " " : "");
        }
        cout << endl;
    }

    return 0;
}

3. Compile

Lots of projects in compiled languages have a Makefile, so that you can compile them without having to type all the compiler options manually.

Makefiles are a bit heavy to learn properly, but for a project this tiny, something simple is fine:

default:
	g++ pw.cpp -o pw -std=c++11

clean:
	rm -f pw

Now we can compile and run the generator:

make
./pw

The output looks like this for ./pw 30 ("generate 30 passwords"):

Moodlification

Moodle is an alright piece of software, but if you ever try to discuss code, it will mangle it and create a horrible mess.

To mitigate this, write your posts in HTML view, and paste your code as it appears after running it through this moodlify.php script:

#!/usr/bin/php
<?php
/* moodlify.php -- whitespace fixes for code to post on moodle */
if(!isset($argv[1])) {
    echo "Usage: " . $argv[0] . " [file]\n";
    exit(0);
}
$file = $argv[1];
if(!file_exists($file) || !$code = file_get_contents($file)) {
    echo "Could not open file: ".$argv[1]."\n";
    exit(1);
}

/* Series of find-replaces to strip whitespace */
$code = htmlentities($code); /* < > " ' etc */
$code = str_replace("\t", "    ", $code); /* Tabs (4 spaces)*/

$code = str_replace("  ", "&nbsp;&nbsp;", $code); /* Double-spaces */
$code = str_replace("\n", "<br />", $code); /* Newlines */
$code = "<div style=\"padding-left:4em;\"><code>".$code."</code></div>\n";

/* Output options */
if(isset($argv[2])) {
    file_put_contents($argv[2], $code);
} else {
    echo $code;
}?>

And moodle will never mangle your code again!

Alphanumeric phone numbers

Some popular phone numbers (eg. 1300-FOOBAR) are not numbers at all. If you are running a VOIP server then you may be interested in this snippet of PHP code to convert them into actual numbers, allowing users to dial by typing the more familiar form.

function normaliseTelephoneNumber($start) {
	/* Return an extension with numbers substituted in place of letters for dialling */
	$map = array(	"A" => "2", "B" => "2", "C" => "2",
			"D" => "3", "E" => "3", "F" => "3",
			"G" => "4", "H" => "4", "I" => "4",
			"J" => "5", "J" => "5", "L" => "5",
			"M" => "6", "N" => "6", "O" => "6",
			"P" => "7", "Q" => "7", "R" => "7", "S" => "7",
			"T" => "8", "U" => "8", "V" => "8",
			"W" => "9", "X" => "9", "Y" => "9", "Z" => "9",
			"+" => "+", "*" => "*", "#" => "#");
	$new = "";
	$hasnumber = false;
	$ext = strtoupper($start);
	for($i = 0; $i < strlen($ext); $i++) {
		$c = substr($ext, $i, 1);
		if(isset($map[$c])) {
			$new .= $map[$c];
			if($hasnumber == false) {
				/* No numbers before letters */
				return $start;
			}
		} else if(is_numeric($c)) {
			$new .= $c;
			$hasnumber = true;
		}
	}

	if($hasnumber == true) {
		return $new; /* Return numeric version as appropriate */
	} else {
		return $start; /* Leaves full words like "joe" or "bazza" unchanged */
	}
	return $new;
}

Note that this will only alter the number if it begins with numbers. This is to make sure that (at least in my case) the local network extensions don't get messed with:

echo normaliseTelephoneNumber("1300-FOOBAR")."n"; /* 1300366227 */
echo normaliseTelephoneNumber("mike")."n"; /* mike */

Beautiful QR Codes

The verdict is in. QR codes are ugly. But they don’t have to be. Check out the modified code featured on this Wikipedia article. You don’t need to be a QR Code expert to do something like that.

The basic idea is that QR codes have error Correction. We can generate codes which store the data in multiple places, so that scanners will still read them if they are damaged.

Scripting the whole operation

Tedious image editing is not my cup of tea, so I made a PHP class to apply some templates to QR codes and add a centred logo for us.

Note: If you don’t have ImageMagick for PHP, read this page. On Ubuntu, apt-get install php5-imagick worked fine for me.

This is the basic formula:

QR Code + Template + Logo = Pretty PR Code

First we need the QR code. It is best generated with the phpqrcode library. Adding the logo wont work unless we use high error correction (H):

QRcode::png("http://bitrevision.com", "code.png", 'H', 8, 0);

After that, code.png looks like this:

Now the templates. Note that we used 8×8 pixels per block and 0 for the border above. The resulting code may line up with one of these templates, adding white lines over the image. Save these images to a ‘template’ folder:

Next, the logo. We have one of those:

Time for the code. This class will do most of the work. Just check the template folder contains an overlay that fits your code.

class QR_Pretty {
	public $template_base = "template/template-{SIZE}.png";
	public $qr = false;
	public $geometry = false;

	function prettify($file, $logo, $output = '') {
		/* Load image */
		$this -> qr = new Imagick();
		$this -> qr -> readImage($file);
		$this -> geometry = $this -> qr -> getImageGeometry();

		/* Perform modifications */
		$this -> add_template();
		$this -> add_logo($logo);

		/* Output image */
		if($output != '') {
			$this -> qr -> setImageFileName($output);
		}
		$this -> qr -> writeImage();
	}

	private function add_template() {
		/* This will overlay a template containing white lines,
			to make the QR codes look less code-ful */
		$size = $this -> geometry['width'];
		$template_filename = str_replace("{SIZE}", $size, $this -> template_base);
		$template = new Imagick();
		$template -> readImage($template_filename);
		$this -> qr -> compositeImage($template, imagick::COMPOSITE_OVER, 0, 0 );
	}

	private function add_logo($logo_file = false) {
		/* This places a logo in the middle of the QR code,
			64x64 would be advisable :) */
		if(!$logo_file) {
			/* No logo to add */
			return false;
		}
		$logo = new Imagick();
		$logo -> readImage($logo_file);
		$logo_size = $logo -> getImageGeometry();
		$x = ($this -> geometry['width'] - $logo_size['width']) / 2;
		$y = ($this -> geometry['height'] - $logo_size['height']) / 2;
		$this -> qr -> compositeImage($logo, imagick::COMPOSITE_OVER, $x, $y);
	}
}

To use the above class is quite simple. Once you have code.png, logo.png, and a folder full of templates, just do this:

$qr = new QR_Pretty();
$qr -> prettify("code.png", "bitrevis.png", "pretty.png");

That gives us pretty.png, which looks like this:

Thanks to that error correction, this picture still scans and takes us to http://bitrevision.com. Slightly larger logos can be used, but 64×64 looks good and scans reliably. Try embedding logos with transparency too!

This means no more excuses for ugly QR codes. Integrate this into your scripts right away.

Using speech synthesis in AGI apps

For some of my telephone apps, stringing together pre-recorded messages is a pain, and it’s not even useful for more dynamic content. This is how I got PHP to do sound output via festival whilst using the asterisk AGI.

First, I made a folder called agi in my /var/lib/asterisk/sounds/en/. This is where we will cache sound files.

This shell script makes a sound file with your text, and saves it in that directory:

#!/bin/sh
echo "$1" | text2wave -otype ulaw -o /var/lib/asterisk/sounds/en/agi/$2.ulaw

For the PHP side, we just use MD5 to name the files based on their contents, and only run the shell script if no file has been created with the text we want.

In this project, $agi refers to an instance of PHP AGI. (highly recommended)

function sayNow($string, $escape = "") {
	global $agi;
	$sounds_path	= "/var/lib/asterisk/sounds/en/agi/";
	$agi_path	= "/var/lib/asterisk/agi-bin/"
	$fn = md5($string);
	$string = str_replace(""", "", $string);
	$string = str_replace(";", ",", $string);
	$string = str_replace("'", "", $string);
	$string = str_replace(""", " slash ", $string);
	if(!file_exists("$sounds_path$fn")) {
		system("$agi_path$common/sound.sh "$string" "$fn"");
	}
	return $agi -> stream_file("$sounds_path$fn", $escape);
}

This makes sound output at least a million times easier. We use it like this:

$a = sayNow("Enter the number, then press hash");

That’s all there is to it. I should probably also set up a cron job to clear the sound files at the end of each week.

Converting Numbers To Words in PHP

This is a straightforward coding task. I’m working on some maths code in PHP, and need a function to output “twenty-five” for 25, “fifteen” for 15, etc. A quick google search pulled up a neat little PEAR package which can do this.

The results weren’t as flash as I’d hoped though. We ended up with this:

894: eight hundred ninety-four

So it turns out that the PEAR class doesn’t print commas or the word ‘and’ in its numbers. We will be feeding our numbers to festival, and also using them for maths questions. Those pesky ands and commas are a must for this project, so this is not good enough:

9539: nine thousand five hundred thirty-nine

Instead, I need:

9539: nine thousand, five hundred and thirty-nine

I found some commented out code, and tried my own modifications, but it wasn’t working right, so I scrapped the PEAR class and started from scratch, using Wikipedia to populate the lists:

The results were perfect. It took about 300 lines to replace the class, and it handles ordinal numbers too. (‘1st’ = ‘first’, ‘100th = one hundredth’, etc). I ditched the currency feature.

To download the replacement class, click here.

It’s simple to use, just express ridiculous numbers or long decimals as strings to avoid errors. See this example for features:

<?php
include("Numbers_Words.php");
 // one
echo Numbers_Words::toWords(1); newline();

 // two
echo Numbers_Words::toWords(2); newline();

 // twenty-five
echo Numbers_Words::toWords(25); newline();

 // one thousand
echo Numbers_Words::toWords(1000); newline();

 // one thousand and one
echo Numbers_Words::toWords(1001); newline();

 // one hundred thousand and one
echo Numbers_Words::toWords(100001); newline();

 // one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine
echo Numbers_Words::toWords("123 456 789"); newline();

 // thirty-six point nine seven
echo Numbers_Words::toWords(36.97); newline();

 /* nine novemvigintillion, eight hundred and seventy-two octovigintillion, three hundred and fourty-eight septemvigintillion,
	nine hundred and seventy-two sesvigintillion, four hundred and ninety-two quinquavigintillion,
	three hundred and eighty-four quattuorvigintillion, nine hundred and two tresvigintillion,
	three hundred and eighty-four duovigintillion, two hundred and ninety unvigintillion, three hundred and eighty-four vigintillion,
	two hundred and ninety novemdecillion, three hundred and fourty-two octodecillion, five hundred and sixty-three septendecillion,
	four hundred and seventy-five sexdecillion, six hundred and thirty-four quindecillion, eight hundred and fifty-seven quattuordecillion,
	four hundred and fifty-seven tredecillion, three hundred and fourty-nine duodecillion, eight hundred and fifty-seven undecillion,
	two hundred and thirty-four decillion, five hundred and twenty-three nonillion, five hundred and thirty-four octillion,
	eight hundred and fifty-three septillion, two hundred and ninety sextillion, four hundred and seventy-eight quintillion,
	two hundred and ninety quadrillion, three hundred and fourty-seven trillion, two hundred and thirty-eight billion,
	nine hundred and fourty-six million, five hundred and thirty-eight thousand, four hundred and seventy-six */
echo Numbers_Words::toWords("9872348972492384902384290384290342563475634857457349857234523534853290478290347238946538476"); newline();

 // seventeenth
echo Numbers_Words::toWords("17th"); newline();

 // eight hundred and sixty-third
echo Numbers_Words::toWords("863rd"); newline();

 // negative seventy-eight point four
echo Numbers_Words::toWords("-78.4"); newline();

function newline() {
	echo "<br />n";
}
?>

Using correct strings makes synthetic voices much less annoying, and nobody can complain about bad maths questions. :)