How to generate professional-quality PDF files from PHP

There are a few ways to go about making PDF files from your PHP web app. Your options are basically-

  1. Put all of your text in a 210mm column and get the user to save it as PDF.
  2. Learn a purpose-built library, such as FPDF (free) or pdflib (proprietary).
  3. Use PHP for generating markup which can be saved to PDF. This is of course LaTeX

This article assumes an intermediate knowledge of both PHP and LaTeX, and that your server is not running Windows.

The software mix

PHP is an open-source server package which generates HTML pages, usually based on some sort of dynamic data. It is equally good at (but less well known for) generating other types of markup.

LaTeX is an open source document typesetting system, which will take a markup file in .tex format, and output a printable document, such as a PDF. The engine I will use here is XeLaTeX, because it supports modern trimmings such as Unicode and OpenType fonts.

Naturally, this post will use PHP to populate a .tex file, and then xelatex to create a PDF for the user.

This sounds straightforward enough, but it may not work with all shared hosts. Check your setup before you read on:

  1. Your server needs PHP, with safe mode disabled, so that it can run commands.
  2. This server needs xelatex, or a suitable substitute such as pdflatex.

A bit about markup

We will be working with .tex templates, which will be valid LaTeX files. The basic rules are:

  1. Define a \newcommand for every variable, so that you can compile the document without PHP.
  2. Drop PHP code in comments, which will print out code to override those variables.

So you will end up with code like this:

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

Look messy? A multi-line block of PHP is a little easier to follow. This example is from the body of a table, see if you can figure out the syntax:

%<?php                                                                      /*
% */ foreach($data['invoiceItem'] as $invoiceItem) {                        /*
% */    echo "\n" . LatexTemplate::escape($invoiceItem['item']) . " & " .   /*
% */        LatexTemplate::escape($invoiceItem['qty']) . " & " .            /*
% */        LatexTemplate::escape($invoiceItem['price']) . " & " .          /*
% */        LatexTemplate::escape($invoiceItem['total']) . "\\\\\n";        /*
% */ } ?>

So what about this LatexTemplate::escape() business? In LaTeX, just about every symbol seems to be part of the syntax, so it is sadly not very simple to escape.

I have settled on the following series of str_replace() calls to sanitise information for display. It is crude but effective. Generating LaTex is much like generating SQL, HTML or LDIF from your website: it is quite important to make a habit of wrapping every piece of data with a function to prevent users from writing (‘injecting’) arbitrary code into your document:

/**
 * Series of substitutions to sanitise text for use in LaTeX.
 *
 * http://stackoverflow.com/questions/2627135/how-do-i-sanitize-latex-input
 * Target document should \usepackage{textcomp}
 */
public static function escape($text) {
	// Prepare backslash/newline handling
	$text = str_replace("\n", "\\\\", $text); // Rescue newlines
	$text = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $text); // Strip all non-printables
	$text = str_replace("\\\\", "\n", $text); // Re-insert newlines and clear \\
	$text = str_replace("\\", "\\\\", $text); // Use double-backslash to signal a backslash in the input (escaped in the final step).

	// Symbols which are used in LaTeX syntax
	$text = str_replace("{", "\\{", $text);
	$text = str_replace("}", "\\}", $text);
	$text = str_replace("$", "\\$", $text);
	$text = str_replace("&", "\\&", $text);
	$text = str_replace("#", "\\#", $text);
	$text = str_replace("^", "\\textasciicircum{}", $text);
	$text = str_replace("_", "\\_", $text);
	$text = str_replace("~", "\\textasciitilde{}", $text);
	$text = str_replace("%", "\\%", $text);

	// Brackets & pipes
	$text = str_replace("<", "\\textless{}", $text);
	$text = str_replace(">", "\\textgreater{}", $text);
	$text = str_replace("|", "\\textbar{}", $text);

	// Quotes
	$text = str_replace("\"", "\\textquotedbl{}", $text);
	$text = str_replace("'", "\\textquotesingle{}", $text);
	$text = str_replace("`", "\\textasciigrave{}", $text);

	// Clean up backslashes from before
	$text = str_replace("\\\\", "\\textbackslash{}", $text); // Substitute backslashes from first step.
	$text = str_replace("\n", "\\\\", trim($text)); // Replace newlines (trim is in case of leading \\)
	return $text;
}

We then have a template which we can include() from PHP, or run xelatex over. Below is minimal.tex, a minimal example of a PHP-latex template in this form:

% This file is a valid PHP file and also a valid LaTeX file
% When processed with LaTeX, it will generate a blank template
% Loading with PHP will fill it with details

\documentclass{article}
% Required for proper escaping
\usepackage{textcomp} % Symbols
\usepackage[T1]{fontenc} % Input format

% Because Unicode etc.
\usepackage{fontspec} % For loading fonts
\setmainfont{Liberation Serif} % Has a lot more symbols than Computer Modern

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

\begin{document}
	\section{Data From PHP}
	\test{}
\end{document}

Generate a PDF on the server

Here is where the fun begins. There is no plugin for compiling a LaTeX document, so we need to directly execute the command on a file.

Looks like we need to save the output somewhere then. You would generate your filled-in LaTeX code in a temporary file by doing something like this:

/**
 * Generate a PDF file using xelatex and pass it to the user
 */
public static function download($data, $template_file, $outp_file) {
	// Pre-flight checks
	if(!file_exists($template_file)) {
		throw new Exception("Could not open template");
	}
	if(($f = tempnam(sys_get_temp_dir(), 'tex-')) === false) {
		throw new Exception("Failed to create temporary file");
	}

	$tex_f = $f . ".tex";
	$aux_f = $f . ".aux";
	$log_f = $f . ".log";
	$pdf_f = $f . ".pdf";

	// Perform substitution of variables
	ob_start();
	include($template_file);
	file_put_contents($tex_f, ob_get_clean());

The next step is to execute your engine of choice on the output files:

	// Run xelatex (Used because of native unicode and TTF font support)
	$cmd = sprintf("xelatex -interaction nonstopmode -halt-on-error %s",
			escapeshellarg($tex_f));
	chdir(sys_get_temp_dir());
	exec($cmd, $foo, $ret);

Once this is done, you can delete a lot of the extra LaTeX files, and check if a .pdf appeared as expected:

	// No need for these files anymore
	@unlink($tex_f);
	@unlink($aux_f);
	@unlink($log_f);

	// Test here
	if(!file_exists($pdf_f)) {
		@unlink($f);
		throw new Exception("Output was not generated and latex returned: $ret.");
	}

And of course, send the completed file back via HTTP:

	// Send through output
	$fp = fopen($pdf_f, 'rb');
	header('Content-Type: application/pdf');
	header('Content-Disposition: attachment; filename="' . $outp_file . '"' );
	header('Content-Length: ' . filesize($pdf_f));
	fpassthru($fp);

	// Final cleanup
	@unlink($pdf_f);
	@unlink($f);
}

The static functions escape($text) and download($data, $template_file, $outp_file) are together placed into a class called LatexTemplate for the remainder of the example (complete file on GitHub).

Gluing it all together

With the library and template, it is quite easy to set up a PHP script which triggers the above code:

<?php
require_once('../LatexTemplate.php');

$test = "";
if(isset($_GET['t'])) {
	// Make the LaTeX file and send it through
	$test = $_GET['t'];
	if($test =="") {
		// Test pattern to show symbol handling
		for($i = 0; $i < 256; $i++) {
			$test .= chr($i) . " . ";
		}
	}

	try {
		LatexTemplate::download(array('test' => $test), 'minimal.tex', 'foobar.pdf');
	} catch(Exception $e) {
		echo $e -> getMessage();
	}

}
?>
<html>
<head>
<title>LaTeX test (minimal)</title>
</head>
</html>
<body>
	<p>Enter some text to be placed on the output:</p>
	<form>
		<input type="text" name="t" /><input type="submit" value="Generate" />
	</form>
</body>
</html>

The above code will show a form, which asks for input. When it gets some text, it will generate a PDF containing the text. If no text is given, it will output an ASCII table, simply to show that it can handle the symbols.

Once the template code is hidden away, this powerful technique is easily applied.

Results

This is only a minimal example. In any real application, your template would be more extensive.

Compiling the template directly creates this PDF:

From the web, a form is presented to fill this single field:

Which results in a PDF containing the user data:

Tips

  1. The text after \end{document} is not even parsed in latex. Use this area to write <?php ?> with
    fewer constraints.
  2. Consult the github repository for this code to see the complete example.
  3. Comment out the line @unlink($tex_f); of you want to preserve (for debugging, etc) the generated markup.

Including git commit history in a LaTeX document

LaTeX is a document typesetting system, which lends itself well to use within a source-code management program such as git. I found a need to include a changelog in a document, and found this brilliant blog article by Jerel Unruh which creates a HTML log using git log.

For inclusion within a Makefile target, I re-worked this into a shell script, changes.sh, which will also auto-detect URL’s for any Github-hosted repository:

#!/bin/bash
# Find remote URL for hashes (designed for GitHub-hosted projects)
origin=`git config remote.origin.url`
base=`dirname "$origin"`/`basename "$origin" .git`

# Output LaTeX table in chronological order
echo "\\begin{tabular}{l l l}\\textbf{Detail} & \\textbf{Author} & \\textbf{Description}\\\\\\hline"
git log --pretty=format:"\\href{$base/commit/%H}{%h} & %an & %s\\\\\\hline" --reverse
echo "\end{tabular}"

The above script outputs a LaTeX table, including a hyperlink to each commit. In the pre-amble, you need:

\usepackage{hyperref}

In your Makefile, you might add something like:

changelog:
	./changes.sh > changelog.tex

The resulting changelog can then be included in the document via:

\input{changelog.tex}

Writing in Ancient Egyptian with HieroTeX

When I started learning Ancient Egyptian, I wanted to be able to type hieroglyphs alongside regular text, for printing translations. There is a package for the typesetting system LaTeX which does this, called “HieroTeX“. It took me a while to figure out how to use it, but the results are top-notch:

Example of HieroTex output

Because I’ve installed this on quite a few computers, I’m writing up this blog post to make it easier for other GNU/Linux users who are trying to figure it out.

Installation

This is tricky, because:

  • There is no Debian package! Uh oh.
  • Debian is phasing out tetex in favour of texlive
  • The variables.mk file needs to be edited for the install to work (diff to apply / how to apply it). This is because the default installation target is the user’s home directory.

I put togethter this script, hierotex-install-3.5.sh, which will get a working HieroTeX install on any recent version of Debian.

#!/bin/sh
# Script to download and install HieroTeX on a Debian computer.
# Use at your own risk.
#
# Some packages you should install first:
# 	apt-get install texlive make gcc

# Get and extract the files
wget -c "http://webperso.iut.univ-paris8.fr/~rosmord/archives/HieroTeX-3.5.tgz" && 
tar xvzf "HieroTeX-3.5.tgz" && 
cd HieroTeX && 
wget -c "http://webperso.iut.univ-paris8.fr/~rosmord/archives/HieroType1-3.1.4.tgz" && 
tar xvzf "HieroType1-3.1.4.tgz"

# Patch variable.mk to install for the whole system
wget http://mike42.me/blog/files/variable.patch && 
patch variable.mk < variable.patch

# Run the installer
sudo make tetex-install

Note: This page is great, but the variables.mk suggested for Debian/Ubuntu does not include the documentation folder, which will cause the installer to crash. It also suggests using tetex, which will not exist in future Debian releases! This is probably fine if you are on a .rpm-flavoured distro.

How to use

Firstly, you will need to know a little bit about the LaTeX typesetting system. See wikibooks.

HieroTeX accepts markup in Manuel de Codage format, which you will either need to learn, or get a tool which helps you mark up text in it. This Linux for Egyptologists page has some excellent suggestions.

The block of LaTeX code below is from my tex-examples repo, and was used to generate the image of Tutankhamun’s cartouche above.

\documentclass[a4paper]{article}
\usepackage{hiero}
\usepackage{egypto}
\begin{document}
	\section*{Egyptian hieroglyph example}

	\begin{hieroglyph}zA ra < i-mn:n-t-G43-t-S34 HqA-iwn-Sma >\end{hieroglyph} \\
	{\em Tutankhamun Hekaiunushema} \\
	Living Image of Amun, ruler of Upper Heliopolis
\end{document}

To build the file, you need to filter it through sesh command. Something like this would work:

cat hierotex-example.tex | sesh > hierotex-example-2.tex
latex hierotex-example-2.tex

The actual example uses a Makefile to do this.

Update May 2016: The original website for HieroTeX has gone offline, but is available via the Internet Archive: webperso.iut.univ-paris8.fr/~rosmord/archives/