How to generate professional-quality PDF files from PHP

There are a few ways to go about making PDF files from your PHP web app. Your options are basically-

  1. Put all of your text in a 210mm column and get the user to save it as PDF.
  2. Learn a purpose-built library, such as FPDF (free) or pdflib (proprietary).
  3. Use PHP for generating markup which can be saved to PDF. This is of course LaTeX

This article assumes an intermediate knowledge of both PHP and LaTeX, and that your server is not running Windows.

The software mix

PHP is an open-source server package which generates HTML pages, usually based on some sort of dynamic data. It is equally good at (but less well known for) generating other types of markup.

LaTeX is an open source document typesetting system, which will take a markup file in .tex format, and output a printable document, such as a PDF. The engine I will use here is XeLaTeX, because it supports modern trimmings such as Unicode and OpenType fonts.

Naturally, this post will use PHP to populate a .tex file, and then xelatex to create a PDF for the user.

This sounds straightforward enough, but it may not work with all shared hosts. Check your setup before you read on:

  1. Your server needs PHP, with safe mode disabled, so that it can run commands.
  2. This server needs xelatex, or a suitable substitute such as pdflatex.

A bit about markup

We will be working with .tex templates, which will be valid LaTeX files. The basic rules are:

  1. Define a \newcommand for every variable, so that you can compile the document without PHP.
  2. Drop PHP code in comments, which will print out code to override those variables.

So you will end up with code like this:

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

Look messy? A multi-line block of PHP is a little easier to follow. This example is from the body of a table, see if you can figure out the syntax:

%<?php                                                                      /*
% */ foreach($data['invoiceItem'] as $invoiceItem) {                        /*
% */    echo "\n" . LatexTemplate::escape($invoiceItem['item']) . " & " .   /*
% */        LatexTemplate::escape($invoiceItem['qty']) . " & " .            /*
% */        LatexTemplate::escape($invoiceItem['price']) . " & " .          /*
% */        LatexTemplate::escape($invoiceItem['total']) . "\\\\\n";        /*
% */ } ?>

So what about this LatexTemplate::escape() business? In LaTeX, just about every symbol seems to be part of the syntax, so it is sadly not very simple to escape.

I have settled on the following series of str_replace() calls to sanitise information for display. It is crude but effective. Generating LaTex is much like generating SQL, HTML or LDIF from your website: it is quite important to make a habit of wrapping every piece of data with a function to prevent users from writing (‘injecting’) arbitrary code into your document:

/**
 * Series of substitutions to sanitise text for use in LaTeX.
 *
 * http://stackoverflow.com/questions/2627135/how-do-i-sanitize-latex-input
 * Target document should \usepackage{textcomp}
 */
public static function escape($text) {
	// Prepare backslash/newline handling
	$text = str_replace("\n", "\\\\", $text); // Rescue newlines
	$text = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $text); // Strip all non-printables
	$text = str_replace("\\\\", "\n", $text); // Re-insert newlines and clear \\
	$text = str_replace("\\", "\\\\", $text); // Use double-backslash to signal a backslash in the input (escaped in the final step).

	// Symbols which are used in LaTeX syntax
	$text = str_replace("{", "\\{", $text);
	$text = str_replace("}", "\\}", $text);
	$text = str_replace("$", "\\$", $text);
	$text = str_replace("&", "\\&", $text);
	$text = str_replace("#", "\\#", $text);
	$text = str_replace("^", "\\textasciicircum{}", $text);
	$text = str_replace("_", "\\_", $text);
	$text = str_replace("~", "\\textasciitilde{}", $text);
	$text = str_replace("%", "\\%", $text);

	// Brackets & pipes
	$text = str_replace("<", "\\textless{}", $text);
	$text = str_replace(">", "\\textgreater{}", $text);
	$text = str_replace("|", "\\textbar{}", $text);

	// Quotes
	$text = str_replace("\"", "\\textquotedbl{}", $text);
	$text = str_replace("'", "\\textquotesingle{}", $text);
	$text = str_replace("`", "\\textasciigrave{}", $text);

	// Clean up backslashes from before
	$text = str_replace("\\\\", "\\textbackslash{}", $text); // Substitute backslashes from first step.
	$text = str_replace("\n", "\\\\", trim($text)); // Replace newlines (trim is in case of leading \\)
	return $text;
}

We then have a template which we can include() from PHP, or run xelatex over. Below is minimal.tex, a minimal example of a PHP-latex template in this form:

% This file is a valid PHP file and also a valid LaTeX file
% When processed with LaTeX, it will generate a blank template
% Loading with PHP will fill it with details

\documentclass{article}
% Required for proper escaping
\usepackage{textcomp} % Symbols
\usepackage[T1]{fontenc} % Input format

% Because Unicode etc.
\usepackage{fontspec} % For loading fonts
\setmainfont{Liberation Serif} % Has a lot more symbols than Computer Modern

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

\begin{document}
	\section{Data From PHP}
	\test{}
\end{document}

Generate a PDF on the server

Here is where the fun begins. There is no plugin for compiling a LaTeX document, so we need to directly execute the command on a file.

Looks like we need to save the output somewhere then. You would generate your filled-in LaTeX code in a temporary file by doing something like this:

/**
 * Generate a PDF file using xelatex and pass it to the user
 */
public static function download($data, $template_file, $outp_file) {
	// Pre-flight checks
	if(!file_exists($template_file)) {
		throw new Exception("Could not open template");
	}
	if(($f = tempnam(sys_get_temp_dir(), 'tex-')) === false) {
		throw new Exception("Failed to create temporary file");
	}

	$tex_f = $f . ".tex";
	$aux_f = $f . ".aux";
	$log_f = $f . ".log";
	$pdf_f = $f . ".pdf";

	// Perform substitution of variables
	ob_start();
	include($template_file);
	file_put_contents($tex_f, ob_get_clean());

The next step is to execute your engine of choice on the output files:

	// Run xelatex (Used because of native unicode and TTF font support)
	$cmd = sprintf("xelatex -interaction nonstopmode -halt-on-error %s",
			escapeshellarg($tex_f));
	chdir(sys_get_temp_dir());
	exec($cmd, $foo, $ret);

Once this is done, you can delete a lot of the extra LaTeX files, and check if a .pdf appeared as expected:

	// No need for these files anymore
	@unlink($tex_f);
	@unlink($aux_f);
	@unlink($log_f);

	// Test here
	if(!file_exists($pdf_f)) {
		@unlink($f);
		throw new Exception("Output was not generated and latex returned: $ret.");
	}

And of course, send the completed file back via HTTP:

	// Send through output
	$fp = fopen($pdf_f, 'rb');
	header('Content-Type: application/pdf');
	header('Content-Disposition: attachment; filename="' . $outp_file . '"' );
	header('Content-Length: ' . filesize($pdf_f));
	fpassthru($fp);

	// Final cleanup
	@unlink($pdf_f);
	@unlink($f);
}

The static functions escape($text) and download($data, $template_file, $outp_file) are together placed into a class called LatexTemplate for the remainder of the example (complete file on GitHub).

Gluing it all together

With the library and template, it is quite easy to set up a PHP script which triggers the above code:

<?php
require_once('../LatexTemplate.php');

$test = "";
if(isset($_GET['t'])) {
	// Make the LaTeX file and send it through
	$test = $_GET['t'];
	if($test =="") {
		// Test pattern to show symbol handling
		for($i = 0; $i < 256; $i++) {
			$test .= chr($i) . " . ";
		}
	}

	try {
		LatexTemplate::download(array('test' => $test), 'minimal.tex', 'foobar.pdf');
	} catch(Exception $e) {
		echo $e -> getMessage();
	}

}
?>
<html>
<head>
<title>LaTeX test (minimal)</title>
</head>
</html>
<body>
	<p>Enter some text to be placed on the output:</p>
	<form>
		<input type="text" name="t" /><input type="submit" value="Generate" />
	</form>
</body>
</html>

The above code will show a form, which asks for input. When it gets some text, it will generate a PDF containing the text. If no text is given, it will output an ASCII table, simply to show that it can handle the symbols.

Once the template code is hidden away, this powerful technique is easily applied.

Results

This is only a minimal example. In any real application, your template would be more extensive.

Compiling the template directly creates this PDF:

From the web, a form is presented to fill this single field:

Which results in a PDF containing the user data:

Tips

  1. The text after \end{document} is not even parsed in latex. Use this area to write <?php ?> with
    fewer constraints.
  2. Consult the github repository for this code to see the complete example.
  3. Comment out the line @unlink($tex_f); of you want to preserve (for debugging, etc) the generated markup.

10 Replies to “How to generate professional-quality PDF files from PHP”

  1. @vini- This is LaTeX code, which is a very expressive way to typeset documents- I’d suggest doing a web search for “LaTeX background image” or similar, which should turn up answers to your question.

  2. Thank you for this article, is very useful. But I have a problem… I used the code with a large list of data and there are overwritten. The pdf generated doesn’t have page breaks. Can you help me?

  3. @Martin- This is just a method of embedding PHP into LaTeX, so a web search for “latex multi-page table” should be the first port of call for this particular layout issue.

  4. When I’m trying to run this script, I keep getting: “Output was not generated and latex returned: 1.”

  5. @Dominic – There are many moving parts in this post, I would suggest removing the “@unlink” statements, and attempting to manually compile the .tex and .log files to debug.

    This code is all on available GitHub if you want to be sure that browser copy & paste isn’t including anything strange.

  6. @Malte – sure sounds strange. Could be permissions, the Apache user needs to read it. You can make a CLI PHP script to debug with.

  7. Thanks! Works nicely!

    I’m trying to include a graphic in the header of my document, but always get “Unable to load picture or PDF file” when xelatex tries to load the graphic. It works fine on my local as well as on the server, when executing “xelatex foo.tex” in the shell, but it doesnt work when executing it with the script. The png file is in the same directory as the php and in the tmp folder, but I always get this error. Do you have any idea?

Leave a Reply

Your email address will not be published. Required fields are marked *