There are a few ways to go about making PDF files from your PHP web app. Your options are basically:
- Put all of your text in a 210mm column and get the user to save it as PDF.
- Learn a purpose-built library, such as FPDF (free) or PDFlib (proprietary).
- Use PHP to generate markup which can be compiled to PDF. The markup in question is, of course, LaTeX.
This article assumes an intermediate knowledge of both PHP and LaTeX, and that your server is not running Windows.
The software mix
PHP is an open-source server-side scripting language which generates HTML pages, usually based on some sort of dynamic data. It is equally good at (but less well known for) generating other types of markup.
LaTeX is an open-source document typesetting system, which takes a markup file in .tex format and outputs a printable document, such as a PDF. The engine I will use here is XeLaTeX, because it supports modern trimmings such as Unicode and OpenType fonts.
Naturally, this post will use PHP to populate a .tex file, and then xelatex to create a PDF for the user.
This sounds straightforward enough, but it may not work with all shared hosts. Check your setup before you read on:
- Your server needs PHP with permission to run commands: exec() must not be listed in disable_functions, and on PHP older than 5.4, safe mode must be off.
- The server needs xelatex, or a suitable substitute such as pdflatex.
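Both requirements can be checked from PHP itself before going any further. The sketch below is only an illustration: the helper names exec_is_available() and find_command() are my own and not part of the article's code, and it assumes a Unix-like shell where `command -v` works.

```php
<?php
// Hypothetical pre-flight helpers; not part of the LatexTemplate class.

/** True if PHP is allowed to call exec(). */
function exec_is_available() {
    if (!function_exists('exec')) {
        return false;
    }
    // disable_functions is a comma-separated list set in php.ini
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return !in_array('exec', $disabled, true);
}

/** Path of a command found on the PATH, or null if it cannot be found. */
function find_command($name) {
    if (!exec_is_available()) {
        return null;
    }
    $output = array();
    exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null', $output, $status);
    return ($status === 0 && isset($output[0])) ? $output[0] : null;
}

// Report which engines this server can run
foreach (array('xelatex', 'pdflatex') as $engine) {
    $path = find_command($engine);
    echo $engine . ': ' . ($path === null ? 'not found' : $path) . "\n";
}
```

Run it once through the web server rather than the command line, since the two often use different php.ini settings.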
A bit about markup
We will be working with .tex templates, which will be valid LaTeX files. The basic rules are:
- Define a \newcommand for every variable, so that you can compile the document without PHP.
- Put PHP code inside LaTeX comments. When the file is run through PHP, that code prints LaTeX which overrides the defaults.
So you will end up with code like this:
% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}
% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}
% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>
Look messy? A multi-line block of PHP is a little easier to follow. This example is from the body of a table. See if you can figure out the syntax:
%<?php /*
% */ foreach($data['invoiceItem'] as $invoiceItem) { /*
% */ echo "\n" . LatexTemplate::escape($invoiceItem['item']) . " & " . /*
% */ LatexTemplate::escape($invoiceItem['qty']) . " & " . /*
% */ LatexTemplate::escape($invoiceItem['price']) . " & " . /*
% */ LatexTemplate::escape($invoiceItem['total']) . "\\\\\n"; /*
% */ } ?>
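If the syntax is not obvious, it helps to run the snippet through PHP and look at what comes out. The sketch below does that with a temporary file and output buffering. The LatexTemplate class here is a cut-down stand-in (an assumption for the demo, escaping only & and %), and the invoice data is made up.

```php
<?php
// Stand-in for the article's LatexTemplate class, for this demo only.
class LatexTemplate {
    public static function escape($text) {
        return str_replace(array('&', '%'), array('\\&', '\\%'), (string) $text);
    }
}

// The table-body snippet, stored verbatim (nowdoc, so nothing is interpolated)
$template = <<<'TEX'
%<?php /*
% */ foreach($data['invoiceItem'] as $invoiceItem) { /*
% */ echo "\n" . LatexTemplate::escape($invoiceItem['item']) . " & " . /*
% */ LatexTemplate::escape($invoiceItem['qty']) . " & " . /*
% */ LatexTemplate::escape($invoiceItem['price']) . " & " . /*
% */ LatexTemplate::escape($invoiceItem['total']) . "\\\\\n"; /*
% */ } ?>
TEX;

// Made-up data with one row
$data = array('invoiceItem' => array(
    array('item' => 'Tea & cake', 'qty' => 2, 'price' => '3.50', 'total' => '7.00'),
));

// Write the template out, include() it, and capture what PHP prints
$file = tempnam(sys_get_temp_dir(), 'tpl-');
file_put_contents($file, $template);
ob_start();
include $file;
$rendered = ob_get_clean();
unlink($file);

echo $rendered;
```

Each /* ... */ pair is a PHP comment that swallows the line break and the % at the start of the next line. LaTeX sees seven comment lines, while PHP sees one continuous block. The only % that survives into the output is the very first one, which is why the echo starts with "\n": it pushes the table row onto a fresh, uncommented line.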
So what about this LatexTemplate::escape() business? In LaTeX, just about every symbol seems to be part of the syntax, so it is sadly not very simple to escape.
I have settled on the following series of str_replace() calls to sanitise information for display. It is crude but effective. Generating LaTeX is much like generating SQL, HTML or LDIF from your website: make a habit of wrapping every piece of data in an escaping function, so that users cannot write (‘inject’) arbitrary code into your document:
/**
 * Series of substitutions to sanitise text for use in LaTeX.
 *
 * http://stackoverflow.com/questions/2627135/how-do-i-sanitize-latex-input
 * Target document should \usepackage{textcomp}
 */
public static function escape($text) {
    // Prepare backslash/newline handling
    $text = str_replace("\n", "\\\\", $text); // Rescue newlines as \\ so the next step does not strip them
    $text = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $text); // Strip all non-printables (note: this includes every non-ASCII byte)
    $text = str_replace("\\\\", "\n", $text); // Re-insert newlines and clear \\
    $text = str_replace("\\", "\\\\", $text); // Use double-backslash to signal a backslash in the input (escaped in the final step).
    // Symbols which are used in LaTeX syntax
    $text = str_replace("{", "\\{", $text);
    $text = str_replace("}", "\\}", $text);
    $text = str_replace("$", "\\$", $text);
    $text = str_replace("&", "\\&", $text);
    $text = str_replace("#", "\\#", $text);
    $text = str_replace("^", "\\textasciicircum{}", $text);
    $text = str_replace("_", "\\_", $text);
    $text = str_replace("~", "\\textasciitilde{}", $text);
    $text = str_replace("%", "\\%", $text);
    // Brackets & pipes
    $text = str_replace("<", "\\textless{}", $text);
    $text = str_replace(">", "\\textgreater{}", $text);
    $text = str_replace("|", "\\textbar{}", $text);
    // Quotes
    $text = str_replace("\"", "\\textquotedbl{}", $text);
    $text = str_replace("'", "\\textquotesingle{}", $text);
    $text = str_replace("`", "\\textasciigrave{}", $text);
    // Clean up backslashes from before
    $text = str_replace("\\\\", "\\textbackslash{}", $text); // Substitute backslashes from first step.
    $text = str_replace("\n", "\\\\", trim($text)); // Replace newlines (trim is in case of leading \\)
    return $text;
}
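The backslash juggling is needed because each str_replace() call sees the output of the one before it. For comparison, here is a single-pass sketch built on strtr(), which never rescans text it has already replaced. This is an alternative I am suggesting, not the function used in the rest of the article, and latex_escape_single_pass() is a made-up name. It differs in one respect: a double backslash typed by the user comes out as two literal backslashes rather than a line break.

```php
<?php
// Hypothetical alternative to LatexTemplate::escape(), using one strtr() pass.
function latex_escape_single_pass($text) {
    // Strip control characters (except newline), DEL and all non-ASCII bytes,
    // mirroring the filter in the original function
    $text = preg_replace('/[\x00-\x09\x0B-\x1F\x7F-\xFF]/', '', $text);
    $map = array(
        '\\' => '\\textbackslash{}',
        '{'  => '\\{',
        '}'  => '\\}',
        '$'  => '\\$',
        '&'  => '\\&',
        '#'  => '\\#',
        '^'  => '\\textasciicircum{}',
        '_'  => '\\_',
        '~'  => '\\textasciitilde{}',
        '%'  => '\\%',
        '<'  => '\\textless{}',
        '>'  => '\\textgreater{}',
        '|'  => '\\textbar{}',
        '"'  => '\\textquotedbl{}',
        "'"  => '\\textquotesingle{}',
        '`'  => '\\textasciigrave{}',
        "\n" => '\\\\', // newline becomes a LaTeX line break
    );
    // strtr() with an array replaces every key in one pass over the input
    return strtr(trim($text), $map);
}

echo latex_escape_single_pass("50% of {x} & \\y\nnext"), "\n";
```

Because the braces in \textbackslash{} are produced by the same pass that escapes braces, there is no risk of escaping them a second time, so the placeholder step disappears.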
We then have a template which we can include() from PHP, or run xelatex over. Below is minimal.tex, a minimal example of a PHP-LaTeX template in this form:
% This file is a valid PHP file and also a valid LaTeX file
% When processed with LaTeX, it will generate a blank template
% Loading with PHP will fill it with details
\documentclass{article}
% Required for proper escaping
\usepackage{textcomp} % Symbols
\usepackage[T1]{fontenc} % Font encoding
% Because Unicode etc.
\usepackage{fontspec} % For loading fonts
\setmainfont{Liberation Serif} % Has a lot more symbols than Computer Modern
% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}
% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}
% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>
\begin{document}
\section{Data From PHP}
\test{}
\end{document}
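With more than a handful of variables, writing one fill-in line per variable gets repetitive. One possible shortcut is to generate all of the \renewcommand lines in a loop. The sketch below assumes a helper named renew_commands(), which is my own invention and not part of LatexTemplate, and it uses a deliberately tiny stand-in escaper so that it runs on its own.

```php
<?php
// Hypothetical helper: build one \renewcommand line per template variable.
// $escape is any callable that escapes text for LaTeX.
function renew_commands(array $data, $escape) {
    $out = '';
    foreach ($data as $name => $value) {
        // LaTeX command names may only contain letters
        if (!is_string($name) || !ctype_alpha($name)) {
            throw new InvalidArgumentException("Not a valid LaTeX command name: $name");
        }
        $out .= "\n\\renewcommand{\\" . $name . "}{" . call_user_func($escape, $value) . "}\n";
    }
    return $out;
}

// Stand-in escaper for this demo only; it handles just & and %
$demoEscape = function ($text) {
    return str_replace(array('&', '%'), array('\\&', '\\%'), (string) $text);
};

echo renew_commands(array('test' => 'Fish & chips'), $demoEscape);
```

In a real template the fill-in comment would then shrink to a single call, passing array('LatexTemplate', 'escape') as the callable. Only scalar values fit this pattern; table rows still need a loop of their own.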
Generate a PDF on the server
Here is where the fun begins. PHP has no built-in extension for compiling a LaTeX document, so we need to run the command directly on a file.
That means the filled-in LaTeX has to be saved somewhere first. You would write it to a temporary file by doing something like this:
/**
 * Generate a PDF file using xelatex and pass it to the user
 */
public static function download($data, $template_file, $outp_file) {
    // Pre-flight checks
    if(!file_exists($template_file)) {
        throw new Exception("Could not open template");
    }
    if(($f = tempnam(sys_get_temp_dir(), 'tex-')) === false) {
        throw new Exception("Failed to create temporary file");
    }
    $tex_f = $f . ".tex";
    $aux_f = $f . ".aux";
    $log_f = $f . ".log";
    $pdf_f = $f . ".pdf";
    // Perform substitution of variables
    ob_start();
    include($template_file);
    file_put_contents($tex_f, ob_get_clean());
The next step is to run your engine of choice on the generated .tex file:
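    // Run xelatex (Used because of native unicode and TTF font support)
    $cmd = sprintf("xelatex -interaction nonstopmode -halt-on-error %s",
        escapeshellarg($tex_f));
    chdir(sys_get_temp_dir()); // xelatex writes its output to the working directory
    exec($cmd, $output, $ret); // $output collects the console output, $ret the exit code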
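The chdir() call is there because xelatex writes its .pdf, .aux and .log files to the current working directory. If you would rather not change directory, the engines shipped with TeX Live accept an -output-directory option. The following sketch builds such a command; build_xelatex_command() is a hypothetical helper, and the expected string in the comment assumes Unix-style quoting from escapeshellarg().

```php
<?php
// Hypothetical helper: build the xelatex command line without relying on chdir().
// Assumes an engine that understands -output-directory (as TeX Live's xelatex does).
function build_xelatex_command($tex_file, $out_dir) {
    return sprintf(
        'xelatex -interaction=nonstopmode -halt-on-error -output-directory=%s %s',
        escapeshellarg($out_dir),
        escapeshellarg($tex_file)
    );
}

// On a Unix-like system this prints:
// xelatex -interaction=nonstopmode -halt-on-error -output-directory='/tmp' '/tmp/tex-abc.tex'
echo build_xelatex_command('/tmp/tex-abc.tex', '/tmp'), "\n";
```

Either way, both arguments go through escapeshellarg(), since a path is just another piece of data being dropped into a command.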
Once this is done, you can delete a lot of the extra LaTeX files, and check if a .pdf appeared as expected:
    // No need for these files anymore
    @unlink($tex_f);
    @unlink($aux_f);
    @unlink($log_f);
    // Test here
    if(!file_exists($pdf_f)) {
        @unlink($f);
        throw new Exception("Output was not generated and latex returned: $ret.");
    }
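The exit code alone says little about what went wrong. TeX conventionally starts each error message in its log with an exclamation mark, so a few lines of PHP can pull those out for a more useful exception message. This is a sketch under that assumption, with a made-up helper name and a made-up sample log.

```php
<?php
// Hypothetical helper: return the lines of a LaTeX log that report errors.
// Assumes the usual convention that error messages begin with "!".
function latex_error_lines($log_text) {
    $errors = array();
    foreach (preg_split('/\R/', $log_text) as $line) {
        if (strncmp($line, '!', 1) === 0) {
            $errors[] = $line;
        }
    }
    return $errors;
}

// Made-up log excerpt for the demonstration
$sample = "This is XeTeX\n! Undefined control sequence.\nl.12 \\foo\n! Emergency stop.\n";
print_r(latex_error_lines($sample));
```

To use something like this in download(), the log would have to be read with file_get_contents($log_f) before the @unlink($log_f) line removes it.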
And of course, send the completed file back via HTTP:
    // Send through output
    $fp = fopen($pdf_f, 'rb');
    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="' . $outp_file . '"' );
    header('Content-Length: ' . filesize($pdf_f));
    fpassthru($fp);
    fclose($fp);
    // Final cleanup
    @unlink($pdf_f);
    @unlink($f);
}
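In this article $outp_file is always a fixed string, but if the download name ever comes from user data, it deserves the same suspicion as everything else, because it ends up inside a quoted HTTP header. A conservative sketch follows; safe_download_name() and its fallback value are assumptions of mine, not part of LatexTemplate.

```php
<?php
// Hypothetical helper: reduce a requested name to something safe inside
// the quoted filename of a Content-Disposition header.
function safe_download_name($name, $fallback = 'document.pdf') {
    $name = basename($name);                               // drop any directory part
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name); // keep a conservative character set
    return ($name === '' || $name === '.' || $name === '..') ? $fallback : $name;
}

echo safe_download_name("../inv\"oice\r\n.pdf"), "\n"; // inv_oice__.pdf
echo safe_download_name(''), "\n";                     // document.pdf
```

Replacing quotes and line breaks matters most here, since those are the characters that could end the header value early.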
The static functions escape($text) and download($data, $template_file, $outp_file) are together placed into a class called LatexTemplate for the remainder of the example (complete file on GitHub).
Gluing it all together
With the library and template, it is quite easy to set up a PHP script which triggers the above code:
<?php
require_once('../LatexTemplate.php');
$test = "";
if(isset($_GET['t'])) {
    // Make the LaTeX file and send it through
    $test = $_GET['t'];
    if($test == "") {
        // Test pattern to show symbol handling
        for($i = 0; $i < 256; $i++) {
            $test .= chr($i) . " . ";
        }
    }
    try {
        LatexTemplate::download(array('test' => $test), 'minimal.tex', 'foobar.pdf');
        exit; // Stop here so the HTML below is not appended to the PDF
    } catch(Exception $e) {
        echo $e->getMessage();
    }
}
?>
<html>
<head>
    <title>LaTeX test (minimal)</title>
</head>
<body>
    <p>Enter some text to be placed on the output:</p>
    <form>
        <input type="text" name="t" /><input type="submit" value="Generate" />
    </form>
</body>
</html>
The above code shows a form which asks for input. When it gets some text, it generates a PDF containing that text. If the form is submitted empty, it feeds every byte value from 0 to 255 through the template as a test pattern, simply to show that the escaping can handle the symbols.
Once the template code is hidden away, this powerful technique is easily applied.
Results
This is only a minimal example. In any real application, your template would be more extensive.
Compiling the template directly creates this PDF:
From the web, a form is presented to fill this single field:
Which results in a PDF containing the user data:
Tips
- The text after \end{document} is not even parsed by LaTeX. Use this area to write <?php ?> with fewer constraints.
- Consult the GitHub repository for this code to see the complete example.
- Comment out the line @unlink($tex_f); if you want to preserve the generated markup (for debugging, etc.).