Using speech synthesis in AGI apps

For some of my telephone apps, stringing together pre-recorded messages is a pain, and it’s not even useful for more dynamic content. This is how I got PHP to do sound output via festival whilst using the asterisk AGI.

First, I made a folder called agi in my /var/lib/asterisk/sounds/en/. This is where we will cache sound files.

This shell script makes a sound file with your text, and saves it in that directory:

#!/bin/sh
echo "$1" | text2wave -otype ulaw -o /var/lib/asterisk/sounds/en/agi/$2.ulaw

For the PHP side, we just use MD5 to name the files based on their contents, and only run the shell script if no file has been created with the text we want.

In this project, $agi refers to an instance of PHP AGI. (highly recommended)

function sayNow($string, $escape = "") {
	global $agi;
	$sounds_path	= "/var/lib/asterisk/sounds/en/agi/";
	$agi_path	= "/var/lib/asterisk/agi-bin/"
	$fn = md5($string);
	$string = str_replace(""", "", $string);
	$string = str_replace(";", ",", $string);
	$string = str_replace("'", "", $string);
	$string = str_replace(""", " slash ", $string);
	if(!file_exists("$sounds_path$fn")) {
		system("$agi_path$common/sound.sh "$string" "$fn"");
	}
	return $agi -> stream_file("$sounds_path$fn", $escape);
}

This makes sound output at least a million times easier. We use it like this:

$a = sayNow("Enter the number, then press hash");

That’s all there is to it. I should probably also set up a cron job to clear the sound files at the end of each week.

Converting Numbers To Words in PHP

This is a straightforward coding task. I’m working on some maths code in PHP, and need a function to output “twenty-five” for 25, “fifteen” for 15, etc. A quick google search pulled up a neat little PEAR package which can do this.

The results weren’t as flash as I’d hoped though. We ended up with this:

894: eight hundred ninety-four

So it turns out that the PEAR class doesn’t print commas or the word ‘and’ in its numbers. We will be feeding our numbers to festival, and also using them for maths questions. Those pesky ands and commas are a must for this project, so this is not good enough:

9539: nine thousand five hundred thirty-nine

Instead, I need:

9539: nine thousand, five hundred and thirty-nine

I found some commented out code, and tried my own modifications, but it wasn’t working right, so I scrapped the PEAR class and started from scratch, using Wikipedia to populate the lists:

The results were perfect. It took about 300 lines to replace the class, and it handles ordinal numbers too. (‘1st’ = ‘first’, ‘100th = one hundredth’, etc). I ditched the currency feature.

To download the replacement class, click here.

It’s simple to use, just express ridiculous numbers or long decimals as strings to avoid errors. See this example for features:

<?php
include("Numbers_Words.php");
 // one
echo Numbers_Words::toWords(1); newline();

 // two
echo Numbers_Words::toWords(2); newline();

 // twenty-five
echo Numbers_Words::toWords(25); newline();

 // one thousand
echo Numbers_Words::toWords(1000); newline();

 // one thousand and one
echo Numbers_Words::toWords(1001); newline();

 // one hundred thousand and one
echo Numbers_Words::toWords(100001); newline();

 // one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine
echo Numbers_Words::toWords("123 456 789"); newline();

 // thirty-six point nine seven
echo Numbers_Words::toWords(36.97); newline();

 /* nine novemvigintillion, eight hundred and seventy-two octovigintillion, three hundred and fourty-eight septemvigintillion,
	nine hundred and seventy-two sesvigintillion, four hundred and ninety-two quinquavigintillion,
	three hundred and eighty-four quattuorvigintillion, nine hundred and two tresvigintillion,
	three hundred and eighty-four duovigintillion, two hundred and ninety unvigintillion, three hundred and eighty-four vigintillion,
	two hundred and ninety novemdecillion, three hundred and fourty-two octodecillion, five hundred and sixty-three septendecillion,
	four hundred and seventy-five sexdecillion, six hundred and thirty-four quindecillion, eight hundred and fifty-seven quattuordecillion,
	four hundred and fifty-seven tredecillion, three hundred and fourty-nine duodecillion, eight hundred and fifty-seven undecillion,
	two hundred and thirty-four decillion, five hundred and twenty-three nonillion, five hundred and thirty-four octillion,
	eight hundred and fifty-three septillion, two hundred and ninety sextillion, four hundred and seventy-eight quintillion,
	two hundred and ninety quadrillion, three hundred and fourty-seven trillion, two hundred and thirty-eight billion,
	nine hundred and fourty-six million, five hundred and thirty-eight thousand, four hundred and seventy-six */
echo Numbers_Words::toWords("9872348972492384902384290384290342563475634857457349857234523534853290478290347238946538476"); newline();

 // seventeenth
echo Numbers_Words::toWords("17th"); newline();

 // eight hundred and sixty-third
echo Numbers_Words::toWords("863rd"); newline();

 // negative seventy-eight point four
echo Numbers_Words::toWords("-78.4"); newline();

function newline() {
	echo "<br />n";
}
?>

Using correct strings makes synthetic voices much less annoying, and nobody can complain about bad maths questions. :)

Scripting Windows Shares

As much as I try to avoid it, sometimes I need to use Windows servers, and windows .bat files aren’t exactly the pinnacle of scripting languages. This post is about bulk-sharing home directories with consistent permissions.

As I’m a reformed Visual Basic programmer, so I decided to solve this in PHP rather than VB script.

This crude script will make a batch file to share every subdirectory (ie, hundreds of users’ home directories), and also delete desktop.ini from each of them them. Save the code below as magic.php, run it, and then run tricks.bat.

<?php
$stuff = whats_here();
$tricks = fopen("tricks.bat", "w");
foreach($stuff as $folder) {
	$line = do_things($folder);
	fwrite($tricks, $line);
}
fclose($tricks);

function do_things($folder) {
	$things .= ":: $folderrn";
	$things .= "net share $folder /DELETErn";
	$things .= "net share $folder=".getcwd()."\$folder /GRANT:EVERYONE,FULLrn";
	$things .= "del /Q $folderdesktop.inirnrn";
	return $things;
}

function whats_here() {
	/* List directories in this one */
	$here = opendir(getcwd());
	$dir  = array();
	while($kid = readdir($here)) {
		if(is_dir($kid) && $kid != "." && $kid != "..") {
			$dir[] = $kid;
		}
	}
	closedir($here);
	return $dir;
}?>

I’ve heard that Windows PowerShell is pretty useful once you get used to it, but Windows sysadmins seem to dislike scripting as much as I dislike using repetitious GUI interfaces. Oh well, a couple of PHP scripts wont hurt. :)

Loading OEIS integer sequences

To show that interpreted languages can be fast when used well, I’m posting this example.

Take the On-line Encyclopedia of Integer Sequences database, which is a collection of integer sequences. That means lists of numbers. You can get a file from http://oeis.org/stripped.gz, which contains the first few numbers of each sequence. Today’s file extracts to about 38MB.

Now I need to do lookups in this file for a program I’m writing, and that program is in PHP. We want to know some sequences based on their A-number, like this:

$primes = oeis("A000040");
$fibonacci = oeis("A000045");

If we’re smart about it, then even a large file can be parsed in fractions of a second. Here’s how we do it:

/* This code needs an extract of the OEIS database to operate.
	I got it from http://oeis.org/stripped.gz
	Just extract that to this folder for lookups */
function oeis($number) {
	/* Return an array of values based on a sequence's OEIS number */
	$number = strtoupper($number);
	$fp = fopen("stripped", "r");
	while($ln = fgets($fp)) { /* Find this sequence and break the loop */
		if(substr($ln, 0, strlen($number) + 1) == $number." ") {
			$res = $ln;
			break;
		}
	}
	fclose($fp);
	/* Exit if we haven't got anything */
	if(!isset($res)) { return false; }
	$rv = explode(" ", $ln); /* Split lines into left and right of space */
	$ln = trim($rv[1]);
	$ln = substr($ln, 1, strlen($ln)-2); /* Slices off extra commas on sides */
	$rv = explode(",", $ln); /* Split by commas */
	return $rv;
}

Note that we don’t use explode() until after we have found the line we need, and also note that file_get_contents() is not used at all. (Multi-megabyte strings will bog you down in any language).

Parsing Asterisk Configuration

If you are writing for Asterisk PBX, you may feel the need to create a public telephone directory.

Today we’ve been putting together Phonebook, a web-based phonebook which pulls data from an asterisk server. We used JQTouch for the interface, with PHP to process data files. We also use a CSV file exported from our hosted gmail to ensure that people’s names are spelled correctly.

The configuration

Asterisk’s configuration files are INI-like, and you’ll find them in /etc/asterisk/ on most systems. Just one important thing though, asterisk lets you use brackets ( ) in caller-IDs, etc, which will cause PHP’s parse_ini_file() function to fizzle out and die due to an alleged syntax error.

To save yourself some trouble, use this little class I wrote, which will load INI data into an associative array:

class ns_ini_parser {
	/* Mike's non-standard INI parser for asterisk files. https://mike42.me
		Note that PHP's parse_ini_file will die with a syntax error on key = value (bracket), which is unacceptable */

	function parse_string($string) {
		$lines		= explode("n", $string);
		$section	= "0";
		$result		= array();
		foreach($lines as $line) {
			$line = trim($line);
			if($line == "" || substr($line, 0,1) == ";") {
				/* Comment, no action */
			} elseif(substr($line, 0,1) == "[") {  /* [section] */
				$l		  = strlen($line);
				$line		  = trim(substr($line, 1, $l - 2));	/* Strip brackets */
				$section	  = $line;
				if(!isset($result[$section])) {
					$result[$section] = array();
				}
			} else {				/* key = val */
				$parts = explode("=", $line);
				$key = trim($parts[0]);	/* The key is everything left of the equal */
				unset($parts[0]);	/* Got that, unset it */
				$val = trim(join("=", $parts));	/* Value is everything on the righht */
				$result[$section][$key] = $val;
			}
		}
		return $result;
	}

	function parse_file($path) {
		$string = file_get_contents($path);
		return $this -> parse_string($string);
	}
}

The example usage below will list user’s extensions next to their caller ID, which you could use for a web phonebook:

/* Example PHP code to parse asterisk configuration */
$config_parser = new ns_ini_parser;
$users  = $config_parser -> parse_file("/etc/asterisk/users.conf");

foreach($users as $id => $user) {
	if(is_numeric($id)) { /* Only show numbers, not other sections */
		echo "<a href="tel:$id">$id</a> ".$user['fullname']."<br />";
	}
}

I don’t think it gets easier than that! I’ll post the rest once you can manage the contacts as well.

Some scripts to make word puzzles

Q M Q V O U F C D P
Q L I E U N E E B L
U R T C B X N P Q B
C N Y W H E Y U X T
S X R S V A C Z D K
B E Z Z S Z E Z M P
F Q A Z R O F L Q M
K I F R L S Y E U H
K M N I C W H T X V
W O R D N H A C D Q

I’ve put together a couple of PHP scripts to make puzzles. The humble find-a-word, a word scrambler, and a cipher.

The output is just HTML, so you can include them on web-pages if you like (look there’s one over there! :o)

I still need to write a command-line interface, as making a find-a-word large enough to hold every single word in the English dictionary is a bit too much for one page-load.

But hey does that sound fun or what? I’m going to market word-search wallpaper!

Write something on a chessboard

I’ve put together a little algorithm called chess104. It will let you encode data as positions of chess pieces (104 bits of data, hence the name).

That works out to 20 characters using a squashy 5-bit encoding, but there are other options too if you really feel the need to write "COFFEE" in efficient hexadecimal.

If I can find a speedy way to encode more data, there might be a sequel to this. 104 bits is nowhere near the limit, but my other ideas were too much for my netbook to handle (presumably that makes them bad candidates for running as a web app).

Samoan Language Resources Section Started

I’ve put the beginnings of my Samoan Language section online. It’s still about 6-months away from becoming useful, but you can see the start of it here.

The story is: I wanted to put my language notes online, but in the 21st century, there’s no way I’d do it without audio, and hyperlinks to a good dictionary.

There doesn’t seem to be anything like that online yet, so I took the liberty of creating a neat database combining the words with audio, definitions, and examples which can be embedded in the notes.

That opens up a lot of ideas for other things, but for now I still have a lot of data to collect. Back to work!

I blame tetrads!

Tetrad nets

If you play too much Tetris, then you should check out the new subpage: The Tetrad Corner. It’s a collection of things I’ve made this week about the shapes used in tetris (‘tetrads’).

The original idea was that it would be cool to have a tetradic phone password. That is, your password makes a Tetris shape on the keypad. I was bored enough this week to actually write a little script for it, along with a PHP class for rotating the blocks, rendering them as a HTML table, etc.

It turned out alright, but fearing that I had begun something as time-consuming as tetris itself, I banished it to a single-page sub-site. The source code is there if you want those crufty HTML-tetrads on your site: (I don’t!)

Crufty HTML-generated tetrads

Now what I do want is a few tetris blocks for my desk, so I made up some nets to print out and cut up. Now I can make a small paper Tetris game! (PDF linked to image above). Neat huh?

A tetris game in the palm of my hand!

Update 12/4: I built these shapes today. There were two tabs missing on the nets (fixed!), so I cheated by using sticky tape. Click to enlarge the cheesily-colourised image.