Winning 2048 game with key-mashing?

This new, simple, addictive game is out, called 2048. You need to slide two numbers together, resulting in a bigger number, in ever-increasing powers of two. You get 2048, and you win.

2014-03-2048-1

I noticed that somebody already wrote neat AI for it, although it does run quite slowly. But then I also noticed a friend mashing keys in a simple pattern, and thought I should test whether this was more effective. The answer: it kinda is.

2014-03-2048-2

At least in the first half of the game, a simple key-mashing pattern is a much faster way to build high numbers. The PHP script below will usually get to 512 without much trouble, but rarely to 1024. I would suggest running it for a while, and then taking over with some strategy.

The script

This script spits out commands which can be piped to xte for some automatic key-mashing on GNU / Linux. Save as 2048.php

#!/usr/bin/env php
mouseclick 1
<?php
for($i = 0; $i < 10; $i++) {
	move("Left", 1);
	move("Right", 1);
}

while(1) {
	move("Down", 1);
	move("Left", 1);
	move("Down", 1);
	move("Right", 1);
}

function move($dir, $count) {
	for($i = 0; $i < $count; $i++) {
		echo "key $dir\nsleep 0.025\n";
		usleep(25000);
	}
}

And then in a terminal, run this then click over to a 2048 game:

sleep 3; php 2048.php | xte

Good luck!

How to liberate your myki data

myki logo

myki is the public transport ticketing system in Melbourne. If you register your myki, you can view the usage history online. Unfortunately, you are limited to paging through HTML, or downloading a PDF.

This post will show you how to get your myki history into a CSV file on a GNU/Linux computer, so that you can analyse it with your favourite spreadsheet/database program.

Get your data as PDFs

Firstly, you need to register your myki, log in, and export your history. The web interface seemed to give you the right data if you chose blocks of 1 month.

Export myki data for each month

Once you do this, organise these into a folder filled with statements.

A folder filled with myki statements

You need the pdftotext utility to go on. In debian, this is in the poppler-utils package.

The manual steps below run you through how to extract the data, and at the bottom of the screen there are some scripts I’ve put together to do this automatically.

Manual steps to extract your data

These steps are basically a crash course in "scraping" PDF files.

To convert all of the PDF’s to text, run:

for i in *.pdf; do pdftotext -layout -nopgbrk $i; done

This preserves the line-based layout. The next step is to filter out the lines which don’t contain data. Each line we’re interested in begins with a date, followed by the word “Touch On”, “Touch Off”, or “Top Up”

18/08/2013 13:41:20   T...

We can filter all of the text files using grep, and a regex to match this:

cat *.txt | grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T"

The output looks like:
Filtered output, showing data

So what are we looking at?

  1. One row per line
  2. Fields delimited by multiple spaces

To collapse every double-space into a tab, we use unexpand. Then, to collapse duplicate tabs, we use tr:

cat filtered-data.txt | unexpand -t 2 | tr -s '\t'

Finally, some fields need to be quoted, and tabs need to be converted to CSV. The PHP script below will do that step.

Scripts to get your data

myki2csv.sh is a script which performs the above manual steps:

#!/bin/bash
# Convert myki history from PDF to CSV
#	(c) Michael Billington < michael.billington@gmail.com >
#	MIT Licence
hash pdftotext || exit 1
hash unexpand || exit 1
pdftotext -layout -nopgbrk $1 - | \
	grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T" | \
	unexpand -t2 | \
	tr -s '\t' | \
	./tab2csv.php > ${1%.pdf}.csv

tab2csv.php is called at the end of the above script, to turn the result into a well-formed CSV file:

#!/usr/bin/env php
<?php
/* Generate well-formed CSV from dodgy tab-delimitted data
	(c) Michael Billington < michael.billington@gmail.com >
	MIT Licence */
$in = fopen("php://stdin", "r");
$out = fopen("php://stdout", "w");
while($line = fgets($in)) {
	$a = explode("\t", $line);
	foreach($a as $key => $value) {
		$a[$key]=trim($value);
		/* Quote out ",", and escape "" */
		if(!(strpos($value, "\"") === false &&
				strpos($value, ",") === false)) {
			$a[$key] = "\"".str_replace("\"", "\"\"", $a[$key])."\"";
		}
	}
	$line = implode(",", $a) . "\r\n";
	fwrite($out, $line);
}

Invocation

Call script on a single foo.pdf to get foo.csv:

./myki2csv.sh foo.pdf

Convert all PDF’s to CSV and then join them:

for i in *.pdf; do ./myki2csv.sh $i; done
tac *.csv > my-myki-data.csv

Importing into LibreOffice

The first field must be marked as a DD/MM/YYYY date, and the “zones” need to be marked as text (so that “1/2” isn’t treated as a fraction!)

These are my import settings:

Options to import the myki data into LibreOffice

Happy data analysis!

Update 2013-09-18: The -nopgbrk option was added to the above instructions, to prevent page break characters causing grep to skip one valid line per page

Update 2014-05-04: The code for the above, as well as this follow-up post are now available on github.

Making an XKCD-style password generator in C++

I’m learning C++ at the moment, and I don’t find long tutorials or studying the standard template library particularly fun.

Making this type of password-generator is not new, but it is a nice practical exercise to start out in any language.

1. Get a list of common English words

Googling “common English words” yielded this list, purporting to contain 5,000 words. Unfortunately it contains almost 1,000 duplicates and numerous non-words! Wiktionary has a much higher-quality list of words compiled from Project Gutenberg, but the markup looks a bit like this:

==== 1 - 1000 ====
===== 1 - 100 =====
[[the]] = 56271872
[[of]] = 33950064
[[and]] = 29944184
[[to]] = 25956096
[[in]] = 17420636
[[I]] = 11764797  

Noting the wikilinks surrounding each word, I put together this PHP script to extract the link destinations and called it get-wikilinks.php:

#!/usr/bin/php
<?php
/* Return list of wikilinked words from input text */
$text = explode("[[", file_get_contents("php://stdin"));
foreach($text as $link) {
	$rbrace = strpos($link, "]]");
	if(!$rbrace === false) {
		/* Also escape on [[foo|bar]] links */
		$pipe = strpos($link, "|");
		if(!$pipe === false && $pipe < $rbrace) {
			$rbrace = $pipe;
		}
		$word = trim(substr($link, 0, $rbrace))."n";
		if(strpos($word, "'") === false && !is_numeric(substr($word, 0, 1))) {
			/* Leave out words with apostrophes or starting with numbers */
			echo $word;
		}
	}
}

The output of this script is much more workable:

$ chmod +x get-wikilinks.php
$ cat wikt.txt | ./get-wikilinks.php
the
of
and
to
in
I

Using sort and uniq makes a top-notch list of common words, ready for an app to digest:

$ cat wikt.txt | ./get-wikilinks.php | sort | uniq > wordlist.txt

2. Write some C++

There are two problems being solved here:

  • Reading a file into memory
    • An ifstream is used to access the file, and getline() will return false when EOF has been reached
    • Each line is loaded into a vector (roughly the same type of container as an ArrayList in Java), which is resized dynamically and accessed like an array.
  • Choosing random numbers
    • These are seeded from a random_device, being more cross-platform than reading from a file like /dev/urandom.
    • Note that random is new to C++11.
pw.cpp
#include <fstream>
#include <vector>
#include <string>
#include <iostream>
#include <random>
#include <cstdlib>

using namespace std;

int main(int argc, char* argv[]) {
    const char* fname = "wordlist.txt";

    /* Parse command-line arguments */
    int max = 1;
    if(argc == 2) {
        max = atoi(argv[1]);
    }

    /* Open word list file */
    ifstream input;
    input.open(fname);
    if(input.fail()) {
        cerr << "ERROR: Failed to open " << fname << endl;
    }

    /* Read to end and load words */
    vector<string> wordList;
    string line;
    while(getline(input, line)) {
        wordList.push_back(line);
    }

    /* Seed from random device */
    random_device rd;
    default_random_engine gen;
    gen.seed(rd());
    uniform_int_distribution<int> dist(0, wordList.size() - 1);

    /* Output as many passwords as required */
    const int pwLen = 4;
    int wordId, i, j;
    for(i = 0; i < max; i++) {
        for(j = 0; j < pwLen; j++) {
            cout << wordList[dist(gen)] << ((j != pwLen - 1) ? " " : "");
        }
        cout << endl;
    }

    return 0;
}

3. Compile

Lots of projects in compiled languages have a Makefile, so that you can compile them without having to type all the compiler options manually.

Makefiles are a bit heavy to learn properly, but for a project this tiny, something simple is fine:

default:
	g++ pw.cpp -o pw -std=c++11

clean:
	rm -f pw

Now we can compile and run the generator:

make
./pw

The output looks like this for ./pw 30 ("generate 30 passwords"):

Moodlification

Moodle is an alright piece of software, but if you ever try to discuss code, it will mangle it and create a horrible mess.

To mitigate this, write your posts in HTML view, and paste your code as it appears after running it through this moodlify.php script:

#!/usr/bin/php
<?php
/* moodlify.php -- whitespace fixes for code to post on moodle */
if(!isset($argv[1])) {
    echo "Usage: " . $argv[0] . " [file]\n";
    exit(0);
}
$file = $argv[1];
if(!file_exists($file) || !$code = file_get_contents($file)) {
    echo "Could not open file: ".$argv[1]."\n";
    exit(1);
}

/* Series of find-replaces to strip whitespace */
$code = htmlentities($code); /* < > " ' etc */
$code = str_replace("\t", "    ", $code); /* Tabs (4 spaces)*/

$code = str_replace("  ", "&nbsp;&nbsp;", $code); /* Double-spaces */
$code = str_replace("\n", "<br />", $code); /* Newlines */
$code = "<div style=\"padding-left:4em;\"><code>".$code."</code></div>\n";

/* Output options */
if(isset($argv[2])) {
    file_put_contents($argv[2], $code);
} else {
    echo $code;
}?>

And moodle will never mangle your code again!

Update 2012-09-05 Corrected the above code due to out own CMS mangling the code, due to a workaround for this horrible (now deprecated) anti-feature.

Alphanumeric phone numbers

Some popular phone numbers (eg. 1300-FOOBAR) are not numbers at all. If you are running a VOIP server then you may be interested in this snippet of PHP code to convert them into actual numbers, allowing users to dial by typing the more familiar form.

function normaliseTelephoneNumber($start) {
	/* Return an extension with numbers substituted in place of letters for dialling */
	$map = array(	"A" => "2", "B" => "2", "C" => "2",
			"D" => "3", "E" => "3", "F" => "3",
			"G" => "4", "H" => "4", "I" => "4",
			"J" => "5", "J" => "5", "L" => "5",
			"M" => "6", "N" => "6", "O" => "6",
			"P" => "7", "Q" => "7", "R" => "7", "S" => "7",
			"T" => "8", "U" => "8", "V" => "8",
			"W" => "9", "X" => "9", "Y" => "9", "Z" => "9",
			"+" => "+", "*" => "*", "#" => "#");
	$new = "";
	$hasnumber = false;
	$ext = strtoupper($start);
	for($i = 0; $i < strlen($ext); $i++) {
		$c = substr($ext, $i, 1);
		if(isset($map[$c])) {
			$new .= $map[$c];
			if($hasnumber == false) {
				/* No numbers before letters */
				return $start;
			}
		} else if(is_numeric($c)) {
			$new .= $c;
			$hasnumber = true;
		}
	}

	if($hasnumber == true) {
		return $new; /* Return numeric version as appropriate */
	} else {
		return $start; /* Leaves full words like "joe" or "bazza" unchanged */
	}
	return $new;
}

Note that this will only alter the number if it begins with numbers. This is to make sure that (at least in my case) the local network extensions don't get messed with:

echo normaliseTelephoneNumber("1300-FOOBAR")."n"; /* 1300366227 */
echo normaliseTelephoneNumber("mike")."n"; /* mike */

Beautiful QR Codes

The verdict is in. QR codes are ugly. But they don’t have to be. Check out the modified code featured on this Wikipedia article. You don’t need to be a QR Code expert to do something like that.

The basic idea is that QR codes have error Correction. We can generate codes which store the data in multiple places, so that scanners will still read them if they are damaged.

Scripting the whole operation

Tedious image editing is not my cup of tea, so I made a PHP class to apply some templates to QR codes and add a centred logo for us.

Note: If you don’t have ImageMagick for PHP, read this page. On Ubuntu, apt-get install php5-imagick worked fine for me.

This is the basic formula:

QR Code + Template + Logo = Pretty PR Code

First we need the QR code. It is best generated with the phpqrcode library. Adding the logo wont work unless we use high error correction (H):

QRcode::png("http://bitrevision.com", "code.png", 'H', 8, 0);

After that, code.png looks like this:

Now the templates. Note that we used 8×8 pixels per block and 0 for the border above. The resulting code may line up with one of these templates, adding white lines over the image. Save these images to a ‘template’ folder:

Next, the logo. We have one of those:

Time for the code. This class will do most of the work. Just check the template folder contains an overlay that fits your code.

class QR_Pretty {
	public $template_base = "template/template-{SIZE}.png";
	public $qr = false;
	public $geometry = false;

	function prettify($file, $logo, $output = '') {
		/* Load image */
		$this -> qr = new Imagick();
		$this -> qr -> readImage($file);
		$this -> geometry = $this -> qr -> getImageGeometry();

		/* Perform modifications */
		$this -> add_template();
		$this -> add_logo($logo);

		/* Output image */
		if($output != '') {
			$this -> qr -> setImageFileName($output);
		}
		$this -> qr -> writeImage();
	}

	private function add_template() {
		/* This will overlay a template containing white lines,
			to make the QR codes look less code-ful */
		$size = $this -> geometry['width'];
		$template_filename = str_replace("{SIZE}", $size, $this -> template_base);
		$template = new Imagick();
		$template -> readImage($template_filename);
		$this -> qr -> compositeImage($template, imagick::COMPOSITE_OVER, 0, 0 );
	}

	private function add_logo($logo_file = false) {
		/* This places a logo in the middle of the QR code,
			64x64 would be advisable :) */
		if(!$logo_file) {
			/* No logo to add */
			return false;
		}
		$logo = new Imagick();
		$logo -> readImage($logo_file);
		$logo_size = $logo -> getImageGeometry();
		$x = ($this -> geometry['width'] - $logo_size['width']) / 2;
		$y = ($this -> geometry['height'] - $logo_size['height']) / 2;
		$this -> qr -> compositeImage($logo, imagick::COMPOSITE_OVER, $x, $y);
	}
}

To use the above class is quite simple. Once you have code.png, logo.png, and a folder full of templates, just do this:

$qr = new QR_Pretty();
$qr -> prettify("code.png", "bitrevis.png", "pretty.png");

That gives us pretty.png, which looks like this:

Thanks to that error correction, this picture still scans and takes us to http://bitrevision.com. Slightly larger logos can be used, but 64×64 looks good and scans reliably. Try embedding logos with transparency too!

This means no more excuses for ugly QR codes. Integrate this into your scripts right away.

Using speech synthesis in AGI apps

For some of my telephone apps, stringing together pre-recorded messages is a pain, and it’s not even useful for more dynamic content. This is how I got PHP to do sound output via festival whilst using the asterisk AGI.

First, I made a folder called agi in my /var/lib/asterisk/sounds/en/. This is where we will cache sound files.

This shell script makes a sound file with your text, and saves it in that directory:

#!/bin/sh
echo "$1" | text2wave -otype ulaw -o /var/lib/asterisk/sounds/en/agi/$2.ulaw

For the PHP side, we just use MD5 to name the files based on their contents, and only run the shell script if no file has been created with the text we want.

In this project, $agi refers to an instance of PHP AGI. (highly recommended)

function sayNow($string, $escape = "") {
	global $agi;
	$sounds_path	= "/var/lib/asterisk/sounds/en/agi/";
	$agi_path	= "/var/lib/asterisk/agi-bin/"
	$fn = md5($string);
	$string = str_replace(""", "", $string);
	$string = str_replace(";", ",", $string);
	$string = str_replace("'", "", $string);
	$string = str_replace(""", " slash ", $string);
	if(!file_exists("$sounds_path$fn")) {
		system("$agi_path$common/sound.sh "$string" "$fn"");
	}
	return $agi -> stream_file("$sounds_path$fn", $escape);
}

This makes sound output at least a million times easier. We use it like this:

$a = sayNow("Enter the number, then press hash");

That’s all there is to it. I should probably also set up a cron job to clear the sound files at the end of each week.

Converting Numbers To Words in PHP

This is a straightforward coding task. I’m working on some maths code in PHP, and need a function to output “twenty-five” for 25, “fifteen” for 15, etc. A quick google search pulled up a neat little PEAR package which can do this.

The results weren’t as flash as I’d hoped though. We ended up with this:

894: eight hundred ninety-four

So it turns out that the PEAR class doesn’t print commas or the word ‘and’ in its numbers. We will be feeding our numbers to festival, and also using them for maths questions. Those pesky ands and commas are a must for this project, so this is not good enough:

9539: nine thousand five hundred thirty-nine

Instead, I need:

9539: nine thousand, five hundred and thirty-nine

I found some commented out code, and tried my own modifications, but it wasn’t working right, so I scrapped the PEAR class and started from scratch, using Wikipedia to populate the lists:

The results were perfect. It took about 300 lines to replace the class, and it handles ordinal numbers too. (‘1st’ = ‘first’, ‘100th = one hundredth’, etc). I ditched the currency feature.

To download the replacement class, click here.

It’s simple to use, just express ridiculous numbers or long decimals as strings to avoid errors. See this example for features:

<?php
include("Numbers_Words.php");
 // one
echo Numbers_Words::toWords(1); newline();

 // two
echo Numbers_Words::toWords(2); newline();

 // twenty-five
echo Numbers_Words::toWords(25); newline();

 // one thousand
echo Numbers_Words::toWords(1000); newline();

 // one thousand and one
echo Numbers_Words::toWords(1001); newline();

 // one hundred thousand and one
echo Numbers_Words::toWords(100001); newline();

 // one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine
echo Numbers_Words::toWords("123 456 789"); newline();

 // thirty-six point nine seven
echo Numbers_Words::toWords(36.97); newline();

 /* nine novemvigintillion, eight hundred and seventy-two octovigintillion, three hundred and fourty-eight septemvigintillion,
	nine hundred and seventy-two sesvigintillion, four hundred and ninety-two quinquavigintillion,
	three hundred and eighty-four quattuorvigintillion, nine hundred and two tresvigintillion,
	three hundred and eighty-four duovigintillion, two hundred and ninety unvigintillion, three hundred and eighty-four vigintillion,
	two hundred and ninety novemdecillion, three hundred and fourty-two octodecillion, five hundred and sixty-three septendecillion,
	four hundred and seventy-five sexdecillion, six hundred and thirty-four quindecillion, eight hundred and fifty-seven quattuordecillion,
	four hundred and fifty-seven tredecillion, three hundred and fourty-nine duodecillion, eight hundred and fifty-seven undecillion,
	two hundred and thirty-four decillion, five hundred and twenty-three nonillion, five hundred and thirty-four octillion,
	eight hundred and fifty-three septillion, two hundred and ninety sextillion, four hundred and seventy-eight quintillion,
	two hundred and ninety quadrillion, three hundred and fourty-seven trillion, two hundred and thirty-eight billion,
	nine hundred and fourty-six million, five hundred and thirty-eight thousand, four hundred and seventy-six */
echo Numbers_Words::toWords("9872348972492384902384290384290342563475634857457349857234523534853290478290347238946538476"); newline();

 // seventeenth
echo Numbers_Words::toWords("17th"); newline();

 // eight hundred and sixty-third
echo Numbers_Words::toWords("863rd"); newline();

 // negative seventy-eight point four
echo Numbers_Words::toWords("-78.4"); newline();

function newline() {
	echo "<br />n";
}
?>

Using correct strings makes synthetic voices much less annoying, and nobody can complain about bad maths questions. 🙂

Scripting Windows Shares

As much as I try to avoid it, sometimes I need to use Windows servers, and windows .bat files aren’t exactly the pinnacle of scripting languages. This post is about bulk-sharing home directories with consistent permissions.

As I’m a reformed Visual Basic programmer, so I decided to solve this in PHP rather than VB script.

This crude script will make a batch file to share every subdirectory (ie, hundreds of users’ home directories), and also delete desktop.ini from each of them them. Save the code below as magic.php, run it, and then run tricks.bat.

<?php
$stuff = whats_here();
$tricks = fopen("tricks.bat", "w");
foreach($stuff as $folder) {
	$line = do_things($folder);
	fwrite($tricks, $line);
}
fclose($tricks);

function do_things($folder) {
	$things .= ":: $folderrn";
	$things .= "net share $folder /DELETErn";
	$things .= "net share $folder=".getcwd()."\$folder /GRANT:EVERYONE,FULLrn";
	$things .= "del /Q $folderdesktop.inirnrn";
	return $things;
}

function whats_here() {
	/* List directories in this one */
	$here = opendir(getcwd());
	$dir  = array();
	while($kid = readdir($here)) {
		if(is_dir($kid) && $kid != "." && $kid != "..") {
			$dir[] = $kid;
		}
	}
	closedir($here);
	return $dir;
}?>

I’ve heard that Windows PowerShell is pretty useful once you get used to it, but Windows sysadmins seem to dislike scripting as much as I dislike using repetitious GUI interfaces. Oh well, a couple of PHP scripts wont hurt. 🙂

Loading OEIS integer sequences

To show that interpreted languages can be fast when used well, I’m posting this example.

Take the On-line Encyclopedia of Integer Sequences database, which is a collection of integer sequences. That means lists of numbers. You can get a file from http://oeis.org/stripped.gz, which contains the first few numbers of each sequence. Today’s file extracts to about 38MB.

Now I need to do lookups in this file for a program I’m writing, and that program is in PHP. We want to know some sequences based on their A-number, like this:

$primes = oeis("A000040");
$fibonacci = oeis("A000045");

If we’re smart about it, then even a large file can be parsed in fractions of a second. Here’s how we do it:

/* This code needs an extract of the OEIS database to operate.
	I got it from http://oeis.org/stripped.gz
	Just extract that to this folder for lookups */
function oeis($number) {
	/* Return an array of values based on a sequence's OEIS number */
	$number = strtoupper($number);
	$fp = fopen("stripped", "r");
	while($ln = fgets($fp)) { /* Find this sequence and break the loop */
		if(substr($ln, 0, strlen($number) + 1) == $number." ") {
			$res = $ln;
			break;
		}
	}
	fclose($fp);
	/* Exit if we haven't got anything */
	if(!isset($res)) { return false; }
	$rv = explode(" ", $ln); /* Split lines into left and right of space */
	$ln = trim($rv[1]);
	$ln = substr($ln, 1, strlen($ln)-2); /* Slices off extra commas on sides */
	$rv = explode(",", $ln); /* Split by commas */
	return $rv;
}

Note that we don’t use explode() until after we have found the line we need, and also note that file_get_contents() is not used at all. (Multi-megabyte strings will bog you down in any language).