Writing in Ancient Egyptian with HieroTeX

When I started learning Ancient Egyptian, I wanted to be able to type hieroglyphs alongside regular text, for printing translations. There is a package for the LaTeX typesetting system which does this, called “HieroTeX”. It took me a while to figure out how to use it, but the results are top-notch:

Example of HieroTeX output

Because I’ve installed this on quite a few computers, I’m writing up this blog post to make it easier for other GNU/Linux users who are trying to figure it out.

Installation

This is tricky, because:

  • There is no Debian package! Uh oh.
  • Debian is phasing out tetex in favour of texlive.
  • The variable.mk file needs to be patched for the install to work, because the default installation target is the user’s home directory.

I put together this script, hierotex-install-3.5.sh, which will get a working HieroTeX install on any recent version of Debian.

#!/bin/sh
# Script to download and install HieroTeX on a Debian computer.
# Use at your own risk.
#
# Some packages you should install first:
# 	apt-get install texlive make gcc

# Stop at the first command that fails
set -e

# Get and extract the files
wget -c "http://webperso.iut.univ-paris8.fr/~rosmord/archives/HieroTeX-3.5.tgz"
tar xvzf "HieroTeX-3.5.tgz"
cd HieroTeX
wget -c "http://webperso.iut.univ-paris8.fr/~rosmord/archives/HieroType1-3.1.4.tgz"
tar xvzf "HieroType1-3.1.4.tgz"

# Patch variable.mk to install for the whole system
wget http://mike42.me/blog/files/variable.patch
patch variable.mk < variable.patch

# Run the installer
sudo make tetex-install

Note: This page is great, but the variable.mk suggested for Debian/Ubuntu does not include the documentation folder, which will cause the installer to crash. It also suggests using tetex, which will not exist in future Debian releases! Using tetex is probably fine if you are on a .rpm-flavoured distro.

How to use

Firstly, you will need to know a little bit about the LaTeX typesetting system. See wikibooks.

HieroTeX accepts markup in Manuel de Codage format, which you will either need to learn, or get a tool which helps you mark up text in it. This Linux for Egyptologists page has some excellent suggestions.

The block of LaTeX code below is from my tex-examples repo, and was used to generate the image of Tutankhamun’s cartouche above.

\documentclass[a4paper]{article}
\usepackage{hiero}
\usepackage{egypto}
\begin{document}
	\section*{Egyptian hieroglyph example}

	\begin{hieroglyph}zA ra < i-mn:n-t-G43-t-S34 HqA-iwn-Sma >\end{hieroglyph} \\
	{\em Tutankhamun Hekaiunushema} \\
	Living Image of Amun, ruler of Upper Heliopolis
\end{document}

To build the file, you need to filter it through the sesh command. Something like this would work:

sesh < hierotex-example.tex > hierotex-example-2.tex
latex hierotex-example-2.tex

The actual example uses a Makefile to do this.
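
If you would rather not use a Makefile, the two build steps can be wrapped in a small shell function. This is only a sketch: build_hiero and the SESH and LATEX override variables are names made up here, not part of HieroTeX.

```shell
#!/bin/sh
# Sketch: run a HieroTeX source file through sesh, then through LaTeX.
# build_hiero, SESH and LATEX are hypothetical names used for illustration.
SESH="${SESH:-sesh}"
LATEX="${LATEX:-latex}"

build_hiero() {
	src="$1"
	# foo.tex becomes foo-2.tex, matching the naming used above
	out="${src%.tex}-2.tex"
	# sesh reads the marked-up file on stdin and writes plain LaTeX to stdout
	"$SESH" < "$src" > "$out" || return 1
	"$LATEX" "$out"
}

# Usage: build_hiero hierotex-example.tex
```

Using && or an explicit return between the steps means latex is never run on a half-written file if sesh fails.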

Update May 2016: The original website for HieroTeX has gone offline, but is available via the Internet Archive: webperso.iut.univ-paris8.fr/~rosmord/archives/

Infinite loop in a Makefile

I had to kick myself for writing this bug into a Makefile today. I am writing it down so that I remember to use &&, and not ; next time.

The Makefile had this target in it:

clean:
	cd folder; \
		make clean
	rm -f file

The problem here is that cd folder; make clean is passed to the shell as a single command line, and make only looks at the exit status of that whole line, which is the status of its last command. A failed cd is ignored, so make won't notice a problem (and stop) until make clean has already run.

Unfortunately, the folder name had a typo, so cd failed and make clean ran in the original directory, where it invoked this same target again and became stuck in an infinite loop.

Long story short, this is why Makefile recipes should be littered with the "and" operator: with &&, make clean only runs if cd succeeded.

clean:
	cd folder && \
		make clean
	rm -f file
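
The difference is easy to reproduce in a plain shell, without make. The sketch below uses a made-up directory name, no-such-folder, which is assumed not to exist, and an echo standing in for make clean:

```shell
#!/bin/sh
# Demonstrates why "cd dir; cmd" is risky when the directory is missing.
# "no-such-folder" is a made-up name that is assumed not to exist.

# With ";" the second command still runs, in the ORIGINAL directory.
with_semicolon() {
	( cd no-such-folder 2>/dev/null; echo "make clean would run here" )
}

# With "&&" the second command is skipped and the failure is passed on.
with_and() {
	( cd no-such-folder 2>/dev/null && echo "make clean would run here" )
}

with_semicolon
with_and || echo "skipped: cd failed"
```

The subshell parentheses only keep the cd from affecting the rest of the script; make already runs each recipe line in its own shell.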

How to liberate your myki data

myki is the public transport ticketing system in Melbourne. If you register your myki, you can view the usage history online. Unfortunately, you are limited to paging through HTML, or downloading a PDF.

This post will show you how to get your myki history into a CSV file on a GNU/Linux computer, so that you can analyse it with your favourite spreadsheet/database program.

Get your data as PDFs

Firstly, you need to register your myki, log in, and export your history. The web interface seemed to give you the right data if you chose blocks of 1 month.

Export myki data for each month

Once you do this, organise these into a folder filled with statements.

A folder filled with myki statements

You need the pdftotext utility to go on. In Debian, this is in the poppler-utils package.

The manual steps below run you through how to extract the data, and at the end of this post there are some scripts I’ve put together to do this automatically.

Manual steps to extract your data

These steps are basically a crash course in "scraping" PDF files.

To convert all of the PDFs to text, run:

for i in *.pdf; do pdftotext -layout -nopgbrk "$i"; done

This preserves the line-based layout. The next step is to filter out the lines which don’t contain data. Each line we’re interested in begins with a date and time, followed by the words “Touch On”, “Touch Off”, or “Top Up”:

18/08/2013 13:41:20   T...

We can filter all of the text files using grep, and a regex to match this:

cat *.txt | grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T"

The output looks like:
Filtered output, showing data

So what are we looking at?

  1. One row per line
  2. Fields delimited by multiple spaces

To collapse every double-space into a tab, we use unexpand. Then, to collapse duplicate tabs, we use tr (filtered-data.txt here holds the output of the grep step):

cat filtered-data.txt | unexpand -t 2 | tr -s '\t'
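
To see what the filter and the field-splitting do, here is a sketch that runs them on made-up sample text. The header lines, station names and column layout are invented for illustration; only the grep pattern is taken from above. For the splitting step it swaps in awk instead of unexpand and tr, so that the result does not depend on how the columns happen to line up.

```shell
#!/bin/sh
# Sketch of the filter-and-split steps on invented sample text.
sample() {
	cat <<'EOF'
Transaction history
Date       Time       Type       Zone   Description
18/08/2013 13:41:20   Touch On   1/2    Example Station
18/08/2013 14:05:02   Touch Off  1/2    Another Station
Page 1 of 1
EOF
}

# Keep only lines starting with a date, a time, and a word beginning with T
filter_rows() {
	grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T"
}

# Turn each run of two or more spaces into a single tab
# (awk alternative to the unexpand + tr step)
spaces_to_tabs() {
	awk -F'  +' -v OFS='\t' '{ $1 = $1; print }'
}

sample | filter_rows | spaces_to_tabs
```

Single spaces, such as the one between the date and the time or inside “Touch On”, are left alone, so each of those stays within one field.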

Finally, some fields need to be quoted, and tabs need to be converted to CSV. The PHP script below will do that step.

Scripts to get your data

myki2csv.sh is a script which performs the above manual steps:

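#!/bin/bash
# Convert myki history from PDF to CSV
#	(c) Michael Billington < michael.billington@gmail.com >
#	MIT Licence
# Bail out early if a required tool is missing
hash pdftotext || exit 1
hash unexpand || exit 1
# Extract the text, keep only data rows, turn runs of spaces into single
# tabs, then hand the result to tab2csv.php for quoting
pdftotext -layout -nopgbrk "$1" - | \
	grep "^[0-3][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] *T" | \
	unexpand -t2 | \
	tr -s '\t' | \
	./tab2csv.php > "${1%.pdf}.csv"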

tab2csv.php is called at the end of the above script, to turn the result into a well-formed CSV file:

#!/usr/bin/env php
<?php
/* Generate well-formed CSV from dodgy tab-delimited data
	(c) Michael Billington < michael.billington@gmail.com >
	MIT Licence */
$in = fopen("php://stdin", "r");
$out = fopen("php://stdout", "w");
/* Compare against false so that a final line of "0" is not skipped */
while(($line = fgets($in)) !== false) {
	$a = explode("\t", $line);
	foreach($a as $key => $value) {
		$a[$key] = trim($value);
		/* Quote fields containing a comma or a double quote,
			doubling any quotes inside */
		if(strpos($a[$key], "\"") !== false ||
				strpos($a[$key], ",") !== false) {
			$a[$key] = "\"".str_replace("\"", "\"\"", $a[$key])."\"";
		}
	}
	$line = implode(",", $a) . "\r\n";
	fwrite($out, $line);
}

Invocation

Call script on a single foo.pdf to get foo.csv:

./myki2csv.sh foo.pdf

Convert all PDFs to CSV and then join them:

for i in *.pdf; do ./myki2csv.sh "$i"; done
tac *.csv > my-myki-data.csv

Importing into LibreOffice

The first field must be marked as a DD/MM/YYYY date, and the “zones” need to be marked as text (so that “1/2” isn’t treated as a fraction!)

These are my import settings:

Options to import the myki data into LibreOffice

Happy data analysis!

Update 2013-09-18: The -nopgbrk option was added to the above instructions, to prevent page break characters causing grep to skip one valid line per page.

Update 2014-05-04: The code for the above, as well as for this follow-up post, is now available on GitHub.

Backing up from a hosting provider

Backups are great, and they’re not rocket science. I’m writing up how we do backups, not because I think it’s a cool or unique setup (because it’s not), but to highlight how effective a simple solution can be.

We use rsync to take a local copy of whatever is on our web host without wasting bandwidth downloading files that aren’t needed. The layout looks like this:

Our hosting provider is accessible via ssh, and the backup box we use is a Raspberry Pi model B, costing (more or less) 50 AUD to get running.

On the server

On the server, we back up databases with mysqldump. To do this, you need to enter user details into a .my.cnf file, and then something like this will do the trick:

#!/bin/sh
# Remove old dump
rm -f database.sql.gz

# Dump and compress database
mysqldump -h sql.example.com --all-databases > database.sql
gzip database.sql

The above script is called database-dump.sh, and is called from the backup box, to dump the databases to a file before grabbing all the files.
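
The .my.cnf file mentioned above is a small option file with a [client] section. As a sketch, it could be created like this; write_my_cnf is a made-up helper name, and the user and password shown in the usage line are placeholders:

```shell
#!/bin/sh
# Sketch: create a MySQL option file so that mysqldump can log in
# without prompting. write_my_cnf is a hypothetical helper name.
write_my_cnf() {
	target="$1"; user="$2"; password="$3"
	(
		# Owner-only permissions, since the file holds a password
		umask 077
		printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$target"
	)
}

# Usage (placeholder credentials):
#   write_my_cnf "$HOME/.my.cnf" backupuser 'example-password'
```

The umask is set inside a subshell so that it does not leak into the rest of the script.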

On the backup box

First, a script to get the files. You should use password-less login with ssh-copy-id for this to work non-interactively:

#!/bin/sh
# Update the database dump
ssh user@host.example.com './database-dump.sh'
# Get files
rsync -avz --delete-during user@host.example.com:/home/user .

We save a copy of the files at this date in a dated archive, so we can back-date to find deleted things. At the end of the above script:

mkdir -p archive
now=$(date +"%Y-%m-%d")
tar -czf archive/backup-$now.tar.gz user

There aren’t a huge number of changes to record daily, so we got cron to run the above script weekly on the backup box. Read man crontab for how to do this.
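
For reference, a weekly crontab entry has five time fields followed by the command. The sketch below just prints such a line; the script path and the 03:00 Sunday schedule are assumptions, not our actual settings.

```shell
#!/bin/sh
# Sketch: build a weekly crontab entry for the backup script.
# The path and the schedule (03:00 every Sunday) are assumed values.
weekly_cron_line() {
	script="$1"
	# Fields: minute hour day-of-month month day-of-week command
	printf '0 3 * * 0 %s\n' "$script"
}

weekly_cron_line /home/pi/backup.sh
```

The printed line can be pasted into the editor opened by crontab -e on the backup box.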

What backup is not

If you think you shouldn’t be doing backups, you’re wrong. The following are not good excuses:

  1. Trust: Whoever is looking after the data won’t lose it.
    Our host is pretty good, but their terms of service say they won’t be responsible for any data loss. Even providers which have support agreements can make mistakes. You’ll also be able to work faster if you’re not paranoid about any mistake being unrecoverable.
  2. Expense: It’s a nice idea but not worth it.
    It’s dirt cheap, you can learn to do it yourself, and once set up requires virtually no administration. If your organisation can’t afford some kind of backup solution, then it should probably stop using data in any form.
  3. RAID: I invested money in RAID, so I don’t need backups.
    If you accidentally delete something, or notice that some of your files have been tampered with, then RAID will not help you. If there is a problem (e.g. fire) at the hosting location, then you will be in trouble regardless of disk redundancy.

Debian & XFCE quirks on Toshiba NB550D

Today I re-installed Debian wheezy on my Toshiba netbook and realised how useful it might be to collate the hurdles into one tidy reference blog post (to save looking everything up next time).

This just covers everything I had to configure or work around to get a working setup.

Install & hardware issues

From Linux, use dd to write your disk image onto a flash drive:

dd if=debian-wheezy-DI-rc1-amd64-netinst.iso of=/dev/sdX bs=4M

If you don’t know your flash drive device, then locate ‘Disk Utility’ or use sudo fdisk -l and choose the likely candidate.

Now boot up the netbook. If you’ve disabled the splash screen, then F12 will get you a boot menu and F2 will let you enable USB booting (if you don’t see the flash drive).

The installer gives you a warning about needing non-free firmware. You can safely ignore this; it’s just for Bluetooth.

When you get the option to, install openssh. You will have graphics issues later, and your computer will be next to useless if you don’t have some way to log in.

Follow the installer as usual, and boot into the new system.

Graphics

From GNOME, everything initially worked okay out of the box for me, but logging out would predictably corrupt the graphics like so:

Pro-tip(TM): Write down your IP address before this happens and follow the rest of the steps via ssh.

These steps on the Debian wiki suggest getting xserver-xorg-video-radeon and xserver-xorg-video-ati, but they are already installed (and xserver-xorg-video-radeonhd does not appear to exist in wheezy). The free firmware also didn’t work for me:

root@mikebook:~# apt-get install firmware-linux-free

So it looks like we need firmware-linux-nonfree, which means we need to allow non-free packages. Edit the end of each line in /etc/apt/sources.list to add contrib and non-free:
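
As a rough illustration of that edit, the sed sketch below appends the two components to any deb or deb-src line that currently ends in main. The mirror URL is a placeholder and add_nonfree is a made-up function name; check your own sources.list before changing it.

```shell
#!/bin/sh
# Sketch: append "contrib non-free" to deb lines that end in "main".
# add_nonfree is a hypothetical name; the mirror URL is a placeholder.
add_nonfree() {
	sed -E 's/^(deb(-src)? .* main)[[:space:]]*$/\1 contrib non-free/'
}

printf '%s\n' 'deb http://ftp.example.org/debian/ wheezy main' | add_nonfree
```

Running add_nonfree < /etc/apt/sources.list prints a preview without modifying the file. Lines that already list contrib and non-free no longer end in main, so they are left unchanged.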

After this, update your package list, install the non-free firmware, and restart X (rebooting is shown here but not really necessary):

apt-get update
apt-get install firmware-linux-nonfree
reboot

Next time you log in, GNOME will report that it is running in full-bloat mode, which is a good sign. If you still have issues, then the output of lspci is what you need to google:

02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV710 [Radeon HD 4350] (prog-if 00 [VGA controller])

Touchpad / two-finger scroll

The laptop’s Synaptics touchpad will work just fine on the default settings. I only wrote this up because the version of XFCE in wheezy has no options for tapping, two-finger scroll, or other fancy things (unlike the screenshots on xfce.org).

GNOME will let you set up per-user mouse preferences, but these don’t affect gdm (the login screen), so you can’t tap the login buttons (how annoying!)

The solution is to configure the mouse using Xorg’s configuration. The Debian wiki page on the topic gives an example file to dump in /etc/X11/xorg.conf.d/. I swapped two values to get two-finger right-click:

Section "InputClass"
        Identifier      "Touchpad"                      # required
        MatchIsTouchpad "yes"                           # required
        Driver          "synaptics"                     # required
        Option          "MinSpeed"              "0.5"
        Option          "MaxSpeed"              "1.0"
        Option          "AccelFactor"           "0.075"
        Option          "TapButton1"            "1"
        Option          "TapButton2"            "3"     # multitouch
        Option          "TapButton3"            "2"     # multitouch
        Option          "VertTwoFingerScroll"   "1"     # multitouch
        Option          "HorizTwoFingerScroll"  "1"     # multitouch
        Option          "VertEdgeScroll"        "1"
        Option          "CoastingSpeed"         "8"
        Option          "CornerCoasting"        "1"
        Option          "CircularScrolling"     "1"
        Option          "CircScrollTrigger"     "7"
        Option          "EdgeMotionUseAlways"   "1"
        Option          "LBCornerButton"        "8"     # browser "back" btn
        Option          "RBCornerButton"        "9"     # browser "forward" btn
EndSection

After saving this file, reboot or run killall Xorg as root.

Sleep

I haven’t investigated this properly, but I would steer clear of suspend-to-RAM, and set your power settings to hibernate (i.e. suspend-to-disk) instead. This is one error you might get on wake (if you are lucky enough to get a display after it wakes):

This is not a kernel thing, as a no-X install can pm-suspend without issue.

Sudo

By default, the user you create during the Debian setup is not in the sudo group. To change this:

su
adduser joebloggs sudo

You need to log out and back in again for this to affect your session.

Making XFCE more useable

XFCE was my desktop of choice, so at this point, you can either stop reading or run this:

apt-get install xfce4 xfce4-goodies

Appearance

Because it runs GTK-2 and not GTK-3, GNOME apps will look ugly beside XFCE apps if you don’t choose settings which work well for both toolkits. I chose these ones but there are other good combinations:

  • Window Manager -> Theme: Default-4.6
  • Appearance -> Style: Anquita

If you use it, then you should open gnome-terminal now. It defaults to black-on-black under XFCE, which you will want to swap out for something less stupid.

Replacing Thunar with Nautilus

Thunar is great, but Nautilus is more familiar to me, and can easily be set up as the preferred file browser:

  • Preferred Applications -> Utilities -> File Manager: Nautilus

Thunar will hold onto your desktop unless you remove it from Session and Startup (tab over to ‘Session’ and delete xfdesktop)

To tell Nautilus to handle the desktop, install gnome-tweak-tool, and check the box labelled ‘Have file manager handle the desktop’. Next time you start Nautilus, it will give you a working desktop.

Disable screensavers

XFCE has some very cool screensavers, but personally I think this part of desktop computing is a bit last-century:

  • Settings -> Screensaver: Blank screen only

The program XScreenSaver itself is a bit of an eyesore. If you don’t like it, this forum post has some suggestions for alternatives.

Getting a calendar

The default clock on the panel is not clickable. Simply remove it and add the ‘DateTime’ widget — This can show a clock with a drop-down calendar, which is basically standard.

Getting ‘Print Screen’ to work

XFCE makes this super easy to set up (once you turn up this thread on google):

  • Settings manager -> Keyboard -> Application shortcuts

Add a new shortcut to this command:

xfce4-screenshooter -f

Then hit Print, and you should get this:

External monitor

When you use an external monitor and switch off the laptop display, you can get stuck without a screen if you pull out the cable! The XFCE screen-switching app (mapped to Fn-F5 on my keyboard) is not really navigable by keyboard, so I added this shortcut as well:

The command xrandr --auto will switch on any connected monitor with a sane default resolution, fixing your display without rebooting.

Update 2013-03-15: I changed this to Shift+Alt+F5, because some programs use the above shortcut, rendering it useless when said programs have focus.

Pyrocket and Ubuntu

I have a great USB rocket launcher, it’s more useful than a computer mouse most of the time actually. I spotted a moth on the roof the other day, and hadn’t installed pyrocket on this computer yet.

A quick apt-get install pyrocket is all it takes to solve that though, right? No such luck.

Apparently, the quality control in the Ubuntu repos is such that this package has been broken for several months now, despite the dependency issue being fixed by an upstream fork.

So this is the error you get at the moment anyway:

Traceback (most recent call last):
  File "/usr/bin/pyrocket", line 17, in <module>
    from rocket_frontend import RocketWindow
  File "/usr/lib/pymodules/python2.7/rocket_frontend.py", line 11, in <module>
    from rocket_webcam import VideoWindow
  File "/usr/lib/pymodules/python2.7/rocket_webcam.py", line 2, in <module>
    from opencv import cv, highgui
ImportError: No module named opencv

The solution, beyond complaining about it, is to read the bug report here and do this:

git clone https://github.com/stadler/pyrocket
cd pyrocket/src
./pyrocket.py

But the moth had escaped by then.

Strange problems from corrupt files

I’m upgrading from Firefox 0.x to 2.x on one of my computers. Decompressing the downloaded file seemed to be taking far too long, so I took a look at what was up. Somehow, it seems that the incomplete download of 2.4MB (the whole thing should be around 9MB) managed to cause gzip to write out 521MB of data before I stopped it.

The moral of the story? Always check the MD5 sums.
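
A minimal sketch of that check, assuming GNU coreutils md5sum is installed. verify_md5 is a made-up function name, and the file name and sum in the usage line are placeholders:

```shell
#!/bin/sh
# Sketch: compare a download against a published MD5 sum before unpacking.
# Assumes GNU coreutils md5sum; verify_md5 is a hypothetical helper name.
verify_md5() {
	file="$1"; expected="$2"
	# md5sum prints "<sum>  <filename>"; keep only the sum
	actual=$(md5sum "$file" | cut -d' ' -f1)
	[ "$actual" = "$expected" ]
}

# Usage (placeholders):
#   verify_md5 download.tar.gz "<sum from the download page>" && tar xzf download.tar.gz
```

For gzip files specifically, gzip -t will also test the archive's integrity without writing anything out.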