Making an XKCD-style password generator in C++

I’m learning C++ at the moment, and I don’t find long tutorials or studying the standard template library particularly fun.

Making this type of password-generator is not new, but it is a nice practical exercise to start out in any language.

1. Get a list of common English words

Googling “common English words” yielded this list, purporting to contain 5,000 words. Unfortunately it contains almost 1,000 duplicates and numerous non-words! Wiktionary has a much higher-quality list of words compiled from Project Gutenberg, but the markup looks a bit like this:

==== 1 - 1000 ====
===== 1 - 100 =====
[[the]] = 56271872
[[of]] = 33950064
[[and]] = 29944184
[[to]] = 25956096
[[in]] = 17420636
[[I]] = 11764797  

Noting the wikilinks surrounding each word, I put together this PHP script to extract the link destinations and called it get-wikilinks.php:

#!/usr/bin/php
<?php
/* Return list of wikilinked words from input text */
$text = explode("[[", file_get_contents("php://stdin"));
foreach($text as $link) {
	$rbrace = strpos($link, "]]");
	if(!$rbrace === false) {
		/* Also escape on [[foo|bar]] links */
		$pipe = strpos($link, "|");
		if(!$pipe === false && $pipe < $rbrace) {
			$rbrace = $pipe;
		}
		$word = trim(substr($link, 0, $rbrace))."n";
		if(strpos($word, "'") === false && !is_numeric(substr($word, 0, 1))) {
			/* Leave out words with apostrophes or starting with numbers */
			echo $word;
		}
	}
}

The output of this script is much more workable:

$ chmod +x get-wikilinks.php
$ cat wikt.txt | ./get-wikilinks.php
the
of
and
to
in
I

Using sort and uniq makes a top-notch list of common words, ready for an app to digest:

$ cat wikt.txt | ./get-wikilinks.php | sort | uniq > wordlist.txt

2. Write some C++

There are two problems being solved here:

  • Reading a file into memory
    • An ifstream is used to access the file, and getline() will return false when EOF has been reached
    • Each line is loaded into a vector (roughly the same type of container as an ArrayList in Java), which is resized dynamically and accessed like an array.
  • Choosing random numbers
    • These are seeded from a random_device, being more cross-platform than reading from a file like /dev/urandom.
    • Note that random is new to C++11.
pw.cpp
#include <fstream>
#include <vector>
#include <string>
#include <iostream>
#include <random>
#include <cstdlib>

using namespace std;

int main(int argc, char* argv[]) {
    const char* fname = "wordlist.txt";

    /* Parse command-line arguments */
    int max = 1;
    if(argc == 2) {
        max = atoi(argv[1]);
    }

    /* Open word list file */
    ifstream input;
    input.open(fname);
    if(input.fail()) {
        cerr << "ERROR: Failed to open " << fname << endl;
    }

    /* Read to end and load words */
    vector<string> wordList;
    string line;
    while(getline(input, line)) {
        wordList.push_back(line);
    }

    /* Seed from random device */
    random_device rd;
    default_random_engine gen;
    gen.seed(rd());
    uniform_int_distribution<int> dist(0, wordList.size() - 1);

    /* Output as many passwords as required */
    const int pwLen = 4;
    int wordId, i, j;
    for(i = 0; i < max; i++) {
        for(j = 0; j < pwLen; j++) {
            cout << wordList[dist(gen)] << ((j != pwLen - 1) ? " " : "");
        }
        cout << endl;
    }

    return 0;
}

3. Compile

Lots of projects in compiled languages have a Makefile, so that you can compile them without having to type all the compiler options manually.

Makefiles are a bit heavy to learn properly, but for a project this tiny, something simple is fine:

default:
	g++ pw.cpp -o pw -std=c++11

clean:
	rm -f pw

Now we can compile and run the generator:

make
./pw

The output looks like this for ./pw 30 ("generate 30 passwords"):