The top 100 Ansible modules

Ansible is an automation system which is widely used for deployments and configuration. It contains a dizzying array of modules for interfacing with things like files, services, package managers, and various pieces of software and hardware.


There are some modules which I would expect every Ansible user to be familiar with, while others are barely used at all. I could guess which modules I use most, but what about the wider community?

I couldn’t find any existing data, so I set out to determine which Ansible modules were most widely-used. The best source for this is Ansible Galaxy, which is a directory of around 18,000 Ansible roles. Using some scripts, I sifted through the 18,259 publicly accessible roles from Ansible Galaxy, found 71,610 files to look through, and tallied up which modules were being used by each of the 317,757 tasks in those files.

The list

These are the top 100 modules as of January 2019. I used the same method to build such a list in January 2018, so you can also see the change in popularity of some modules over the last 12 months.

Rank Module Uses Rank +/-
1 file 26,224 +1
2 include 25,267 -1
3 template 24,062
4 command 23,952
5 service 19,436
6 shell 19,401
7 set_fact 17,079 +1
8 apt 16,006 +1
9 lineinfile 12,432 -2
10 copy 8,794 +1
11 yum 8,733 -1
12 assert 7,504 +7
13 include_tasks 6,686 +15
14 stat 5,922 -2
15 package 5,422 -1
16 get_url 5,323 -3
17 debug 5,147 -2
18 import_tasks 5,016 +20
19 include_vars 4,251 -2
20 apt_repository 3,789 -4
21 user 3,233 -3
22 fail 2,812 +2
23 unarchive 2,734 -2
24 apt_key 2,665 -4
25 pip 2,474
26 systemd 2,389 +8
27 action 2,114 -4
28 git 2,043 -1
29 uri 1,680 +7
30 group 1,665 -1
31 sysctl 1,331 -5
32 raw 1,323 -2
33 mysql_user 1,266
34 meta 1,204 +3
35 replace 1,125 -3
36 ini_file 1,125 -5
37 find 1,030 -15
38 local_action 986 -3
39 mysql_db 959
40 cron 925
41 wait_for 898
42 rpm_key 771
43 include_role 742 +15
44 yum_repository 723 +2
45 mount 669 -2
46 blockinfile 619 +3
47 firewalld 579 +1
48 ufw 563 -1
49 authorized_key 557 -4
50 docker_container 488 +2
51 dnf 455 +5
52 seboolean 417 -8
53 homebrew 409 -2
54 fetch 378 +1
55 npm 367 -5
56 osx_defaults 363 +8
57 postgresql_user 350 -4
58 pkgng 345 -4
59 pause 340 +4
60 script 314 +5
61 setup 290 +7
62 postgresql_db 289 -2
63 mysql_replication 286 -1
64 win_regedit 281 +6
65 pacman 257 -4
66 debconf 256 +1
67 slurp 255 +10
68 gem 253 -11
69 iptables 240 +13
70 apache2_module 231 -4
71 synchronize 213 -2
72 docker 213 -13
73 alternatives 210
74 selinux 202
75 oc_obj 199 +98
76 make 194 -1
77 win_shell 191 +13
78 modprobe 187 +11
79 hostname 181 -7
80 zypper 174 -4
81 xml 160
82 supervisorctl 160 -11
83 win_file 149 +15
84 homebrew_cask 149 -6
85 add_host 144 -2
86 rabbitmq_user 129 -7
87 pamd 124 +81
88 win_command 116 +13
89 assemble 116 -9
90 htpasswd 115 -6
91 apk 112 -5
92 openbsd_pkg 111 -7
93 win_get_url 109 +14
94 win_chocolatey 109
95 docker_image 109 +4
96 tempfile 106 +25
97 locale_gen 105 -5
98 easy_install 97 -10
99 django_manage 97 -3
100 composer 96 -3

Out of over 2,500 modules mentioned in Ansible Galaxy, it turns out that only a few dozen are widely used. Following this, there is a long tail of infrequently-used and custom modules.

How many modules do I need to know?

You can write 80% of the roles in Ansible Galaxy using only the 74 most popular Ansible modules.

For writing an individual Ansible role, you don’t need anywhere near this many. The median number of modules used by a role in Ansible Galaxy is just 6.
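For scale, here is a hypothetical role of the typical size, using only modules from the top ten of the table (the "myapp" names and paths are made up for illustration):

```yaml
# tasks/main.yml of a hypothetical "myapp" role
- name: config directory exists
  file:
    path: /etc/myapp
    state: directory

- name: config file is up to date
  template:
    src: myapp.conf.j2
    dest: /etc/myapp/myapp.conf

- name: myapp service is enabled and running
  service:
    name: myapp
    state: started
    enabled: true
```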

There are a few reasons that you won’t need to use many popular modules all at the same time:

  • Different technologies: eg. mysql_db and postgresql_db set up different databases, and would be invoked by different roles if you need both.
  • Obsolete modules: eg. docker has been split up, and new code will use docker_container instead.
  • Different levels of abstraction: eg. cross-platform developers will use the package module, while developers who know their platform may directly use the apt, dnf and yum modules. You are unlikely to see these mixed.
  • Different styles: include_tasks vs. import_tasks.
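As a sketch of the abstraction point, the same install can be written generically or per-platform. This hypothetical nginx example shows both forms:

```yaml
# Cross-platform: delegates to whichever package manager the host uses
- name: nginx is installed
  package:
    name: nginx
    state: present

# Platform-specific: only works where apt is available, but exposes
# apt-only options if you need them
- name: nginx is installed
  apt:
    name: nginx
    state: present
```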

Some notes

Here are some things to keep in mind when reading the table.

  • I skipped any role that was not hosted in a public GitHub repository, or could not be parsed. A php-yaml bug also kept a few files out of the 2019 tally.
  • Each task was counted only once, so the local_action and action entries obscure which modules were eventually executed through them.
  • The OpenShift oc_obj module on this list is not bundled with Ansible (the module itself is distributed through Galaxy).
  • I did not attempt to exclude abandoned or incomplete packages. Eg. the docker module on this list has been removed from the latest version, but remains popular in existing Ansible Galaxy roles.

Automating LXC container creation with Ansible

LXC is a Linux container technology that I use for both development and production setups hosted on Debian.

This type of container acts a lot like a lightweight virtual machine, and can be administered with standard Linux tools. Once SSH access is configured, you should be able to run the same scripts against either an LXC container or a VM without noticing the difference.

This setup will provision “privileged” containers behind a NAT, which is most useful for a developer workstation. A setup in a server rack would more likely use “unprivileged” containers on a host bridge, which is slightly more complex to set up. The good news is that the guest container will behave very similarly once it’s provisioned, so developers shouldn’t need to adapt their code to those details either.

Manual setup of an LXC container

You need to know how to do something manually before you can automate it.

The best reference guide for this is the current Debian documentation. This is a shorter version of those instructions, with only the parts we need.


Everything you need for LXC is in the lxc Debian package:

$ sudo apt-get install lxc
The following additional packages will be installed:
  bridge-utils debootstrap liblxc1 libpam-cgfs lxcfs python3-lxc uidmap
Suggested packages:
  btrfs-progs lvm2
The following NEW packages will be installed:
  bridge-utils debootstrap liblxc1 libpam-cgfs lxc lxcfs python3-lxc uidmap
0 upgraded, 8 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,367 kB of archives.
After this operation, 3,762 kB of additional disk space will be used.
Do you want to continue? [Y/n] y


Enable the LXC bridge, and start it up:

$ echo 'USE_LXC_BRIDGE="true"' | sudo tee -a /etc/default/lxc-net
$ sudo systemctl start lxc-net

This gives you an internal network for your containers to connect to. From there, they can connect out to the Internet, or communicate with each other:

$ ip addr show
3: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 scope global lxcbr0
       valid_lft forever preferred_lft forever


Instruct LXC to attach a NIC to this network each time you create a container:

$ sudo vi /etc/lxc/default.conf

Replace that file with:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

You can then create a ‘test1’ box, using the Debian image online. Note the output here indicates that the container has no SSH server or root password.

$ sudo lxc-create --name test1 --template=download -- --dist=debian --release=stretch --arch=amd64
Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

You just created a Debian container (release=stretch, arch=amd64, variant=default)

To enable sshd, run: apt-get install openssh-server

For security reason, container images ship without user accounts
and without a root password.

Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.

The container is created in a stopped state. Start it up now:

$ sudo lxc-start --name test1 

It now appears with an automatically assigned IP.

$ sudo lxc-ls --fancy
test1 RUNNING 0         - -    

Set up login access

Start by getting your SSH public key ready; it is located under ~/.ssh/. You can use ssh-keygen to create one if it doesn’t exist.

To SSH in, you need to install an SSH server, and get this public key into the /root/.ssh/authorized_keys file in the container.

$ sudo lxc-attach --name test1
root@test1:/# apt-get update
root@test1:/# apt-get -y install openssh-server
root@test1:/# mkdir -p ~/.ssh
root@test1:/# echo "ssh-rsa (public key) user@host" >> ~/.ssh/authorized_keys

Type exit or press Ctrl+D to quit, and try to log in from your regular account over SSH:

$ ssh root@
The authenticity of host ' (' can't be established.
ECDSA key fingerprint is SHA256:EWH1zUW4BEZUzfkrFL1K+24gTzpd8q8JRVc5grKaZfg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '' (ECDSA) to the list of known hosts.
Linux test1 4.14.0-3-amd64 #1 SMP Debian 4.14.13-1 (2018-01-14) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

And you’re in. You may be surprised how minimal the LXC images are by default, but the full power of Debian is available from apt-get.

This container is not configured to start on boot. For that, you would add this line to /var/lib/lxc/test1/config:

lxc.start.auto = 1


To stop the test1 container and then delete it permanently, run:

sudo lxc-stop --name test1
sudo lxc-destroy --name test1

Automated setup of LXC containers with Ansible

Now that the basic steps have been done manually, I’ll show you how to use Ansible to create a set of LXC containers. If you haven’t used it before, Ansible is an automation tool for managing computers. At its heart, it just logs into machines and runs things. These scripts are an approximate automation of the steps above, so that you can create 10 or 100 containers at once if you need to.

I use this method on a small project that I maintain on GitHub called ansible-live, which bootstraps a containerized training environment for Ansible.

Host setup

You need a few packages and config files on the host. In addition to the lxc package, we need lxc-dev and the lxc-python2 python package to manage the containers from Ansible:

- hosts: localhost
  connection: local
  become: true
  vars:
    interface: lxcbr0
  tasks:
  - name: apt lxc packages are installed on host
    apt: name={{ item }}
    with_items:
    - lxc
    - lxc-dev
    - python-pip

  - copy:
      dest: /etc/default/lxc-net
      content: |
        USE_LXC_BRIDGE="true"

  - copy:
      dest: /etc/lxc/default.conf
      content: |
        lxc.network.type = veth
        lxc.network.link = {{ interface }}
        lxc.network.flags = up
        lxc.network.hwaddr = 00:16:3e:xx:xx:xx

  - service:
      name: lxc-net
      state: started

  - name: pip lxc packages are installed on host
    pip:
      name: "{{ item }}"
    with_items:
    - lxc-python2
    run_once: true
This can be executed with this command:

ansible-playbook setup.yml --ask-become-pass --diff

Container creation

Add a file called inventory to specify the containers to use. These are two IP addresses in the range of the LXC network.

deb1 ansible_host=
deb2 ansible_host=

For local work, I find it easier to set an IP address with Ansible and use the /etc/hosts file, which is why IP addresses are included here. Without it, you need to wait for each container to boot, then detect its IP address before you can log in.
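The /etc/hosts step itself could be handled with a task along these lines. This is a hypothetical sketch, not part of the original playbooks:

```yaml
# Publish each container's fixed IP in the host's /etc/hosts, so names
# resolve before the containers have even finished booting
- name: containers resolve by name on the host
  become: true
  delegate_to: localhost
  lineinfile:
    path: /etc/hosts
    regexp: '\s{{ inventory_hostname }}$'
    line: "{{ ansible_host }} {{ inventory_hostname }}"
```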

Add this to setup.yml:

- hosts: all
  connection: local
  become: true
  vars:
    interface: lxcbr0
  tasks:
  - name: Load in local SSH key path
    set_fact:
      # assumes the default public key name from ssh-keygen; adjust if yours differs
      my_ssh_key: "{{ lookup('env','HOME') }}/.ssh/id_rsa.pub"

  - name: interface device exists
    command: ip addr show {{ interface }}
    changed_when: false
    run_once: true

  - name: Local user has an SSH key
    command: stat {{ my_ssh_key }}
    changed_when: false
    run_once: true

  - name: containers exist and have local SSH key
    delegate_to: localhost
    lxc_container:
      name: "{{ inventory_hostname }}"
      container_log: true
      template: debian
      state: started
      template_options: --release stretch
      container_config:
        - "lxc.network.type = veth"
        - "lxc.network.flags = up"
        - "lxc.network.link = {{ interface }}"
        - "lxc.network.ipv4 = {{ ansible_host }}/24"
        - "lxc.network.ipv4.gateway = auto"
      container_command: |
        if [ ! -d ~/.ssh ]; then
          mkdir ~/.ssh
          echo "{{ lookup('file', my_ssh_key) }}" | tee -a ~/.ssh/authorized_keys
          sed -i 's/dhcp/manual/' /etc/network/interfaces && systemctl restart network
        fi

In the next block of setup.yml, use ssh-keyscan to record the SSH host key of each machine as it becomes available.

- hosts: all
  connection: local
  become: false
  serial: 1
  tasks:
  - wait_for: host={{ ansible_host }} port=22

  - name: container key is up-to-date locally
    shell: ssh-keygen -R {{ ansible_host }}; (ssh-keyscan {{ ansible_host }} >> ~/.ssh/known_hosts)

Lastly, jump in via SSH and install python. This is required for any follow-up configuration that uses Ansible.

- hosts: all
  gather_facts: no
  vars:
    ansible_user: root
  tasks:
  - name: install python on target machines
    raw: which python || (apt-get -y update && apt-get install -y python)

Next, you can execute the whole script to create the two containers.

ansible-playbook setup.yml --ask-become-pass --diff

Scaling to hundreds of containers

Now that you have created two containers, it is easy enough to see how you would make 20 containers by adding a new inventory:

for i in {1..20}; do echo deb$(printf "%03d" $i) ansible_host=10.0.3.$((i+1)); done | tee inventory
deb001 ansible_host=10.0.3.2
deb002 ansible_host=10.0.3.3
deb003 ansible_host=10.0.3.4
...

And then run the script again:

ansible-playbook -i inventory setup.yml --ask-become-pass

This produces 20 machines after a few minutes.

The processes running during this setup were mostly rsync (copying the container contents), plus the network wait to retrieve python many times. If you need to optimise for frequent container spin-ups, LXC supports storage back-ends with copy-on-write, and you can cache package installs with a local webserver, or build some packages into the template.
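As a sketch of the package-caching option, each container could be pointed at an apt-cacher-ng instance on the host. The 10.0.3.1:3142 address here is an assumption, based on the default lxcbr0 network and the default apt-cacher-ng port:

```yaml
# Hypothetical task: route the containers' apt traffic through a local cache
- name: containers use a local apt cache
  copy:
    dest: /etc/apt/apt.conf.d/01proxy
    content: |
      Acquire::http::Proxy "http://10.0.3.1:3142";
```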

Running these 20 containers plus a Debian desktop, I found that my computer was using just 2.9GB of RAM, so I figured I would test 200 empty containers at once.

for i in {1..200}; do echo deb$(printf "%03d" $i) ansible_host=10.0.3.$((i+1)); done > inventory
ansible-playbook -i inventory setup.yml --ask-become-pass

Adding Python to each install took a very long time, but the result is as you would expect:

$ sudo lxc-ls --fancy
NAME   STATE   AUTOSTART GROUPS IPV4       IPV6
deb001 RUNNING 0         -      10.0.3.2   -
deb002 RUNNING 0         -      10.0.3.3   -
deb003 RUNNING 0         -      10.0.3.4   -
...
deb198 RUNNING 0         -      10.0.3.199 -
deb199 RUNNING 0         -      10.0.3.200 -
deb200 RUNNING 0         -      10.0.3.201 -

The base resource usage of an idle container is absolutely tiny, around 13 megabytes: the system moved from 2.9GB to 5.4GB of RAM used when I added 180 containers. Containers clearly have a lower overhead than VMs, since no RAM has been reserved here.
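That per-container figure is just the measured RAM difference divided across the extra containers:

```shell
# Rough per-container memory overhead from the measurements above (in MB)
before=2900      # RAM in use with 20 containers plus the desktop
after=5400       # RAM in use with 200 containers
containers=180   # containers added between the two measurements

echo "$(( (after - before) / containers )) MB per container"   # prints: 13 MB per container
```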

Software updates

The containers are updated just like regular VMs:

apt-get update
apt-get dist-upgrade


In this setup, each container’s contents are stored under /var/lib/lxc/. As long as the container is stopped, you can get at it safely with tar or rsync to make a full copy:

$ sudo tar -czf deb001.20180209.tar.gz /var/lib/lxc/
$ rsync -avz /var/lib/lxc/

Full-machine snapshots are also available on the Ceph or LVM back-ends, if you use those.


The same Ansible module can be used to delete all of these machines in a few seconds.

- hosts: all
  connection: local
  become: true
  tasks:
  - name: Containers do not exist
    delegate_to: localhost
    lxc_container:
      name: "{{ inventory_hostname }}"
      state: absent

Save this as teardown.yml and run it with:

ansible-playbook -i inventory teardown.yml --ask-become-pass


Hopefully this post has given you some insight into one way that Linux containers can be used. I have found LXC to be a great technology to work with for standalone setups, and regularly use the same scripts to configure either an LXC container or a VM, depending on the target environment.

The low resource usage also means that I can run fairly complex setups on a laptop, where the overhead of large VMs would be prohibitive.

I don’t think that LXC is directly comparable to full container ecosystems like Docker, since they are geared towards different use cases. These are both useful tools to know, and have complementary strengths.