The top 100 Ansible modules

Ansible is an automation system which is widely used for deployments and configuration. It contains a dizzying array of modules for interfacing with things like files, services, package managers, and various pieces of software and hardware.

Background

There are some modules which I would expect every Ansible user to be familiar with, while others are barely used at all. I could have a guess about which modules I use most, but what about the wider community?

I couldn’t find any existing data, so I set out to determine which Ansible modules were most widely-used. The best source for this is Ansible Galaxy, which is a directory of around 18,000 Ansible roles. Using some scripts, I sifted through the 18,259 publicly accessible roles from Ansible Galaxy, found 71,610 files to look through, and tallied up which modules were being used by each of the 317,757 tasks in those files.

The list

These are the top 100 modules as of January 2019. I used the same method to build such a list in January 2018, so you can also see the change in popularity of some modules over the last 12 months.

Rank Module Uses Rank +/-
1 file 26,224 +1
2 include 25,267 -1
3 template 24,062
4 command 23,952
5 service 19,436
6 shell 19,401
7 set_fact 17,079 +1
8 apt 16,006 +1
9 lineinfile 12,432 -2
10 copy 8,794 +1
11 yum 8,733 -1
12 assert 7,504 +7
13 include_tasks 6,686 +15
14 stat 5,922 -2
15 package 5,422 -1
16 get_url 5,323 -3
17 debug 5,147 -2
18 import_tasks 5,016 +20
19 include_vars 4,251 -2
20 apt_repository 3,789 -4
21 user 3,233 -3
22 fail 2,812 +2
23 unarchive 2,734 -2
24 apt_key 2,665 -4
25 pip 2,474
26 systemd 2,389 +8
27 action 2,114 -4
28 git 2,043 -1
29 uri 1,680 +7
30 group 1,665 -1
31 sysctl 1,331 -5
32 raw 1,323 -2
33 mysql_user 1,266
34 meta 1,204 +3
35 replace 1,125 -3
36 ini_file 1,125 -5
37 find 1,030 -15
38 local_action 986 -3
39 mysql_db 959
40 cron 925
41 wait_for 898
42 rpm_key 771
43 include_role 742 +15
44 yum_repository 723 +2
45 mount 669 -2
46 blockinfile 619 +3
47 firewalld 579 +1
48 ufw 563 -1
49 authorized_key 557 -4
50 docker_container 488 +2
51 dnf 455 +5
52 seboolean 417 -8
53 homebrew 409 -2
54 fetch 378 +1
55 npm 367 -5
56 osx_defaults 363 +8
57 postgresql_user 350 -4
58 pkgng 345 -4
59 pause 340 +4
60 script 314 +5
61 setup 290 +7
62 postgresql_db 289 -2
63 mysql_replication 286 -1
64 win_regedit 281 +6
65 pacman 257 -4
66 debconf 256 +1
67 slurp 255 +10
68 gem 253 -11
69 iptables 240 +13
70 apache2_module 231 -4
71 synchronize 213 -2
72 docker 213 -13
73 alternatives 210
74 selinux 202
75 oc_obj 199 +98
76 make 194 -1
77 win_shell 191 +13
78 modprobe 187 +11
79 hostname 181 -7
80 zypper 174 -4
81 xml 160
82 supervisorctl 160 -11
83 win_file 149 +15
84 homebrew_cask 149 -6
85 add_host 144 -2
86 rabbitmq_user 129 -7
87 pamd 124 +81
88 win_command 116 +13
89 assemble 116 -9
90 htpasswd 115 -6
91 apk 112 -5
92 openbsd_pkg 111 -7
93 win_get_url 109 +14
94 win_chocolatey 109
95 docker_image 109 +4
96 tempfile 106 +25
97 locale_gen 105 -5
98 easy_install 97 -10
99 django_manage 97 -3
100 composer 96 -3

Out of over 2,500 modules mentioned in Ansible Galaxy, it turns out that only a few dozen are widely used. Following this, there is a long tail of infrequently-used and custom modules.

How many modules do I need to know?

You can write 80% of the roles in Ansible Galaxy using only the 74 most popular Ansible modules.

For writing an individual Ansible role, you don’t need anywhere near this many. The median number of modules used by an role in Ansible Galaxy is just 6.

There are a few reasons that you won’t need to use many popular modules all at the same time:

  • Different technologies: eg. mysql_db and postgresql_db set up different databases, and would be invoked by different roles if you need both.
  • Obsolete modules: eg. docker has been split up, and new code will use docker_container instead.
  • Different levels of abstraction: eg. cross-platform developers will use the package module, while develpers who know their platform may directly use the apt, dnf and yum modules. You are unlikely to see these mixed.
  • Different styles: include_tasks vs. import_tasks.

Some notes

Here are some things to keep in mind when reading the table.

  • I skipped any role that was not hosted in a public GitHub repository, or could not be parsed. A php-yaml bug also kept a few files out of the 2019 tally.
  • Each task was counted only once, so local_action and action are obscuring which modules were eventually executed.
  • The OpenShift oc_obj module on this list is not bundled with Ansible (the module itself is distributed through Galaxy).
  • I did not attempt to exclude abandoned or incomplete packages. Eg. The docker module on this list has been removed from the latest version, but remains popular in existing Ansible Galaxy roles.

Automating LXC container creation with Ansible

LXC is a Linux container technology that I use for both development and production setups hosted on Debian.

This type of container acts a lot like a lightweight virtual machine, and can be administered with standard linux tools. When configured over SSH, you should be able to use the same scripts against either an LXC container or VM without noticing the difference.

This setup will provision “privileged” containers behind a NAT, which is a setup that is most useful for a developer workstation. A setup in a server rack would be more likely to use “unprivileged” containers on a host bridge, which is slightly more complex to set up. The good news is that the guest container will behave very similarly once it’s provisioned, so developers shouldn’t need to adapt their code to those details either.

Manual setup of an LXC container

You need to know how to do something manually before you can automate it.

The best reference guide for this is the current Debian documentation. This is a shorter version of those instructions, with only the parts we need.

Packages

Everything you need for LXC is in the lxc Debian package:

$ sudo apt-get install lxc
...
The following additional packages will be installed:
  bridge-utils debootstrap liblxc1 libpam-cgfs lxcfs python3-lxc uidmap
Suggested packages:
  btrfs-progs lvm2
The following NEW packages will be installed:
  bridge-utils debootstrap liblxc1 libpam-cgfs lxc lxcfs python3-lxc uidmap
0 upgraded, 8 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,367 kB of archives.
After this operation, 3,762 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...

Network

Enable the LXC bridge, and start it up:

echo 'USE_LXC_BRIDGE="true"' | sudo tee -a /etc/default/lxc-net
$ sudo systemctl start lxc-net

This gives you an internal network for your containers to connect to. From there, they can connect out to the Internet, or communicate with each-other:

$ ip addr show
...
3: lxcbr0:  mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 scope global lxcbr0
       valid_lft forever preferred_lft forever

Defaults

Instruct LXC to attach a NIC to this network each time you make a containers:

$ sudo vi /etc/lxc/default.conf

Replace that file with:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

You can then create a ‘test1’ box, using the Debian image online. Note the output here indicates that the container has no SSH server or root password.

$ sudo lxc-create --name test1 --template=download -- --dist=debian --release=stretch --arch=amd64
Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created a Debian container (release=stretch, arch=amd64, variant=default)

To enable sshd, run: apt-get install openssh-server

For security reason, container images ship without user accounts
and without a root password.

Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.

The container is created in a stopped state. Start it up now:

$ sudo lxc-start --name test1 

It now appears with an automatically assigned IP.

$ sudo lxc-ls --fancy
NAME  STATE   AUTOSTART GROUPS IPV4       IPV6 
test1 RUNNING 0         -      10.0.3.250 -    

Set up login access

Start by getting your SSH public key ready. You can locate at ~/.ssh/id_rsa.pub. You can use ssh-keygen to create this if it doesn’t exist.

To SSH in, you need to install an SSH server, and get this public key into the /root/authorized_keys file in the container.

$ sudo lxc-attach --name test1
root@test1:/# apt-get update
root@test1:/# apt-get -y install openssh-server
root@test1:/# mkdir -p ~/.ssh
root@test1:/# echo "ssh-rsa (public key) user@host" >> ~/.ssh/authorized_keys

Type exit or press Ctrl+D to quit, and try to log in from your regular account over SSH:

$ ssh root@10.0.3.250
The authenticity of host '10.0.3.250 (10.0.3.250)' can't be established.
ECDSA key fingerprint is SHA256:EWH1zUW4BEZUzfkrFL1K+24gTzpd8q8JRVc5grKaZfg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.3.250' (ECDSA) to the list of known hosts.
Linux test1 4.14.0-3-amd64 #1 SMP Debian 4.14.13-1 (2018-01-14) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@test1:~# 

Any you’re in. You may be surprised how minimal the LXC images are by default, but the full power of Debian is available from apt-get.

This container is not configured to start on boot. For that, you would add this line to /var/lib/lxc/test1/config:

lxc.start.auto = 1

Teardown

To stop the test1 container and then delete it permanently, run:

sudo lxc-stop --name test1
sudo lxc-destroy --name test1

Automated setup of LXC containers with Ansible

Now that the basic steps have been done manually, I’ll show you how to Ansible to create a set of LXC containers. If you haven’t used it before, Ansible is an automation tool for managing computers. At its heart, it just logs into machines and runs things. These scripts are an approximate automation of the steps above, so that you can create 10 or 100 containers at once if you need to.

I use this method on a small project that I maintain on GitHub called ansible-live, which bootstraps a containerized training environment for Ansible.

Host setup

You need a few packages and config files on the host. In addition to the lxc package, we need lxc-dev and the lxc-python2 python package to manage the containers from Ansible:

- hosts: localhost
  connection: local
  become: true
  vars:
  - interface: lxcbr0

  tasks:
  - name: apt lxc packages are installed on host
    apt: name={{ item }}
    with_items:
    - lxc
    - lxc-dev
    - python-pip

  - copy:
      dest: /etc/default/lxc-net
      content: |
        USE_LXC_BRIDGE="true"

  - copy:
      dest: /etc/lxc/default.conf
      content: |
        lxc.network.type = veth
        lxc.network.link = {{ interface }}
        lxc.network.flags = up
        lxc.network.hwaddr = 00:16:3e:xx:xx:xx

  - service:
      name: lxc-net
      state: started

  - name: pip lxc packages are installed on host
    pip:
      name: "{{ item }}"
    with_items:
    - lxc-python2
    run_once: true

This can be executed with this command:

ansible-playbook setup.yml --ask-become-pass --diff

Container creation

Add a file called inventory to specify the containers to use. These are two IP addresses in the range of the LXC network.

deb1 ansible_host=10.0.3.100
deb2 ansible_host=10.0.3.101

For local work, I find it easier to set an IP address with Ansible and use the /etc/hosts file, which is why IP addresses are included here. Without it, you need to wait for each container to boot, then detect its IP address before you can log in.

Add this to setup.yml

- hosts: all
  connection: local
  become: true
  vars:
  - interface: lxcbr0
  tasks:
  - name: Load in local SSH key path
    set_fact:
      my_ssh_key: "{{ lookup('env','HOME') }}/.ssh/id_rsa.pub"

  - name: interface device exists
    command: ip addr show {{ interface }}
    changed_when: false
    run_once: true

  - name: Local user has an SSH key
    command: stat {{ my_ssh_key }}
    changed_when: false
    run_once: true

  - name: containers exist and have local SSH key
    delegate_to: localhost
    lxc_container:
      name: "{{ inventory_hostname }}"
      container_log: true
      template: debian
      state: started
      template_options: --release stretch
      container_config:
        - "lxc.network.type = veth"
        - "lxc.network.flags = up"
        - "lxc.network.link = {{ interface }}"
        - "lxc.network.ipv4 = {{ ansible_host }}/24"
        - "lxc.network.ipv4.gateway = auto"
      container_command: |
        if [ ! -d ~/.ssh ]; then
          mkdir ~/.ssh
          echo "{{ lookup('file', my_ssh_key) }}" | tee -a ~/.ssh/authorized_keys
          sed -i 's/dhcp/manual/' /etc/network/interfaces && systemctl restart network
        fi

In the next block of setup.yml, use keyscan to get the SSH keys of each machine as it becomes available.

- hosts: all
  connection: local
  become: false
  serial: 1
  tasks:
  - wait_for: host={{ ansible_host }} port=22

  - name: container key is up-to-date locally
    shell: ssh-keygen -R {{ ansible_host }}; (ssh-keyscan {{ ansible_host }} >> ~/.ssh/known_hosts)

Lastly, jump in via SSH and install python. This is required for any follow-up configuration that uses Ansible.

- hosts: all
  gather_facts: no
  vars:
  - ansible_user: root
  tasks:
  - name: install python on target machines
    raw: which python || (apt-get -y update && apt-get install -y python)

Next, you can execute the whole script to create the two containers.

ansible-playbook setup.yml --ask-become-pass --diff

Scaling to hundreds of containers

Now that you have created two containers, it is easy enough to see how you would make 20 containers by adding a new inventory:

for i in {1..20}; do echo deb$(printf "%03d" $i).example.com ansible_host=10.0.3.$((i+1)); done | tee inventory
deb001.example.com ansible_host=10.0.3.2
deb002.example.com ansible_host=10.0.3.3
deb003.example.com ansible_host=10.0.3.4
...

And then run the script again:

ansible-playbook -i inventory setup.yml --ask-become-pass

This produces 20 machines after a few minutes.

The processes running during this setup were mostly rync (copying the container contents), plus the network waiting to retrieve python many times. If you need to optimise to frequent container spin-ups, LXC supports
storage back-ends that have copy-on-write, and you can cache package installs with a local webserver, or build some packages into the template.

Running these 20 containers plus a Debian desktop, I found that my computer was using just 2.9GB of RAM, so I figured I would test 200 empty containers at once.

for i in {1..200}; do echo deb$(printf "%03d" $i).example.com ansible_host=10.0.3.$((i+1)); done > inventory
ansible -i inventory setup.yml

It took truly a very long time to add Python to each install, but the result is as you would expect:

$ sudo lxc-ls --fancy
NAME               STATE   AUTOSTART GROUPS IPV4       IPV6 
deb001.example.com RUNNING 0         -      10.0.3.2   -    
deb002.example.com RUNNING 0         -      10.0.3.3   -    
deb003.example.com RUNNING 0         -      10.0.3.4   -    
...
deb198.example.com RUNNING 0         -      10.0.3.199 -    
deb199.example.com RUNNING 0         -      10.0.3.200 -    
deb200.example.com RUNNING 0         -      10.0.3.201 -    

The base resource usage of an idle container is absolutely tiny, around 13 megabytes — the system moved from 2.9GB to 5.4GB of RAM used when I added 180 containers. Containers clearly have a lower overhead than VM’s, since no RAM has been reserved here.

Software updates

The containers are updated just like regular VM’s-

apt-get update
apt-get dist-upgrade

Backups

In this setup, the container’s contents is stored under /var/lib/lxc/. As long as the container is stopped, you get at it safely with tar or rsync to make a full copy:

$ sudo tar -czf deb001.20180209.tar.gz /var/lib/lxc/deb001.example.com/
$ rsync -avz /var/lib/lxc/deb001.example.com/ remote-computer@example.com:/backups/deb001.example.com/

Full-machine snapshots are also available on the Ceph or LVM back-ends, if you use those.

Teardown

The same Ansible module can be used to delete all of these machines in a few seconds.

- hosts: all
  connection: local
  become: true
  tasks:
  - name: Containers do not exist
    delegate_to: localhost
    lxc_container:
      name: "{{ inventory_hostname }}"
      state: absent
ansible-playbook -i inventory teardown.yml --ask-become-pass

Conclusion

Hopefully this post has given you some insight into one way that Linux containers can be used. I have found LXC to be a great technology to work with for standalone setups, and regularly use the same scripts to configure either an LXC container or a VM’s depending on the target environment.

The low resource usage also means that I can run fairly complex setups on a laptop, where the overhead of large VM’s would be prohibitive.

I don’t think that LXC is directly comparable to full container ecosystems like Docker, since they are geared towards different use cases. These are both useful tools to know, and have complementary strengths.