Lab 14 - Ansible
In this lab, you’re going to use Ansible to manage the Tiger Enterprises infrastructure with "code" instead of with manually-entered commands.
Tiger Enterprises is growing by leaps and bounds! We want the ability to spin up arbitrary numbers of systems and have them automatically configured to our liking, and then to maintain that configuration over time so systems don't drift apart.
Activities
Installation
Ansible is an agentless automation tool that manages machines over the SSH protocol. Ansible can be run from any machine with Python 2 (version 2.7) or Python 3 (versions 3.5 and higher) installed. Once installed, Ansible does not add a database, and there will be no daemons to start or keep running. You only need to install it on one machine (the "control node") and it can manage an entire fleet of remote machines from that central point. When Ansible manages remote machines, it does not leave software installed or running on them beyond that which the administrator specifies.
First, create a new Security Group for the Ansible controller:
- Go into EC2 -> Network & Security -> Security Groups
- Create a new security group: Tiger-SG-Ansible
- VPC: Tiger-VPC
- Inbound rules:
  - SSH from source Anywhere
  - All ICMP - IPv4 from source Anywhere (This permits ICMP pings for network debugging)
Create a new EC2 instance at AWS meeting the following requirements:
- AMI: Latest Amazon-provided Ubuntu 20.04 image - x86_64
- Instance type: t2.micro (1 vCPU, 1 GiB memory)
- Network: "Tiger-VPC"
- Subnet: "Tiger-VPC-Public" # Select the PUBLIC subnet (Normally this would be private, but we can save some $$ by skipping the VPN...)
- Auto-assign Public IP: Use Subnet setting (Enable)
- Storage: 8GiB with "Delete on Termination" enabled
- Tag: Name = "Ansible: Ubuntu 20.04"
- Security Group: Use existing security group: "Tiger-SG-Ansible"
- Keypair: Existing keypair / "COMP-175-Lab-1"
On that instance, install Ansible:
$ sudo apt update
$ sudo apt install software-properties-common
$ sudo apt-add-repository --yes --update ppa:ansible/ansible
$ sudo apt install ansible ansible-lint
Part 1 - Inventory
In Ansible, a collection of machines to manage is called an Inventory. An inventory can be as simple as a list of IP addresses or Fully Qualified Domain Names (FQDNs), or contain many groups or individual machines, each with their own alias and environment variables.
The default inventory file is stored at /etc/ansible/hosts, and both INI and YAML formats are supported. Let's go with the INI file format here, since it's easier to get started with. The example file below defines host aliases and uses the ansible_host parameter to specify the name (DNS or IP) that each alias corresponds to.
Take this file as an example and customize it for the specific instances and specific IP addresses in your AWS account. The instances do not have to be currently running - AWS preserves the private IP addresses even when the instance is shut down. Only the public IP address changes. Omit the Windows instances. While Ansible does have the ability to manage Windows computers (see the ansible.windows collection), we won't be managing Windows computers in this lab.
$ sudo nano /etc/ansible/hosts
# Global settings (unless otherwise specified)
[all:vars]
ansible_user=ubuntu
# Group "ubuntu"
[ubuntu]
load_balancer ansible_host=10.101.0.123
web1 ansible_host=10.101.0.124
vpn ansible_host=10.101.0.162
docker ansible_host=10.101.0.20
k8s-controller ansible_host=10.101.0.60
k8s-worker01 ansible_host=10.101.0.65
k8s-worker02 ansible_host=10.101.0.21
ansible-control ansible_host=localhost
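To make the shape of this INI inventory concrete, here is a minimal Python sketch that reads text laid out like the example above and maps each alias to its ansible_host value. It is an illustration only and not how Ansible parses inventories internally. The function name parse_inventory and the simplified grammar it accepts (section headers, `key=value` variable lines, and `alias key=value` host lines) are assumptions made for this sketch.

```python
# Illustrative only: a simplified reader for the INI inventory shown above.
# parse_inventory is a hypothetical helper, not part of Ansible.

def parse_inventory(text):
    """Return {section: {...}} where host sections map alias -> variables
    and ':vars' sections map variable name -> value."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups[current] = {}
            continue
        if current is None:
            continue  # this sketch ignores hosts listed before any section
        if current.endswith(":vars"):
            key, _, value = line.partition("=")
            groups[current][key.strip()] = value.strip()
        else:
            alias, *pairs = line.split()
            groups[current][alias] = dict(p.split("=", 1) for p in pairs)
    return groups


SAMPLE = """
[all:vars]
ansible_user=ubuntu

[ubuntu]
web1 ansible_host=10.101.0.124
ansible-control ansible_host=localhost
"""

inventory = parse_inventory(SAMPLE)
for alias, host_vars in inventory["ubuntu"].items():
    print(alias, "->", host_vars["ansible_host"])
```

The point of the sketch is the data model: a group is a named set of aliases, and each alias carries its own variables, while the [all:vars] section supplies defaults such as ansible_user.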
Before continuing, confirm that you have connected via SSH to the Ansible machine using your SSH agent. While we could set up additional keys and user accounts purely for Ansible (a good idea for a large deployment), for this lab the most straightforward option is to have Ansible log in using your private key. Rather than uploading that key to the file server and letting it sit on disk, we'll instead pass it in memory via the SSH agent so that it is only present while you are using Ansible. When you log out of SSH, the key is erased from memory.
# Confirm that the agent on ansible-control has your key
$ ssh-add -l
# Example Output:
# 2048 SHA256:rMn+23O+Sh2tZo7SkOMiDtTh0VJ0J6bxWeuSqeY4oXU /Users/shafer/Downloads/COMP-175-Lab-1.pem (RSA)
Test this inventory with some basic Ansible commands. (Note that you could use the keyword all instead of the alias ansible-control if you wanted to perform these actions on all instances. Because the other instances are not currently running, to save $$, let's restrict our focus to just the ansible-control system, which happens to be running.)
# Use the ping module to ping the ansible-control node
$ ansible ansible-control -m ping
# Example Output:
# The authenticity of host 'localhost (127.0.0.1)' can't be established.
# ECDSA key fingerprint is SHA256:RhksEx8cZv4/tdLRjuWpcGbP3mN4YPd42+vk9TcFB3s.
# Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
# ansible-control | SUCCESS => {
# "ansible_facts": {
# "discovered_interpreter_python": "/usr/bin/python3"
# },
# "changed": false,
# "ping": "pong"
# }
Note that the "authenticity of host" warning is expected. SSH warns you when you connect to a machine that it hasn't contacted before. You see the same warning in your SSH client the first time you connect to a new instance. It's only a problem if you get the warning for a machine you had previously connected to, since that would be a sign that perhaps you aren't connecting to the system you expected. Human error? A sign of attack?
# Run a live command on the ansible-control node
$ ansible ansible-control -a "/bin/echo Tiger Enterprises"
# Example Output:
# ansible-control | CHANGED | rc=0 >>
# Tiger Enterprises
# Print facts (discovered variables) about the ansible-control node
$ ansible ansible-control -m ansible.builtin.setup
# Example Output:
# LENGTHY!!
Part 2 - Ad-hoc Ansible Commands
In Ansible, ad-hoc commands are one-line commands that are manually entered.
Let's start with a basic ad-hoc command - updating the packages on the ansible-control host. There are several options for accomplishing this:
Option 1: Prior to Ansible automation, you would use a command like this: sudo apt update && sudo apt upgrade -y. That would first update the list of available packages, and only if that command succeeded, it would then upgrade currently-installed packages to the latest version.
Option 2: You could use the module ansible.builtin.shell in Ansible to execute arbitrary shell commands on the target: ansible ansible-control -m shell -a "sudo apt update && sudo apt upgrade -y". While it does "use Ansible", it only barely uses Ansible, and manually providing commands to run risks your Ansible configuration falling short of idempotency. Idempotency means that whether you run your Ansible command 1 time or 1000 times, the end state is exactly the same. That is very hard to guarantee with arbitrary raw shell commands, which often have side effects.
Option 3: The preferred method is to avoid using raw shell commands whenever possible, in favor of purpose-specific modules. In this case, Ansible has the module ansible.builtin.apt that can install, uninstall, and update packages. Let's use this module:
$ ansible ansible-control -m apt -a "update_cache=yes upgrade=yes" --become
# --become is privilege escalation, e.g. sudo
# This takes a while to run
# * Checks for updated packages
# * Installs updated packages
# Example Output:
# ansible-control | CHANGED => {
# "ansible_facts": {
# "discovered_interpreter_python": "/usr/bin/python3"
# },
# "changed": true,
# "msg": "Reading package lists...
# Followed by ALL the output that apt produced during updating...
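The idempotency concern raised under Option 2 can be illustrated outside of Ansible. The sketch below contrasts a blind append (the shell habit of `echo ... >> file`) with an "ensure this line is present" operation, which is the style that purpose-specific modules follow. Both helper names are hypothetical, and the files are throwaway temp files.

```python
import os
import tempfile


def append_line(path, line):
    """Non-idempotent: adds the line on every call."""
    with open(path, "a") as f:
        f.write(line + "\n")


def ensure_line(path, line):
    """Idempotent: adds the line only if it is missing.
    Returns True when the file was changed, False otherwise."""
    with open(path) as f:
        if line in f.read().splitlines():
            return False
    append_line(path, line)
    return True


def count_lines(path):
    with open(path) as f:
        return len(f.read().splitlines())


workdir = tempfile.mkdtemp()
blind = os.path.join(workdir, "blind.conf")
ensured = os.path.join(workdir, "ensured.conf")
for p in (blind, ensured):
    open(p, "w").close()          # start from empty files

for _ in range(3):
    append_line(blind, "setting=on")
results = [ensure_line(ensured, "setting=on") for _ in range(3)]

print(count_lines(blind))    # 3 copies after three runs
print(count_lines(ensured))  # 1 copy after three runs
print(results)               # [True, False, False]
```

Note that ensure_line also reports whether it changed anything, which mirrors the "changed": true / false field in the Ansible output shown earlier.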
Task 1 - Ad-hoc - Update Hostname
Using an ad-hoc Ansible command, update the hostname of the current instance to ansible-control.
Tips:
- You may find the module ansible.builtin.hostname useful for this.
- You need to be root to change the hostname
- Although Ubuntu is based on Debian, the systemd mechanism to change names will be more successful. Autodetection of this works properly.
Deliverables:
- Submit a screenshot of:
  (a) The ad-hoc Ansible command to update the hostname, and
  (b) The output of hostnamectl showing that the hostname has been changed
Part 3 - Playbooks
An Ansible Playbook is a YAML-formatted file that specifies configuration to deploy on one or more systems. With a playbook, you can declare configurations, specify exact states that must be obtained in a specific order, and launch tasks either synchronously or asynchronously. Playbook files can be stored in version control so that changes can be easily tracked and enterprise configuration standards enforced.
References:
- YAML Syntax
- Working with Playbooks
- Understanding privilege escalation: become
- Ansible Collection: ansible.builtin
- Ansible Module: ansible.builtin.apt
- Ansible Module: ansible.builtin.systemd
Here is an example playbook that updates packages on the ansible-control host, based on the previous ad-hoc command.
$ nano update-server.yml
---
- hosts: ansible-control
  tasks:
    - name: Update and upgrade apt packages
      become: 'yes'  # Privilege escalation, e.g. 'sudo'
      apt:
        upgrade: 'yes'
        update_cache: 'yes'
        cache_valid_time: 86400  # Cache is valid for 1 day
Before running your playbook, take a moment to verify it:
# Is the syntax correct?
$ ansible-playbook update-server.yml --syntax-check
# Desired output:
# playbook: update-server.yml
# What hosts will this playbook affect?
$ ansible-playbook update-server.yml --list-hosts
# Example Output:
# playbook: update-server.yml
#
# play #1 (ansible-control): ansible-control TAGS: []
# pattern: ['ansible-control']
# hosts (1):
# ansible-control
# What changes will be made? (without taking any action)
# This is a SIMULATION (aka "dry run")
$ ansible-playbook update-server.yml --check
Run your playbook:
$ ansible-playbook update-server.yml
# Can use --verbose option if you want detailed output on
# actions taken (successfully or unsuccessfully)
Task 2 - Playbook - Update Hostname
Create the Ansible playbook ansible-hostname.yml.
- Goal: Set Hostname
- The playbook should ensure that the hostname of the ansible-control instance is ansible-control
- The playbook should ensure that there is a line in the /etc/hosts file reading 127.0.0.1 ansible-control
- To verify, use sudo hostnamectl and cat /etc/hosts
Deliverables:
- Upload your ansible-hostname.yml file
- Submit a screenshot of your verification that the hostname was updated successfully
Task 3 - Playbook - Install Fail2Ban
Create the Ansible playbook fail2ban.yml.
- Goal: Install Fail2Ban
- The playbook should ensure Fail2Ban is installed on all systems in group ubuntu.
- To verify, use sudo systemctl status fail2ban to confirm that the service is loaded and active (running).
Note: You don't need to have every machine running for this. It is sufficient to demonstrate that this playbook succeeds on ansible-control. The other machines that Ansible can't reach will just show as errors, and that's fine.
Deliverables:
- Upload your fail2ban.yml file
- Submit a screenshot of your verification that fail2ban was installed successfully
Task 4 - Playbook - Install Chrony
Create the Ansible playbook chrony.yml.
- Goal: Install Chrony time synchronization
- The playbook should ensure that the chrony time manager is installed and running on all systems in group ubuntu.
- The playbook should ensure that /etc/chrony/chrony.conf contains the line server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
- The playbook should ensure that the chrony systemd service is restarted if the config file is modified (which will happen the first time, because the system package does not include that AWS-specific time server). Do not restart the chrony service otherwise.
- To verify, confirm that chronyc sources -v has its first time server set to 169.254.169.123. Furthermore, even if you run the playbook a dozen times, the /etc/chrony/chrony.conf file should only have the one new line (not a dozen), and sudo systemctl status chrony should show that the server has not been restarted multiple times (view the uptime on line 3)
Note: You don't need to have every machine running for this. It is sufficient to demonstrate that this playbook succeeds on ansible-control. The other machines that Ansible can't reach will just show as errors, and that's fine.
Deliverables:
- Upload your chrony.yml file
- Submit a screenshot of your verification that chrony was (a) installed and (b) configured successfully
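The "restart only if the config file was modified" requirement in Task 4 comes down to tracking whether a step reported a change. The following sketch models that control flow in plain Python. It is not Ansible code and not the instructor solution: ensure_config_line, run_once, the in-memory config list, and the starting config line are all hypothetical stand-ins, and no real file or service is touched.

```python
# Plain-Python model of "restart the service only when the config changed".
# All names here are hypothetical; no real service or file is touched.

TIME_SERVER = "server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4"


def ensure_config_line(config_lines, line):
    """Append line if absent; report whether anything changed."""
    if line in config_lines:
        return False
    config_lines.append(line)
    return True


def run_once(config_lines, restart_log):
    """One simulated playbook run: enforce the line, restart only on change."""
    changed = ensure_config_line(config_lines, TIME_SERVER)
    if changed:
        restart_log.append("restart chrony")
    return changed


config = ["pool ntp.ubuntu.com iburst"]   # placeholder for the packaged config
restarts = []
outcomes = [run_once(config, restarts) for _ in range(12)]

print(config.count(TIME_SERVER))  # 1
print(len(restarts))              # 1
print(outcomes[:2])               # [True, False]
```

In Ansible, this pattern is what a task's changed status combined with a handler (triggered through notify) expresses: the handler runs only when the notifying task actually changed something.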
Part 4 - Playbooks for EC2
Ansible has two collections with a number of useful modules for AWS: amazon.aws (maintained by Ansible) and community.aws (maintained by the community).
References:
- Ansible Collection: amazon.aws
- Ansible Collection: community.aws
- Boto3: AWS SDK for Python - This is a supporting library
To install both collections, use the ansible-galaxy command on the Ansible controller:
$ ansible-galaxy collection install amazon.aws
$ ansible-galaxy collection install community.aws
For the AWS integration to be complete, an additional library needs to be installed and configured on the Ansible controller: Boto, the AWS SDK for Python.
$ sudo apt install python3-pip
$ sudo pip3 install boto boto3
# Pip has a newer version than available in Ubuntu package manager for release 20.04
Next, set up a credentials file for boto:
$ mkdir ~/.aws
$ nano ~/.aws/credentials
The contents of the file should be:
[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET
aws_session_token = YOUR_SESSION_TOKEN
You can get your secrets from your AWS Academy account in the Learner Lab portal. Click on "AWS Details", and then "Show" next to AWS CLI. Copy that entire block into your ~/.aws/credentials file.
WARNING: In your AWS Academy account, the access key and token expire every four hours. If you stop work on the lab and resume it the next day, you will need to update your ~/.aws/credentials file with the current access codes.
WARNING: In your AWS Academy account, three pieces of information are required to authenticate you: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN. However, some of the boto library functions don't use the session token for whatever reason (aka it's a bug). Setting these three items as environment variables in your Ansible playbook may resolve any "AuthFailure: AWS was not able to validate the provided access credentials" errors you encounter when you're confident you have the current credentials for your current session.
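One way to obtain the three values for use as environment variables is to read them back out of the credentials file. The sketch below does that with Python's standard configparser module. credentials_to_env is a hypothetical helper, the sample values are placeholders, and the environment variable names are the standard AWS SDK names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).

```python
import configparser

# Map credentials-file keys to the standard AWS SDK environment variables.
KEY_TO_ENV = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_session_token": "AWS_SESSION_TOKEN",
}


def credentials_to_env(text, profile="default"):
    """Parse credentials-file text and return a dict of env var settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]
    missing = [k for k in KEY_TO_ENV if k not in section]
    if missing:
        raise KeyError(f"missing keys in [{profile}]: {missing}")
    return {env: section[key] for key, env in KEY_TO_ENV.items()}


# Placeholder content in the same layout as ~/.aws/credentials
SAMPLE = """
[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET
aws_session_token= YOUR_SESSION_TOKEN
"""

env = credentials_to_env(SAMPLE)
for name, value in env.items():
    print(f"{name}={value}")
```

To use it against the real file, pass in the text of ~/.aws/credentials instead of SAMPLE. The missing-key check is there because an expired or partially pasted block is an easy mistake to make with the four-hour credentials.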
Finally, set up a region file for boto:
$ nano ~/.aws/config
The contents of the file should be:
[default]
region=us-east-1
If you want to test boto3, a simple Python script can be used to list the instances in your account:
$ nano boto-test.py
$ chmod +x boto-test.py
$ ./boto-test.py
Contents of boto-test.py:
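#!/usr/bin/env python3
import boto3

# Uses the credentials and region from ~/.aws/credentials and ~/.aws/config
ec2 = boto3.resource('ec2')

# Print every EC2 instance visible to this account in the configured region
for instance in ec2.instances.all():
    print(instance)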
Task 5 - Playbook - Script DNS Updates
Create the Ansible playbook dns-update.yml.
- Goal: Assign the DNS name ansible.STUDENT-NAME.tigerenterprises.org to your instance, and script it so DNS automatically updates with the current public IP address when the system boots.
- The playbook should ensure that the dns-update script is present in /var/lib/cloud/scripts/per-boot/ so that it runs automatically upon boot. You can put the script in the same directory as the yml file, and have Ansible copy it over. Ensure that the script is marked as executable.
- The playbook should ensure that the AWS CLI is installed
- The playbook should ensure that the /root/.aws/credentials file is configured with the classwide-account information
- The playbook should run the script /var/lib/cloud/scripts/per-boot/dns-update (as root) and print the output to the screen for human viewing at the end.
- Note: You do not have to explicitly create the Type A record in Route53 as part of your playbook. The dns-update script will create it as part of its updating process.
- To verify:
  - Manually run aws --version to confirm that the CLI now exists
  - Manually run the script sudo /var/lib/cloud/scripts/per-boot/dns-update and ensure that it runs without errors. You should see that the DNS update is "PENDING".
{
  "ChangeInfo": {
    "Id": "/change/C061018739I8DCF8VUOO2",
    "Status": "PENDING",
    "SubmittedAt": "2020-11-18T01:30:09.988000+00:00"
  }
}
Tips: The following Ansible modules were helpful when building the instructor solution to this playbook:
- ansible.builtin.command – Execute commands on targets
- ansible.builtin.copy – Copy files to remote locations
- ansible.builtin.file - Manage files and file properties
- ansible.builtin.get_url – Downloads files from HTTP, HTTPS, or FTP to node
- ansible.builtin.unarchive – Unpacks an archive after (optionally) copying it from the local machine
Deliverables:
- Upload your dns-update.yml file
- Submit a screenshot of ansible-playbook dns-update.yml showing as much output as possible, particularly the end (where dns-update runs and its output is shown)
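The dns-update script itself is not reproduced in this lab text. For orientation only, the sketch below builds the kind of change batch that the Route 53 ChangeResourceRecordSets API accepts for creating or updating a Type A record. A request of that kind is answered with a ChangeInfo block like the "PENDING" output in the verification step. The function name, record name, TTL, and IP address are placeholders chosen for this sketch, and the real script may work differently.

```python
import json


def build_a_record_upsert(name, ip_address, ttl=60):
    """Return a Route 53 change batch that creates or updates an A record."""
    return {
        "Comment": "Update A record to current public IP",
        "Changes": [
            {
                "Action": "UPSERT",   # create if absent, otherwise overwrite
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }


# Placeholder values: substitute your own name and the instance's public IP.
batch = build_a_record_upsert(
    "ansible.STUDENT-NAME.tigerenterprises.org", "203.0.113.10"
)
print(json.dumps(batch, indent=2))
```

Because the action is UPSERT, submitting the same batch repeatedly leaves a single record pointing at the latest address, which is why the playbook does not need to create the record separately. With the AWS CLI, JSON like this is typically saved to a file and passed to aws route53 change-resource-record-sets through its --change-batch option, along with the hosted zone ID for the domain (not given in this lab text).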
Task 6 - Playbook - Create Kubernetes Worker Nodes
Create the Ansible playbook kubernetes-worker.yml.
- Goal: Use Ansible to ensure the specified number of Kubernetes worker nodes exist and are configured as workers
- The playbook should ensure that a security group with the name Tiger-K8s-Worker-Ansible exists with the ports specified in the Kubernetes lab
- The playbook should ensure that 2 EC2 instances exist with the tag ansible-key set to k8s-worker-auto. New instances will be started if an insufficient number exists, and surplus instances will be terminated if too many instances exist. Instances should be created with all of the standard settings for Kubernetes workers, including:
  - Security Group: Tiger-K8s-Worker-Ansible
  - VPC (use the VPC ID)
  - VPC Subnet (use the Subnet ID)
  - Instance Type
  - Volumes (use type gp2 and device name /dev/sda1, as shown in your previous Kubernetes workers)
  - Tags:
    Name: K8s-Worker-Ansible
    ansible-key: k8s-worker-auto
    (This is how Ansible will distinguish instances that it can manage from others that YOU manually created)
- The playbook should ensure that all EC2 instances with the tag ansible-key: k8s-worker-auto have the following settings applied to them, following the standard Kubernetes requirements:
  - Chrony is installed
  - Docker is installed
  - Kubernetes is installed
  - Worker is joined to cluster
- To verify:
  - On the Kubernetes controller, kubectl get nodes should show your two new workers with a status of Ready (plus the two existing non-Ansible workers with a status of NotReady, unless you also have those running)
  - On the Kubernetes controller, kubectl top nodes should show you CPU and memory utilization on your two new workers (plus no statistics for the existing non-Ansible workers, unless you also have those running)
Tips: The following Ansible modules were helpful when building the instructor solution to this playbook:
- amazon.aws.ec2 – create, terminate, start or stop an instance in ec2
- amazon.aws.ec2_group – maintain an ec2 VPC security group
- community.aws.ec2_instance_facts – Gather information about ec2 instances in AWS
- ansible.builtin.add_host – Add a host (and alternatively a group) to the ansible-playbook in-memory inventory
- ansible.builtin.apt – Manages apt-packages
- ansible.builtin.apt_key – Add or remove an apt key
- ansible.builtin.apt_repository – Add and remove APT repositories
- ansible.builtin.command – Execute commands on targets
- ansible.builtin.stat – Retrieve file or file system status
If we were using Version Control, this would be a great time to commit and push your Ansible playbooks to your enterprise repository.
Deliverables:
- Upload your kubernetes-worker.yml file
- Submit a screenshot of ansible-playbook kubernetes-worker.yml showing as much output as possible, particularly the end.
- Submit a screenshot of kubectl get nodes and kubectl top nodes on the Kubernetes controller, showing your two new workers.
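The Task 6 rule "start new instances if there are too few, terminate the surplus if there are too many" is a small reconciliation calculation. The sketch below shows that logic on plain lists of instance IDs. plan_instance_changes is a hypothetical helper, the IDs are made up, and terminating from the end of the list is an arbitrary policy chosen for illustration. It makes no AWS calls.

```python
def plan_instance_changes(tagged_instance_ids, desired_count):
    """Compare tagged instances with the desired count.

    Returns (number_to_launch, ids_to_terminate).
    """
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    current = len(tagged_instance_ids)
    if current < desired_count:
        return desired_count - current, []
    # Keep the first desired_count instances; the rest are surplus.
    return 0, list(tagged_instance_ids[desired_count:])


print(plan_instance_changes([], 2))                           # (2, [])
print(plan_instance_changes(["i-aaa", "i-bbb"], 2))           # (0, [])
print(plan_instance_changes(["i-aaa", "i-bbb", "i-ccc"], 2))  # (0, ['i-ccc'])
```

Only instances carrying the ansible-key tag should be fed into a calculation like this, which is what keeps manually created workers out of reach of the automation.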
Troubleshooting
- Are you getting a "dial tcp 127.0.0.1:10248: connect: connection refused" error when you try to do your kubeadm join? Verify that you are putting the correct settings into Docker's daemon.json file and restarting the Docker service afterwards. Otherwise, Kubernetes won't integrate with Docker properly.
Lab Deliverables
After submitting the Canvas assignment, you should STOP your virtual machines, not terminate them. We'll use them again in future labs, and thus want to save the configuration and OS data.
Upload to the Lab 14 Canvas assignment all the lab deliverables to demonstrate your work:
- Part 2 - Ad-hoc Ansible Commands
  - Task 1: Submit a screenshot of:
    (a) The ad-hoc Ansible command to update the hostname, and
    (b) The output of hostnamectl showing that the hostname has been changed
- Part 3 - Playbooks
  - Task 2:
    - Upload your ansible-hostname.yml file
    - Submit a screenshot of your verification that the hostname was updated successfully
  - Task 3:
    - Upload your fail2ban.yml file
    - Submit a screenshot of your verification that fail2ban was installed successfully
  - Task 4:
    - Upload your chrony.yml file
    - Submit a screenshot of your verification that chrony was (a) installed and (b) configured successfully
- Part 4 - Playbooks for EC2
  - Task 5:
    - Upload your dns-update.yml file
    - Submit a screenshot of ansible-playbook dns-update.yml showing as much output as possible, particularly the end (where dns-update runs and its output is shown)
  - Task 6:
    - Upload your kubernetes-worker.yml file
    - Submit a screenshot of ansible-playbook kubernetes-worker.yml showing as much output as possible, particularly the end.
    - Submit a screenshot of kubectl get nodes and kubectl top nodes on the Kubernetes controller, showing your two new workers.