Lab 14 - Ansible

In this lab, you’re going to use Ansible to manage the Tiger Enterprises infrastructure with "code" instead of with manually-entered commands.

Tiger Enterprises is growing by leaps and bounds! We want the ability to spin up arbitrary numbers of systems, have them automatically configured to our liking, and then maintain that configuration over time so systems don't drift apart.

Activities

Installation

Ansible is an agentless automation tool that manages machines over the SSH protocol. Ansible can be run from any machine with Python 2 (version 2.7) or Python 3 (versions 3.5 and higher) installed. Once installed, Ansible does not add a database, and there will be no daemons to start or keep running. You only need to install it on one machine (the "control node") and it can manage an entire fleet of remote machines from that central point. When Ansible manages remote machines, it does not leave software installed or running on them beyond that which the administrator specifies.

References:

First, create a new Security Group for the Ansible controller:

  • Go into EC2 -> Network & Security -> Security Groups
  • Create a new security group: Tiger-SG-Ansible
  • VPC: Tiger-VPC
  • Inbound rules:
    • SSH from source Anywhere
    • All ICMP - IPv4 from source Anywhere (This permits ICMP pings for network debugging)

Create a new EC2 instance at AWS meeting the following requirements:

  • AMI: Latest Amazon-provided Ubuntu 20.04 image - x86_64
  • Instance type: t2.micro (1 vCPU, 1 GiB memory)
  • Network: "Tiger-VPC"
  • Subnet: "Tiger-VPC-Public" - Select the PUBLIC subnet (Normally this would be private, but we can save some $$ by skipping the VPN...)
  • Auto-assign Public IP: Use Subnet setting (Enable)
  • Storage: 8 GiB with "Delete on Termination" enabled
  • Tag: Name = "Ansible: Ubuntu 20.04"
  • Security Group: Use existing security group: "Tiger-SG-Ansible"
  • Keypair: Existing keypair / "COMP-175-Lab-1"

On that instance, install Ansible:

$ sudo apt update
$ sudo apt install software-properties-common
$ sudo apt-add-repository --yes --update ppa:ansible/ansible
$ sudo apt install ansible ansible-lint
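
To confirm the installation succeeded, check the version that was installed:

$ ansible --version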

Part 1 - Inventory

In Ansible, a collection of machines to manage is called an Inventory. An inventory can be as simple as a list of IP addresses or Fully Qualified Domain Names (FQDNs), or it can contain many groups and individual machines, each with their own aliases and variables.

Reference:

The default inventory file is stored at /etc/ansible/hosts, and both INI and YAML formats are supported. Let's go with INI file format here - it's easier to get started with. The example file below defines host aliases, and uses the ansible_host parameter to specify the name (DNS or IP) that the alias corresponds to.

Take this file as an example and customize it for the specific instances and specific IP addresses in your AWS account. The instances do not have to be currently running - AWS preserves the private IP addresses even when the instance is shut down. Only the public IP address changes. Omit the Windows instances. While Ansible does have the ability to manage Windows computers (see the ansible.windows collection), we won't be managing Windows computers in this lab.

$ sudo nano /etc/ansible/hosts
# Global settings (unless otherwise specified)
[all:vars]
ansible_user=ubuntu

# Group "ubuntu"
[ubuntu]
load_balancer ansible_host=10.101.0.123
web1 ansible_host=10.101.0.124
vpn ansible_host=10.101.0.162
docker ansible_host=10.101.0.20
k8s-controller ansible_host=10.101.0.60
k8s-worker01 ansible_host=10.101.0.65
k8s-worker02 ansible_host=10.101.0.21
ansible-control ansible_host=localhost
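
Once the file is saved, you can ask Ansible how it parsed the inventory - a quick sanity check before running any commands:

$ ansible-inventory --graph
# Shows the groups and the hosts within them

$ ansible-inventory --host web1
# Shows the variables associated with a single host (e.g. ansible_host, ansible_user)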

Before continuing, confirm that you have connected via SSH to the Ansible machine using your SSH agent. While we could set up additional keys and user accounts purely for Ansible (a good idea for a large deployment), for this lab the most straightforward option is to have Ansible log in using your private key. Rather than uploading that key to the server and letting it sit on disk, we'll instead pass it in memory via the SSH agent so that it is only present while you are using Ansible. When you log out of SSH, the key is erased from memory.

# Confirm that the agent on ansible-control has your key
$ ssh-add -l

# Example Output:
# 2048 SHA256:rMn+23O+Sh2tZo7SkOMiDtTh0VJ0J6bxWeuSqeY4oXU /Users/shafer/Downloads/COMP-175-Lab-1.pem (RSA)
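
If the key isn't listed, add it to the agent on YOUR computer and then reconnect with agent forwarding enabled (the key path below is just an example - use wherever your key actually lives):

# On your local computer:
$ ssh-add ~/Downloads/COMP-175-Lab-1.pem
$ ssh -A ubuntu@<ansible-instance-public-ip>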

Test this inventory with some basic Ansible commands. (Note that you could use the keyword all instead of the alias ansible-control if you wanted to perform these actions on all instances, but because the other instances are shut down to save $$, let's restrict our focus to just the ansible-control system, which is currently running.)

# Use the ping module to ping the ansible-control node
$ ansible ansible-control -m ping

# Example Output:
# The authenticity of host 'localhost (127.0.0.1)' can't be established.
# ECDSA key fingerprint is SHA256:RhksEx8cZv4/tdLRjuWpcGbP3mN4YPd42+vk9TcFB3s.
# Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
# ansible-control | SUCCESS => {
#     "ansible_facts": {
#         "discovered_interpreter_python": "/usr/bin/python3"
#     },
#     "changed": false,
#     "ping": "pong"
# }

Note that the "authenticity of host" warning is expected. SSH warns you whenever you connect to a machine that hasn't been contacted before. You see the same warning in your SSH client the first time you connect to a new instance. It's only a problem if you get the warning and you had previously connected to that machine - that would be a sign that perhaps you aren't connecting to the system you expected to. Human error? A sign of attack?

# Run a live command on the ansible-control node
$ ansible ansible-control -a "/bin/echo Tiger Enterprises"

# Example Output:
# ansible-control | CHANGED | rc=0 >>
# Tiger Enterprises

# Print facts (discovered variables) about the ansible-control node
$ ansible ansible-control -m ansible.builtin.setup

# Example Output:
# LENGTHY!!
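
To trim that lengthy output down, the setup module accepts a filter argument (a shell-style wildcard), for example:

$ ansible ansible-control -m ansible.builtin.setup -a "filter=ansible_distribution*"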

Part 2 - Ad-hoc Ansible Commands

In Ansible, ad-hoc commands are one-line commands that are manually entered.

Reference:

Let's start with a basic ad-hoc command - updating the packages on the ansible-control host. There are several options for accomplishing this:

Option 1: Prior to Ansible automation, you would use a command like this: sudo apt update && sudo apt upgrade -y. That would first update the list of available packages, and only if that command succeeded, it would then upgrade currently-installed packages to the latest version.

Option 2: You could use the module ansible.builtin.shell in Ansible to execute arbitrary shell commands on the target: ansible ansible-control -m shell -a "sudo apt update && sudo apt upgrade -y". While that does "use Ansible", it only barely uses Ansible, and manually providing commands to run risks your Ansible configuration falling short of idempotency. That is to say, if you run your Ansible command 1 time or 1000 times, the end state should be exactly the same. That is very hard to guarantee with arbitrary raw shell commands, which often have side effects.

Video: What is Ansible Part VII: Idempotence Explained

Option 3: The preferred method is to avoid using raw shell commands whenever possible, in favor of purpose-specific modules. In this case, Ansible has the module ansible.builtin.apt that can install, uninstall, and update packages. Let's use this module:

$ ansible ansible-control -m apt -a "update_cache=yes upgrade=yes" --become 
# --become is privilege escalation, e.g. sudo

# This takes a while to run
# * Checks for updated packages
# * Installs updated packages

# Example Output:
# ansible-control | CHANGED => {
#     "ansible_facts": {
#         "discovered_interpreter_python": "/usr/bin/python3"
#     },
#     "changed": true,
#     "msg": "Reading package lists...
#  Followed by ALL the output that apt produced during updating...

Task 1 - Ad-hoc - Update Hostname

Using an ad-hoc Ansible command, update the hostname of the current instance to ansible-control.

Tips:

  • You may find the module ansible.builtin.hostname useful for this.
  • You need to be root to change the hostname
  • Although Ubuntu is based on Debian, the systemd mechanism for changing the hostname is the one that will succeed here. The module's autodetection handles this properly, so you shouldn't need to specify it. (One possible form of the command is sketched below.)
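
A sketch of that ad-hoc command (a starting point - verify the options against the ansible.builtin.hostname module documentation):

$ ansible ansible-control -m ansible.builtin.hostname -a "name=ansible-control" --become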

Deliverables:

  • Submit a screenshot of:
    (a) The ad-hoc Ansible command to update the hostname, and
    (b) The output of hostnamectl showing that the hostname has been changed

Part 3 - Playbooks

An Ansible Playbook is a YAML-formatted file that specifies configuration to deploy on one or more systems. With a playbook, you can declare configurations, specify exact states that must be obtained in a specific order, and launch tasks either synchronously or asynchronously. Playbook files can be stored in version control so that changes can be easily tracked and enterprise configuration standards enforced.

References:

Here is an example playbook that updates packages on the ansible-control host, based on the previous ad-hoc command.

$ nano update-server.yml
---
- hosts: ansible-control
  tasks:
  - name: Update and upgrade apt packages
    become: 'yes'  # Privilege escalation, e.g. 'sudo'
    apt:
      upgrade: 'yes'
      update_cache: 'yes'
      cache_valid_time: 86400 #Cache is valid for 1 day

Prior to running your playbook, take a moment to verify it first:

# Is the syntax correct?
$ ansible-playbook update-server.yml --syntax-check
# Desired output:
# playbook: update-server.yml

# What hosts will this playbook affect?
$ ansible-playbook update-server.yml --list-hosts
# Example Output:
# playbook: update-server.yml
# 
#   play #1 (ansible-control): ansible-control  TAGS: []
#     pattern: ['ansible-control']
#     hosts (1):
#       ansible-control

# What changes will be made? (without taking any action)
# This is a SIMULATION (aka "dry run")
$ ansible-playbook update-server.yml --check
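
You can also lint the playbook with ansible-lint (installed back in the Installation section), which flags common mistakes and style issues:

$ ansible-lint update-server.yml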

Run your playbook:

$ ansible-playbook update-server.yml 
# Can use --verbose option if you want detailed output on
# actions taken (successfully or unsuccessfully)

Task 2 - Playbook - Update Hostname

Create the Ansible playbook ansible-hostname.yml.

  • Goal: Set Hostname
  • The playbook should ensure that the hostname of the ansible-control instance is ansible-control
  • The playbook should ensure that there is a line in the /etc/hosts file reading 127.0.0.1 ansible-control
  • To verify, use sudo hostnamectl and cat /etc/hosts
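
For reference, here is a minimal sketch of one possible shape for this playbook, using the ansible.builtin.hostname and ansible.builtin.lineinfile modules (a starting point - adapt and verify it yourself):

---
- hosts: ansible-control
  become: 'yes'
  tasks:
  - name: Ensure the hostname is ansible-control
    ansible.builtin.hostname:
      name: ansible-control
  - name: Ensure /etc/hosts maps 127.0.0.1 to ansible-control
    ansible.builtin.lineinfile:
      path: /etc/hosts
      line: 127.0.0.1 ansible-control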

Deliverables:

  • Upload your ansible-hostname.yml file
  • Submit a screenshot of your verification that the hostname was updated successfully

Task 3 - Playbook - Install Fail2Ban

Create the Ansible playbook fail2ban.yml.

  • Goal: Install Fail2Ban
  • The playbook should ensure Fail2Ban is installed on all systems in group ubuntu.
  • To verify, use sudo systemctl status fail2ban to confirm that the service is loaded and active (running).

Note: You don't need to have every machine running for this. It is sufficient to demonstrate that this playbook succeeds on ansible-control. The other machines that Ansible can't reach will just show as errors, and that's fine.
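
For reference, a minimal sketch of one possible approach (the service task is belt-and-suspenders, since Debian-family packages normally start the service on installation):

---
- hosts: ubuntu
  become: 'yes'
  tasks:
  - name: Ensure Fail2Ban is installed
    apt:
      name: fail2ban
      state: present
      update_cache: 'yes'
  - name: Ensure Fail2Ban is enabled and running
    service:
      name: fail2ban
      state: started
      enabled: 'yes'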

Deliverables:

  • Upload your fail2ban.yml file
  • Submit a screenshot of your verification that fail2ban was installed successfully

Task 4 - Playbook - Install Chrony

Create the Ansible playbook chrony.yml.

  • Goal: Install Chrony time synchronization
  • The playbook should ensure that the chrony time manager is installed and running on all systems in group ubuntu.
  • The playbook should ensure that /etc/chrony/chrony.conf contains the line server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
  • The playbook should ensure that the chrony systemd service is restarted if the config file is modified (which will happen the first time, because the system package does not include that AWS-specific time server). Do not restart the chrony service otherwise.
  • To verify, confirm that chronyc sources -v lists 169.254.169.123 as its first time server. Furthermore, even if you run the playbook a dozen times, /etc/chrony/chrony.conf should contain only the one new line (not a dozen), and sudo systemctl status chrony should show that the service has not been restarted multiple times (check the "Active: ... since" timestamp).

Note: You don't need to have every machine running for this. It is sufficient to demonstrate that this playbook succeeds on ansible-control. The other machines that Ansible can't reach will just show as errors, and that's fine.
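
The "restart only if the config file changes" requirement maps naturally onto Ansible handlers: a task can notify a handler, and the handler runs once at the end of the play only if that task actually reported a change. A minimal sketch of that pattern (a starting point, not the required solution):

---
- hosts: ubuntu
  become: 'yes'
  tasks:
  - name: Ensure chrony is installed
    apt:
      name: chrony
      state: present
  - name: Ensure the AWS time server is configured
    lineinfile:
      path: /etc/chrony/chrony.conf
      line: server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
    notify: Restart chrony
  handlers:
  - name: Restart chrony
    service:
      name: chrony
      state: restarted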

Deliverables:

  • Upload your chrony.yml file
  • Submit a screenshot of your verification that chrony was (a) installed and (b) configured successfully

Part 4 - Playbooks for EC2

Ansible has two collections with a number of useful modules for AWS: amazon.aws (maintained by Ansible) and community.aws (maintained by the community).

References:

To install both collections, use the ansible-galaxy command on the Ansible controller:

$ ansible-galaxy collection install amazon.aws
$ ansible-galaxy collection install community.aws

For the AWS integration to be complete, an additional library needs to be installed and configured on the Ansible controller: Boto, the AWS SDK for Python

$ sudo apt install python3-pip
$ sudo pip3 install boto boto3  
# Pip has a newer version than available in Ubuntu package manager for release 20.04
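
To quickly confirm the library is importable (and see which version pip installed):

$ python3 -c "import boto3; print(boto3.__version__)"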

Next, set up a credentials file for boto:

$ mkdir ~/.aws
$ nano ~/.aws/credentials

The contents of the file should be:

[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET
aws_session_token = YOUR_SESSION_TOKEN

You can get your secrets from your AWS Academy account in the Learner Lab portal. Click on the "AWS Details", and then "Show" next to AWS CLI. Copy that entire block into your ~/.aws/credentials file.

WARNING: In your AWS Academy account, the access key and token expire every four hours. If you stop work on the lab and resume it the next day, you will need to update your ~/.aws/credentials file with the current access codes.

WARNING: In your AWS Academy account, three pieces of information are required to authenticate you: AWS_ACCESS_KEY_ID, AWS_SECRET_KEY, and AWS_SESSION_TOKEN. However, some of the boto library functions don't use the session token for whatever reason (aka it's a bug). Setting these three items as environment variables in your Ansible playbook may resolve any "AuthFailure: AWS was not able to validate the provided access credentials" errors you encounter when you're confident you have the current credentials for your current session.
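
For example (a sketch with placeholder values - substitute your current session's credentials), the play-level environment keyword exports variables into the environment of every task in the play, where the AWS modules can pick them up:

- hosts: ansible-control
  environment:
    AWS_ACCESS_KEY_ID: YOUR_KEY
    AWS_SECRET_KEY: YOUR_SECRET
    AWS_SESSION_TOKEN: YOUR_SESSION_TOKEN
  tasks:
  - name: Placeholder - replace with your amazon.aws / community.aws tasks
    ansible.builtin.debug:
      msg: AWS modules in this play will find the credentials in their environment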

Finally, set up a region file for boto:

$ nano ~/.aws/config

The contents of the file should be:

[default]
region=us-east-1

If you want to test boto3, a simple Python script can be used to list the instances in your account:

$ nano boto-test.py
$ chmod +x boto-test.py
$ ./boto-test.py

Contents of boto-test.py:

#!/usr/bin/env python3
# List every EC2 instance visible to the credentials in ~/.aws/credentials,
# using the region from ~/.aws/config

import boto3

ec2 = boto3.resource('ec2')
for instance in ec2.instances.all():
    print(instance)

Task 5 - Playbook - Script DNS Updates

Create the Ansible playbook dns-update.yml

  • Goal: Assign the DNS name ansible.STUDENT-NAME.tigerenterprises.org to your instance, and script it so DNS automatically updates with the current public IP address when the system boots.
  • The playbook should ensure that the dns-update script is present in /var/lib/cloud/scripts/per-boot/ so that it runs automatically upon boot. You can put the script in the same directory as the yml file, and have Ansible copy it over. Ensure that the script is marked as executable.
  • The playbook should ensure that the AWS CLI is installed
  • The playbook should ensure that the /root/.aws/credentials file is configured with the classwide-account information
  • The playbook should run the script /var/lib/cloud/scripts/per-boot/dns-update (as root) and print the output to the screen for human viewing at the end.
  • Note: You do not have to explicitly create the Type A record in Route53 as part of your playbook. The dns-update script will create it as part of its updating process.
  • To verify:
    • Manually run aws --version to confirm that the CLI now exists
    • Manually run the script sudo /var/lib/cloud/scripts/per-boot/dns-update and ensure that it runs without errors. You should see that the DNS update is "PENDING".
{
    "ChangeInfo": {
        "Id": "/change/C061018739I8DCF8VUOO2",
        "Status": "PENDING",
        "SubmittedAt": "2020-11-18T01:30:09.988000+00:00"
    }
}
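
For reference, a hedged sketch of the overall shape such a playbook might take. The module names are real, but the local file names (the dns-update script and a credentials file sitting next to the playbook) are assumptions you will need to adapt:

---
- hosts: ansible-control
  become: 'yes'
  tasks:
  - name: Ensure the AWS CLI is installed
    apt:
      name: awscli
      state: present
  - name: Ensure /root/.aws exists
    file:
      path: /root/.aws
      state: directory
      mode: '0700'
  - name: Ensure root's AWS credentials are configured
    copy:
      src: root-aws-credentials   # hypothetical local file holding the classwide-account keys
      dest: /root/.aws/credentials
      mode: '0600'
  - name: Copy the dns-update script into the per-boot directory, marked executable
    copy:
      src: dns-update             # assumes the script sits next to this .yml file
      dest: /var/lib/cloud/scripts/per-boot/dns-update
      mode: '0755'
  - name: Run the dns-update script
    command: /var/lib/cloud/scripts/per-boot/dns-update
    register: dns_output
  - name: Show the script's output
    debug:
      var: dns_output.stdout_lines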

Tips: The following Ansible modules were helpful when building the instructor solution to this playbook:

Deliverables:

  • Upload your dns-update.yml file
  • Submit a screenshot of ansible-playbook dns-update.yml showing as much output as possible, particularly the end (where dns-update runs and its output is shown)

Task 6 - Playbook - Create Kubernetes Worker Nodes

Create the Ansible playbook kubernetes-worker.yml

  • Goal: Use Ansible to ensure the specified number of Kubernetes worker nodes exists and are configured as workers
  • The playbook should ensure that a security group with the name Tiger-K8s-Worker-Ansible exists with the ports specified in the Kubernetes lab
  • The playbook should ensure that 2 EC2 instances exist with the tag ansible-key set to k8s-worker-auto. New instances will be started if an insufficient number exists, and surplus instances will be terminated if too many instances exist. Instances should be created with all of the standard settings for Kubernetes workers, including:
    • Security Group: Tiger-K8s-Worker-Ansible
    • VPC (use the VPC ID)
    • VPC Subnet (use the Subnet ID)
    • Instance Type
    • Volumes (use type gp2 and device name /dev/sda1, as shown in your previous Kubernetes workers)
    • Tags:
      • Name: K8s-Worker-Ansible
      • ansible-key: k8s-worker-auto (This is how Ansible will identify instances that it can manage from others that YOU manually created)
  • The playbook should ensure that all EC2 instances with the tag ansible-key: k8s-worker-auto have the following settings applied to them, following the standard Kubernetes requirements:
    • Chrony is installed
    • Docker is installed
    • Kubernetes is installed
    • Worker is joined to cluster
  • To verify:
    • On the Kubernetes controller, kubectl get nodes should show your two new workers with a status of Ready (plus the two existing non-Ansible workers with a status of NotReady, unless you also have those running)
    • On the Kubernetes controller, kubectl top nodes should show you CPU and memory utilization on your two new workers (plus no statistics for the existing non-Ansible workers, unless you also have those running)
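
As a starting point for the security-group portion only, here is a hedged sketch using the amazon.aws collection. The module and its parameters are real, but the VPC ID is a placeholder and the ports shown are generic examples - use the ports from the Kubernetes lab, and build the instance-management logic (exact counts, tags, volumes, etc.) against the module documentation for the collection version you installed:

---
- hosts: ansible-control
  tasks:
  - name: Ensure the Kubernetes worker security group exists
    amazon.aws.ec2_group:
      name: Tiger-K8s-Worker-Ansible
      description: Kubernetes worker ports (managed by Ansible)
      vpc_id: vpc-REPLACE-WITH-YOUR-TIGER-VPC-ID
      rules:
      - proto: tcp
        from_port: 22
        to_port: 22
        cidr_ip: 0.0.0.0/0
      - proto: tcp
        from_port: 10250
        to_port: 10250
        cidr_ip: 10.101.0.0/16   # example - kubelet API from inside the VPC
      - proto: tcp
        from_port: 30000
        to_port: 32767
        cidr_ip: 0.0.0.0/0       # example - NodePort service range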

Tips: The following Ansible modules were helpful when building the instructor solution to this playbook:

If we were using Version Control, this would be a great time to commit and push your Ansible playbooks to your enterprise repository.

Deliverables:

  • Upload your kubernetes-worker.yml file
  • Submit a screenshot of ansible-playbook kubernetes-worker.yml showing as much output as possible, particularly the end.
  • Submit a screenshot of kubectl get nodes and kubectl top nodes on the Kubernetes controller, showing your two new workers.

Troubleshooting

  • Are you getting a dial tcp 127.0.0.1:10248: connect: connection refused error when you try to do your kubeadm join? Verify that you are putting the correct settings into Docker's daemon.json file and restarting the docker service afterwards. Otherwise, Kubernetes won't integrate with Docker properly.

Lab Deliverables

After submitting the Canvas assignment, you should STOP your virtual machines, not terminate them. We'll use them again in future labs, and thus want to save the configuration and OS data.

Upload to the Lab 14 Canvas assignment all the lab deliverables to demonstrate your work:

  • Part 2 - Ad-hoc Ansible Commands
    • Task 1: Submit a screenshot of:
      (a) The ad-hoc Ansible command to update the hostname, and
      (b) The output of hostnamectl showing that the hostname has been changed
  • Part 3 - Playbooks
    • Task 2: Upload your ansible-hostname.yml file
    • Submit a screenshot of your verification that the hostname was updated successfully
    • Task 3: Upload your fail2ban.yml file
    • Submit a screenshot of your verification that fail2ban was installed successfully
    • Task 4: Upload your chrony.yml file
    • Submit a screenshot of your verification that chrony was (a) installed and (b) configured successfully
  • Part 4 - Playbooks for EC2
    • Task 5: Upload your dns-update.yml file
    • Submit a screenshot of ansible-playbook dns-update.yml showing as much output as possible, particularly the end (where dns-update runs and its output is shown)
    • Task 6: Upload your kubernetes-worker.yml file
    • Submit a screenshot of ansible-playbook kubernetes-worker.yml showing as much output as possible, particularly the end.
    • Submit a screenshot of kubectl get nodes and kubectl top nodes on the Kubernetes controller, showing your two new workers.