Configure error handling
In this lesson, we look at error handling in Ansible and examine how playbooks respond when tasks fail during execution. Previously, we covered execution strategies and overall task flow; now we focus on managing failures.
Single Server Scenario
When deploying a web application to a single server, Ansible executes tasks sequentially. If a task fails, the playbook halts immediately: for example, if the first two tasks succeed and the third fails, the remaining tasks never run. Below is a sample playbook structure for a single server deployment:
- name: Deploy web application
  hosts: server1
  tasks:
    - name: Install dependencies
      << code hidden >>
    - name: Install MySQL Database
      << code hidden >>
    - name: Start MySQL Service
      << code hidden >>
    - name: Install Python Flask Dependencies
      << code hidden >>
    - name: Run web-server
      << code hidden >>
Note
In single-server scenarios, halting at the first failed task is usually beneficial: it prevents later tasks from causing further misconfigurations or cascading errors.
Multiple Servers Scenario
When running a playbook on multiple servers with the default linear strategy, Ansible completes each task on all targeted hosts before proceeding to the next one. For instance, it installs dependencies on all servers first, then continues with the MySQL installation. If a task, such as starting the database, fails on one server (for example, server two), Ansible removes that server from the remaining tasks and continues on the healthy servers.
By default, Ansible runs as many tasks as possible on every host that has not failed, unless you configure it otherwise.
Halting Execution on Errors
Consistent deployments may require stopping the playbook as soon as any server fails. You can achieve this by enabling the any_errors_fatal option. With this option set to true, a failed task on any server stops the play on all servers. The updated playbook structure is as follows:
- name: Deploy web application
  hosts: server1,server2,server3
  any_errors_fatal: true
  tasks:
    - name: Install dependencies
      << code hidden >>
    - name: Install MySQL Database
      << code hidden >>
    - name: Start MySQL Service
      << code hidden >>
    - name: Install Python Flask Dependencies
      << code hidden >>
    - name: Run web-server
      << code hidden >>
Important
Use any_errors_fatal when uniformity in deployment is critical, ensuring that any failure results in an immediate stop.
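any_errors_fatal is not limited to the play level; it can also be set on a block, which scopes the stop-everything behavior to the steps that must succeed on every host. The following is a minimal sketch, not the course's playbook: the host list is reused from the example above, while become, the package name mysql-server, the service name mysql, and the choice of the ansible.builtin.package and ansible.builtin.service modules are assumptions for illustration.

```yaml
- name: Deploy web application
  hosts: server1,server2,server3
  become: true                       # assumption: package and service tasks need root
  tasks:
    - name: Database setup must succeed on every host
      any_errors_fatal: true         # a failure inside this block stops the play for all hosts
      block:
        - name: Install MySQL Database
          ansible.builtin.package:
            name: mysql-server       # assumed package name
            state: present

        - name: Start MySQL Service
          ansible.builtin.service:
            name: mysql              # assumed service name
            state: started
```

Tasks placed outside the block keep the default behavior, where only the failing host is dropped.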
Handling Partial Failures with Many Servers
When deploying to hundreds of servers, occasional individual task failures might be acceptable, because you can rerun the playbook for those specific servers later. However, if a significant portion of the servers fails a task, that points to a broader problem, and the playbook should stop. The max_fail_percentage option lets you define a failure threshold. For example, setting max_fail_percentage: 30 means that if more than 30% of the servers fail at a particular task, the playbook stops:
- name: Deploy web application
  hosts: server1,server2,server3
  max_fail_percentage: 30
  tasks:
    - name: Install dependencies
      << code hidden >>
    - name: Install MySQL Database
      << code hidden >>
    - name: Start MySQL Service
      << code hidden >>
    - name: Install Python Flask Dependencies
      << code hidden >>
    - name: Run web-server
      << code hidden >>
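Two details are worth keeping in mind. The threshold must be exceeded, not merely reached, and with only three hosts a single failure is already about 33%, which is above 30. When the play also uses serial, the percentage is evaluated per batch. The sketch below illustrates this under stated assumptions: the inventory group webservers, the batch size of 10, and the package name python3-pip are hypothetical and do not come from the lesson.

```yaml
- name: Deploy web application
  hosts: webservers            # hypothetical inventory group with many hosts
  serial: 10                   # process hosts in batches of 10
  max_fail_percentage: 30      # abort when more than 3 of the 10 hosts in a batch fail
  tasks:
    - name: Install dependencies
      ansible.builtin.package:
        name: python3-pip      # assumed package name
        state: present
```

With these values, three failures in a batch (exactly 30%) let the play continue, while a fourth failure stops it.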
Ignoring Non-Critical Errors
In some cases, non-critical errors should not interrupt the overall playbook execution. For instance, you might include a notification task that uses the mail module to tell your team the web server is ready. If the SMTP server is unstable, that email may fail even though the deployment itself succeeded. Setting the ignore_errors flag to yes on the task ensures that the playbook continues even if the notification fails:
- name: Deploy web application
  hosts: server1,server2,server3
  any_errors_fatal: true
  tasks:
    - name: Install dependencies
      << code hidden >>
    - name: Install MySQL Database
      << code hidden >>
    - name: Start MySQL Service
      << code hidden >>
    - name: Install Python Flask Dependencies
      << code hidden >>
    - name: Run web-server
      << code hidden >>
    - name: Send notification email
      mail:
        to: [email protected]
        subject: Server Configured
        body: Web server has been configured
      ignore_errors: yes
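An ignored failure is easy to miss, so it can help to register the result and report it. The fragment below is a sketch of that idea and belongs under a play's tasks: section. The variable name mail_result, the address admin@example.com, and the follow-up debug task are assumptions added for illustration; community.general.mail is the fully qualified name of the mail module in recent Ansible releases. Note that ignore_errors only covers tasks that run and return a failed result; it does not cover unreachable hosts, for which Ansible has a separate ignore_unreachable keyword.

```yaml
- name: Send notification email
  community.general.mail:
    to: admin@example.com            # placeholder address, not from the lesson
    subject: Server Configured
    body: Web server has been configured
  register: mail_result              # keep the result so the failure can be inspected
  ignore_errors: true

- name: Report that the notification could not be sent
  ansible.builtin.debug:
    msg: "Notification email failed: {{ mail_result.msg | default('no error message returned') }}"
  when: mail_result is failed        # runs only if the mail task failed
```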
Implementing a Health Check
To further ensure the stability of your web server, you can run a health check that verifies the server log file does not contain error messages. In this task, the output of the command is captured with the register keyword, and the failed_when directive checks for the word "ERROR" in that output. If it is found, the task fails, and with it the playbook:
- name: Deploy web application
  hosts: server1,server2,server3
  any_errors_fatal: true
  tasks:
    - name: Install dependencies
      << code hidden >>
    - name: Install MySQL Database
      << code hidden >>
    - name: Start MySQL Service
      << code hidden >>
    - name: Install Python Flask Dependencies
      << code hidden >>
    - name: Run web-server
      << code hidden >>
    - name: Check server log for errors
      command: cat /var/log/server.log
      register: command_output
      failed_when: "'ERROR' in command_output.stdout"
    - name: Send notification email
      mail:
        to: [email protected]
        subject: Web server Configured
        body: Web server has been configured
      ignore_errors: yes
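One subtlety: failed_when replaces the task's normal failure condition, so with the cat version a missing or unreadable log file produces empty output and the check passes. A possible variant, sketched below as an alternative rather than as the lesson's method, uses grep and decides on its exit code instead. The variable name log_check is an assumption; the log path is the one used above.

```yaml
- name: Check server log for errors
  ansible.builtin.command: grep -c ERROR /var/log/server.log
  register: log_check
  changed_when: false          # a read-only check never changes the host
  # grep exits 0 when it finds a match, 1 when it finds none, and 2 on errors
  # such as a missing file; only "no match" (1) counts as healthy here
  failed_when: log_check.rc != 1
```

This treats both "ERROR found" and "log could not be read" as failures, which may or may not be what you want for a freshly started service.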
Using Blocks and Rescue
For more robust error handling, you can group related tasks into a block. If any task within the block fails, the tasks in its rescue section run. In the example below, a failure inside the block triggers an alert email that notifies the administrators:
- name: Deploy web application
  hosts: server1,server2,server3
  any_errors_fatal: true
  tasks:
    - name: Install web application
      block:
        - name: Install dependencies
          << code hidden >>
        - name: Install MySQL Database
          << code hidden >>
        - name: Start MySQL Service
          << code hidden >>
        - name: Install Python Flask Dependencies
          << code hidden >>
        - name: Run web-server
          << code hidden >>
      rescue:
        - name: Notify admins about playbook failure
          mail:
            to: [email protected]
            subject: Playbook Failed
            body: Web server configuration failed
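When the rescue tasks themselves succeed, Ansible treats the failure as handled and the host carries on with the rest of the play. If the host should still end up failed after the alert, the rescue section can finish with an explicit fail task. The sketch below shows that pattern together with an always section and the ansible_failed_task and ansible_failed_result variables that Ansible makes available inside rescue. The service name mysql and the debug messages are assumptions for illustration.

```yaml
- name: Deploy web application
  hosts: server1,server2,server3
  tasks:
    - name: Install web application
      block:
        - name: Start MySQL Service
          ansible.builtin.service:
            name: mysql                  # assumed service name
            state: started
      rescue:
        - name: Show which task failed
          ansible.builtin.debug:
            msg: "Task '{{ ansible_failed_task.name }}' failed: {{ ansible_failed_result.msg | default('no message') }}"

        - name: Keep the host marked as failed after the rescue
          ansible.builtin.fail:
            msg: Web server configuration failed
      always:
        - name: Runs whether or not the block succeeded
          ansible.builtin.debug:
            msg: Deployment attempt finished
```

Dropping the fail task turns the rescue into a recovery path, so the play continues as if the block had succeeded.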
That concludes our review of error handling in Ansible. For further practice, try applying these techniques in your own playbooks to enhance reliability and troubleshoot issues effectively.