This article explores error handling in Ansible, focusing on managing task failures during playbook execution for single and multiple server deployments.
In this lesson, we explore error handling in Ansible and examine how playbooks respond when tasks fail during execution. Previously, we explored execution strategies and overall task flow; now we focus on managing failures effectively.
When deploying a web application to a single server, Ansible executes tasks sequentially. If one task fails—say, the first two tasks run successfully and the third fails—the playbook halts immediately. Below is a sample playbook structure for a single server deployment:
When running a playbook on multiple servers, Ansible completes each task on all targeted hosts before proceeding to the next one. For instance, it will install dependencies on all servers first, then continue with MySQL installation. If a task—such as starting the database—fails on one server (e.g., server two), Ansible removes that server from subsequent tasks and continues executing on the remaining healthy servers.
Ansible’s default behavior is to execute as many tasks as possible unless an alternative configuration is provided.
Consistency during deployment may require stopping the playbook as soon as any server fails. You can achieve this by enabling the any_errors_fatal option. With this option set to true, a failure on any task across any server halts the entire playbook execution. The updated playbook structure is as follows:
When deploying on hundreds of servers, occasional individual task failures might be acceptable. You can rerun the playbook for those specific servers later. However, if a significant portion of servers fails a task, it indicates a broader problem and should stop the playbook execution. The max_fail_percentage option allows you to define a failure threshold. For example, setting max_fail_percentage: 30 means that if more than 30% of the servers fail at a particular task, the playbook stops:
In some cases, non-critical errors should not interrupt the overall playbook execution. For instance, you might include a notification task using the mail module to inform your team that the web server is ready even if the SMTP server is unstable. By setting the ignore_errors flag to yes, you ensure that the playbook continues even if this notification fails:
Copy
Ask AI
- name: Deploy web application hosts: server1,server2,server3 any_errors_fatal: true tasks: - name: Install dependencies << code hidden >> - name: Install MySQL Database << code hidden >> - name: Start MySQL Service << code hidden >> - name: Install Python Flask Dependencies << code hidden >> - name: Run web-server << code hidden >> - name: Send notification email mail: to: [email protected] subject: Server Configured body: Web server has been configured ignore_errors: yes
To further ensure the stability of your web server, you can run a health check that verifies the server log file does not contain error messages. In this task, the output of the command is captured using the register keyword, and the failed_when directive checks for the word “ERROR” in the output. If found, the task—and thus the playbook—fails:
Copy
Ask AI
- name: Deploy web application hosts: server1,server2,server3 any_errors_fatal: true tasks: - name: Install dependencies << code hidden >> - name: Install MySQL Database << code hidden >> - name: Start MySQL Service << code hidden >> - name: Install Python Flask Dependencies << code hidden >> - name: Run web-server << code hidden >> - name: Check server log for errors command: cat /var/log/server.log register: command_output failed_when: "'ERROR' in command_output.stdout" - name: Send notification email mail: to: [email protected] subject: Web server Configured body: Web server has been configured ignore_errors: yes
For more robust error handling, you can group related tasks into a block. If any task within the block fails, the reserved rescue section is executed. In the example below, if the tasks within the block fail, an alert email is sent to notify the administrators of the failure:
Copy
Ask AI
- name: Deploy web application hosts: server1,server2,server3 any_errors_fatal: true tasks: - name: Install web application block: - name: Install dependencies << code hidden >> - name: Install MySQL Database << code hidden >> - name: Start MySQL Service << code hidden >> - name: Install Python Flask Dependencies << code hidden >> - name: Run web-server << code hidden >> rescue: - name: Notify admins about playbook failure mail: to: [email protected] subject: Playbook Failed body: Web server configuration failed
That concludes our review of error handling in Ansible. For further practice, try applying these techniques in your own playbooks to enhance reliability and troubleshoot issues effectively.