Using the HTTP Status Code as a Check Condition in Ansible

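I have been using Ansible more often lately, so here is a short note on a situation I run into and how I handle it. I normally use a single playbook for one job and split it into tasks when it needs several steps. Sometimes, though, the previous step has executed successfully, yet from a business-logic point of view the service is not actually ready to serve requests, so the steps that follow trigger a chain of errors. This note is about avoiding that situation.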

Scenario: a job is about to be triggered, but Jenkins has not finished warming up after a restart, so the follow-up steps fail as well.

Environment

  1. Azure Standard B1ms (1 vCPU, 2 GiB memory)
  2. CentOS 7.7
  3. Jenkins 2.204.2
  4. Ansible 2.7.8
  5. Ansible script

    • inventory.ini

      [jenkins]
      jenkins1 ansible_host=192.168.1.112 ip=192.168.1.112 ansible_user=yowko ansible_password=password ansible_become_password=password
    • install.yml

      ---
      - name: Trigger Build
        hosts: jenkins
        tasks:
          - name: "Trigger Build"
            shell: curl http://localhost:8080/job/Test/build?token=67c2b2b3

The problem

  • Error message (curl exits with code 7; 連線被拒絕 is the localized "Connection refused")

    [WARNING]: Consider using the get_url or uri module rather than running
    'curl'. If you need to use command because get_url or uri is insufficient you
    can add 'warn: false' to this command task or set 'command_warnings=False' in
    ansible.cfg to get rid of this message.
    fatal: [jenkins1]: FAILED! => {
        "changed": true,
        "cmd": "curl -I http://localhost:8080/job/Test/build?token=67c2b2b3",
        "delta": "0:00:00.066076",
        "end": "2020-02-28 14:37:53.870970",
        "invocation": {
            "module_args": {
                "_raw_params": "curl -I http://localhost:8080/job/Test/build?token=67c2b2b3",
                "_uses_shell": true,
                "argv": null,
                "chdir": null,
                "creates": null,
                "executable": null,
                "removes": null,
                "stdin": null,
                "warn": true
            }
        },
        "msg": "non-zero return code",
        "rc": 7,
        "start": "2020-02-28 14:37:53.804894",
        "stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to localhost:8080; 連線被拒絕",
        "stderr_lines": [
            " % Total % Received % Xferd Average Speed Time Time Time Current",
            " Dload Upload Total Spent Left Speed",
            "",
            " 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to localhost:8080; 連線被拒絕"
        ],
        "stdout": "",
        "stdout_lines": []
    }

Syntax used

---
- name: Trigger Build
  hosts: jenkins
  tasks:
    - name: "Wait for Jenkins to Trigger Build"
      # Call the build trigger URL and treat only HTTP 201 as success
      uri:
        url: "http://localhost:8080/job/Test/build?token=67c2b2b3"
        status_code: 201
      register: result
      # Repeat the request until it returns 201: up to 60 retries, 1 second apart
      until: result.status == 201
      retries: 60
      delay: 1

Takeaways

The syntax above is the condensed version. It can be broken down into two steps:

  1. First check that the service is running correctly
  2. Then run the target operation
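
The two steps above could be written as separate tasks, roughly as in the sketch below. This is not the playbook I actually run. It assumes that the Jenkins /login page returning 200 is an acceptable definition of "ready", which is my own choice of probe and may not fit every setup. The task names and the `ready` variable are made up for the example.

```yaml
---
- name: Trigger Build
  hosts: jenkins
  tasks:
    # Step 1: poll until Jenkins answers with 200.
    # Assumption: /login is used as the readiness probe; swap in whatever URL fits.
    - name: "Wait for Jenkins to be ready"
      uri:
        url: "http://localhost:8080/login"
        status_code: 200
      register: ready
      until: ready.status == 200
      retries: 60
      delay: 1

    # Step 2: run the target operation once, without a retry loop.
    - name: "Trigger Build"
      uri:
        url: "http://localhost:8080/job/Test/build?token=67c2b2b3"
        status_code: 201
```

With this split, a failure in the second task points at the trigger itself rather than at Jenkins still starting up.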

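In my own experience, though, the harder part is that "the service is running normally" is often difficult to define precisely, or there is a gap between the service running normally and the target operation actually being executable.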
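
One way to narrow that gap might be to probe something closer to the target than a generic health page. The sketch below polls the job's own JSON API before anything else runs. It assumes that /job/Test/api/json answers 200 only once the Test job is loaded and that anonymous read access is allowed on this Jenkins instance; neither assumption is verified here, and `job_probe` is a made-up variable name.

```yaml
---
- name: Trigger Build
  hosts: jenkins
  tasks:
    # Assumption: the job's JSON API returns 200 only after the job is loaded,
    # and the request does not need authentication.
    - name: "Wait until the Test job itself is reachable"
      uri:
        url: "http://localhost:8080/job/Test/api/json"
        status_code: 200
      register: job_probe
      until: job_probe.status == 200
      retries: 60
      delay: 1
```

Even so, a probe is only an approximation of the real operation, which is why retrying the target request directly is often the simpler choice.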

References

  1. mikeifomin/wait_for_http.yml
  2. How to check for a certain Status Code (4xx) in Ansible?