Using an HTTP Status Code as a Check Condition in Ansible

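I have been using Ansible more often lately, so here is a short note on a situation I run into and how I handle it. I normally use a single playbook for one job and split it into tasks when it needs several steps. Sometimes, though, a step completes successfully and yet, from a business-logic point of view, the service is still not ready to serve requests, so the steps that follow set off a chain of errors. This note is about avoiding that situation.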

Scenario: I want to trigger a job, but Jenkins has not finished its warm-up after a restart, so the steps that follow fail as well.

Environment

  1. Azure Standard B1ms (1 vCPU, 2 GiB memory)
  2. CentOS 7.7
  3. Jenkins 2.204.2
  4. Ansible 2.7.8
  5. Ansible scripts

    • inventory.ini

      [jenkins]
      jenkins1 ansible_host=192.168.1.112 ip=192.168.1.112  ansible_user=yowko ansible_password=password ansible_become_password=password
      
    • install.yml

      ---
      - name: Trigger Build
        hosts: jenkins
        tasks:
          - name: "Trigger Build"
            shell: curl http://localhost:8080/job/Test/build?token=67c2b2b3
      

The Problem

  • Error message

    [WARNING]: Consider using the get_url or uri module rather than running
    'curl'.  If you need to use command because get_url or uri is insufficient you
    can add 'warn: false' to this command task or set 'command_warnings=False' in
    ansible.cfg to get rid of this message.
    
    fatal: [jenkins1]: FAILED! => {
        "changed": true, 
        "cmd": "curl -I http://localhost:8080/job/Test/build?token=67c2b2b3", 
        "delta": "0:00:00.066076", 
        "end": "2020-02-28 14:37:53.870970", 
        "invocation": {
            "module_args": {
                "_raw_params": "curl -I http://localhost:8080/job/Test/build?token=67c2b2b3", 
                "_uses_shell": true, 
                "argv": null, 
                "chdir": null, 
                "creates": null, 
                "executable": null, 
                "removes": null, 
                "stdin": null, 
                "warn": true
            }
        }, 
        "msg": "non-zero return code", 
        "rc": 7, 
        "start": "2020-02-28 14:37:53.804894", 
        "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time   Current\n                                 Dload  Upload   Total   Spent    Left      Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:--    --:--:--     0curl: (7) Failed connect to localhost:8080; 連線被拒絕", 
        "stderr_lines": [
            "  % Total    % Received % Xferd  Average Speed   Time    Time     Time     Current", 
            "                                 Dload  Upload   Total   Spent    Left  Speed", 
            "", 
            "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--       0curl: (7) Failed connect to localhost:8080; 連線被拒絕"
        ], 
        "stdout": "", 
        "stdout_lines": []
    }
    

Solution

---
- name: Trigger Build
  hosts: jenkins
  tasks:
    - name: "Wait for Jenkins to Trigger Build"
      uri:
        url: "http://localhost:8080/job/Test/build?token=67c2b2b3"
        status_code: 201
      register: result
      until: result.status == 201
      retries: 60
      delay: 1

Takeaways

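The syntax above is the condensed version. It can be broken down into two steps:

  1. First check that the service is running correctly
  2. Then run the target operation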
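
The two steps above could be sketched as separate tasks. This is only a minimal sketch, not the playbook used in this post: it assumes that the Jenkins `/login` page returning 200 is a good enough readiness signal (that endpoint and the `health` variable name are my assumptions), and it reuses the job URL and token from the example above.

```yaml
---
- name: Wait for Jenkins, then trigger the build
  hosts: jenkins
  tasks:
    # Step 1: poll until the service answers normally.
    # Assumption: /login returning 200 means Jenkins has finished warming up.
    - name: "Wait for Jenkins to be ready"
      uri:
        url: "http://localhost:8080/login"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 60
      delay: 1

    # Step 2: run the target operation once.
    # Any response other than 201 fails the task immediately.
    - name: "Trigger Build"
      uri:
        url: "http://localhost:8080/job/Test/build?token=67c2b2b3"
        status_code: 201
```

Splitting the tasks makes a failure easier to read, since it shows whether the service never came up or the trigger itself was rejected. The cost is that the readiness check has to be chosen separately from the operation it guards.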

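In my own experience, though, the more common problem is that it is hard to define precisely what "running normally" means for a service, or that there is a gap between the service running normally and the target operation actually being able to run.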
