ansible restart improvements
Andreas H 12:32 PM is there a really good example of a node reboot somewhere in our ansible work?
lancew:koala: 12:37 PM I'd like to see one too. If there is anything wrong with lustre or its modules the reboot fails and requires a hard reboot from openstack.
chines 3:23 PM @Andreas H... not really. The old pattern looks like this

    - name: Restart host to remove nouveau module
      shell: "sleep 2 && shutdown -r now &"
      async: 1
      poll: 1
      become: true
      ignore_errors: true
      when: modules_result.stdout.find('nouveau') != -1

    - name: Wait for host to reboot
      local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 delay=60 timeout=900

but I think the new pattern looks like

    - name: restart machine
      shell: "sleep 5; sudo shutdown -r now"
      async: 2
      poll: 1
      ignore_errors: true
      become: true
      become_user: root
      when: reboot_now

    - name: waiting for server to come back
      wait_for_connection: sleep=60 timeout=600 delay=60
      when: reboot_now

3:24 the async directive allows it to disconnect before the process completes. wait_for_connection is better than local_action wait_for
3:24 but as lance mentions the restart (shutdown -r now) will fail under some conditions
lancew:koala: 3:45 PM Oh also forgot any hanging NFS file open/writes, which we have significant numbers of. I haven't been checking recently but I was tracking on the order of hundreds per day in the home folders
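
For comparison, here is a minimal sketch of the same restart-and-wait step using Ansible's built-in reboot module (available since Ansible 2.7), which folds the shutdown and the wait into one task. This is an assumption about what might work here, not something taken from our playbooks: reboot_now is the variable from the new pattern above, and the timing values are carried over from that pattern purely for illustration. It would not by itself address the cases lance mentions, where a stuck lustre module or a hanging NFS write keeps the shutdown from completing.

```yaml
# Sketch only: assumes Ansible 2.7+ and that `reboot_now` is a boolean
# set earlier in the play, as in the pattern quoted above.
- name: Reboot the node and wait for it to come back
  reboot:
    post_reboot_delay: 60   # seconds to wait after the reboot command before checking the host
    reboot_timeout: 600     # maximum seconds to wait for the host to respond again
    test_command: whoami    # command that must succeed for the reboot to count as done
  become: true
  when: reboot_now
```

If the host never comes back within reboot_timeout the task fails, so a node wedged on shutdown would at least show up as a failed task in the play output rather than passing silently under ignore_errors.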