Add a script to catch a generic nvidia-smi error
This should catch the issue seen on the m3a nodes, and similar issues in the future.
Tested on m3p006 (good case) and m3a101 (bad case).
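
For readers without access to the script, here is a minimal sketch of the kind of check described, assuming the failure shows up as a non-zero exit status, a missing binary, or a hang. The function name `check_nvidia_smi`, the `cmd` parameter, and the 30-second timeout are hypothetical and are not taken from the actual script.

```python
import subprocess


def check_nvidia_smi(cmd=("nvidia-smi",), timeout=30):
    """Run the GPU query command once and return (ok, detail).

    Hypothetical helper: the real script may detect the error differently.
    """
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The binary is not installed or not on PATH.
        return False, f"{cmd[0]}: command not found"
    except subprocess.TimeoutExpired:
        # The command hung instead of answering.
        return False, f"{cmd[0]}: no response after {timeout}s"
    if result.returncode != 0:
        # Generic failure: report the exit status and whatever was printed.
        detail = (result.stderr or result.stdout).strip()
        return False, f"{cmd[0]} exited with {result.returncode}: {detail}"
    return True, "ok"
```

A node health check could call `check_nvidia_smi()` and flag the node when `ok` is false. The `cmd` parameter exists only so the logic can be exercised on a machine without a GPU.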