Commit 84538180 authored by Kerri Wait

Fix deployment of telegraf for monitoring mlx hw counters

parent 46c640ee
5 merge requests:
!399 Capture extra NFS stats
!393 Hotfix: monitor NFS GETATTR stats via telegraf
!392 Temporarily disable inputs.filecount in telegraf
!389 Update telegraf config to ignore more ethX interfaces in ethtool plugin
!388 Fix the telegraf config for mlx hw counters to get rid of errors in logs
@@ -71,6 +71,20 @@
   tags:
     - configuration
 
+- name: Install multifile plugin for mlx hw_counters
+  template:
+    src: inputs.multifile_mlx.conf.j2
+    dest: /etc/telegraf/telegraf.d/inputs.multifile_mlx.conf
+    owner: telegraf
+    group: telegraf
+    mode: '640'
+  notify:
+    - "restart telegraf"
+  become: true
+  become_user: root
+  tags:
+    - configuration
+
 - name: Install nvidia-smi plugin
   template:
     src: inputs.nvidia_smi.conf.j2
@@ -84,4 +98,5 @@
   become_user: root
   tags:
     - configuration
-    - gpu
\ No newline at end of file
+    - gpu
+  when: "'VisNodes' in group_names"
\ No newline at end of file
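The new multifile task notifies a handler named `restart telegraf`, which is not part of this diff. Assuming the role defines it elsewhere (for example in a handlers file), a minimal sketch might look like the following. This is hypothetical: the real handler may differ, and the service name `telegraf` is an assumption.

```yaml
# Hypothetical handler sketch; the name must match the notify string exactly.
- name: restart telegraf
  service:
    name: telegraf      # assumed service name
    state: restarted
  become: true
```

Ansible runs a notified handler once at the end of the play (or at an explicit flush), so several template tasks can notify `restart telegraf` without restarting the service more than once.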
# Read mlx hardware counters
{% if hwcounterlist %}
{% for interface in hwcounterlist %}
[[inputs.multifile]]
  name_override = 'infiniband'
  base_dir = '/sys/class/infiniband'
  interval = '60s'
  # Plugin tags are a single table ([...]), not an array of tables ([[...]])
  [inputs.multifile.tags]
    device = '{{ interface }}'
    port = '1'
    type = 'hw_counters'
{% for counter in hwcounterlist[interface] | sort %}
  [[inputs.multifile.file]]
    file = '{{ interface }}/ports/1/hw_counters/{{ counter }}'
    conversion = 'int'
{% endfor %}
{% endfor %}
{% endif %}
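The template treats `hwcounterlist` as a mapping from InfiniBand device name to a list of counter file names under `ports/1/hw_counters`. A hypothetical group_vars entry might look like the following. The device and counter names are illustrative only, so check what the driver actually exposes (for example with `ls /sys/class/infiniband/*/ports/1/hw_counters`) before listing them.

```yaml
# Hypothetical group_vars entry; device and counter names are examples only.
hwcounterlist:
  mlx5_0:
    - out_of_buffer
    - out_of_sequence
    - rx_read_requests
    - rx_write_requests
  mlx5_1:
    - out_of_buffer
    - local_ack_timeout_err
```

With this data the template renders one `[[inputs.multifile]]` section per device, tagged with `device`, `port = '1'` and `type = 'hw_counters'`, plus one `[[inputs.multifile.file]]` entry per counter in sorted order. Only port 1 is read, so multi-port adapters would need the template extended.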