First, think about which logs you actually need: the installation log? The GPU's own runtime logs? Or runtime performance data?
As of today a GPU has two main uses: graphics display (e.g. Xorg driving a monitor) and computation (e.g. matrix operations, deep learning). Of course, at heart both are computation.
(1) Installation log
You don't really need to worry about this one: when the driver finishes installing, the log path is shown at the bottom of the installer's final screen, so just follow it.
If you want to look again much later, the files live here:
- /var/log/nvidia-installer.log
- /var/log/nvidia-uninstall.log
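If you just want to know whether anything went wrong during installation, a quick grep over the installer log is usually enough (a minimal example; the exact message wording varies by driver version):
- $ grep -iE "error|warning" /var/log/nvidia-installer.log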
(2) NVIDIA X driver logs
Normally these messages go straight to the screen. If you also want them saved to a file, start X with a command like this:
- startx -- -verbose 5 -logverbose 5
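Even without the extra flags, the X server writes its log to a file, /var/log/Xorg.0.log on most distributions, so you can pull out the NVIDIA driver's lines after the fact:
- $ grep -i nvidia /var/log/Xorg.0.log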
(3) NVIDIA runtime performance data
In most cases plain nvidia-smi is enough.
For more comprehensive output, add the -a flag:
- $ nvidia-smi -a
For live monitoring, add the -l flag (it loops every 5 seconds by default; pass a number such as -l 1 to change the interval):
- $ nvidia-smi -l
There are other flags as well; run nvidia-smi --help for the full list.
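If you want to watch the output and keep a copy at the same time, one simple approach is to pipe the loop through tee (the 1-second interval and the file name are arbitrary choices here):
- $ nvidia-smi -l 1 | tee nvidia-smi.log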
(4) GPU runtime log files
Think of this as an extension of (3). According to the official documentation, NVIDIA does not produce such a log by default; you configure and generate it yourself to match your needs:
- ## Query the VBIOS version of each device:
- $ nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv
- name, pci.bus_id, vbios_version
- GRID K2, 0000:87:00.0, 80.04.D4.00.07
- GRID K2, 0000:88:00.0, 80.04.D4.00.08
-
- # Query GPU metrics for host-side logging
- # This query is good for monitoring the hypervisor-side GPU metrics. It works for both ESXi and XenServer.
- $ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 5
Those are only a few of the available fields; for the complete list, run:
- $ nvidia-smi --help-query-gpu
Pick out the metrics you care about, assemble the query command, wrap it in a script, and have cron trigger it on a schedule to collect the logs. The official checklist follows, with a small example sketch after it:
- Long-term logging
- Create a shell script to automate the creation of the log file with timestamp data added to the filename and query parameters
- Add a custom cron job to /var/spool/cron/crontabs to call the script at the intervals required.
-
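A minimal sketch of those two steps (the script name gpu-log.sh, the log directory /var/log/gpu, the chosen fields, and the 5-minute interval are my own illustrative choices, not anything NVIDIA prescribes):
- #!/bin/bash
- # gpu-log.sh: append a timestamped CSV snapshot of the GPU metrics we care about
- LOGDIR=/var/log/gpu
- mkdir -p "$LOGDIR"
- nvidia-smi --query-gpu=timestamp,name,temperature.gpu,utilization.gpu,utilization.memory,memory.used --format=csv,noheader >> "$LOGDIR/gpu-$(date +%Y%m%d).csv"
And a matching crontab entry (added via crontab -e) to run it every 5 minutes:
- */5 * * * * /usr/local/bin/gpu-log.sh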
- ADDITIONAL LOW LEVEL COMMANDS USED FOR CLOCKS AND POWER
-
- Enable Persistence Mode
- Any settings below for clocks and power get reset between program runs unless you enable persistence mode (PM) for the driver.
- Also note that the nvidia-smi command runs much faster if PM mode is enabled.
- nvidia-smi -pm 1 : Make clock, power and other settings persist across program runs / driver invocations
-
- Clocks
- nvidia-smi -q -d SUPPORTED_CLOCKS : View the supported clocks
- nvidia-smi -ac <MEM clock, Graphics clock> : Set one of the supported clock pairs
- nvidia-smi -q -d CLOCK : View the current clocks
- nvidia-smi --auto-boost-default=ENABLED -i 0 : Enable boosting GPU clocks (K80 and later)
- nvidia-smi --rac : Reset clocks back to base
-
- Power
- nvidia-smi -pl N : Set power cap (maximum wattage the GPU will use)
- nvidia-smi -pm 1 : Enable persistence mode
- nvidia-smi stats -i <device#> -d pwrDraw : Continuously monitor detailed stats such as power draw
- nvidia-smi --query-gpu=index,timestamp,power.draw,clocks.sm,clocks.mem,clocks.gr --format=csv -l 1 : Continuously log time-stamped power and clock data
-
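As a quick usage example of the commands above: enable persistence mode, then cap GPU 0 at 150 W (150 is an arbitrary number for illustration; the valid range depends on the card, and both commands need root):
- $ sudo nvidia-smi -pm 1
- $ sudo nvidia-smi -i 0 -pl 150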
(5) DKMS
One more log to round things out: the DKMS log.
DKMS stands for Dynamic Kernel Module Support; it maintains out-of-tree driver modules like this one and automatically rebuilds them after a kernel upgrade. Its build logs land under a path like the following:
- /var/lib/dkms/nvidia/384.81/4.4.0-87-generic/x86_64/log/
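A quick way to confirm that DKMS has built the nvidia module for the kernel you are running is dkms status (the driver and kernel versions in the output will be whatever is installed on your machine):
- $ dkms status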
(6) OK