GPU processing: Difference between revisions

Revision as of 12:04, 18 October 2007

For recording how we get on with the experimental NVIDIA cards

Hardware

The Quadro FX 5600 fits nicely into the WoC midi tower cases once the hard drive bay is removed and the hard drive is relocated to the floppy bay. The drive is somewhat less than securely mounted there, so treat the hardware gently. We replaced the PSU with an Amtec 560W one to have enough power for the card.

Software install

http://developer.nvidia.com/object/cuda.html

I installed the CUDA toolkit in my home directory and then copied it to the NFS server (the installer script does not do any configuration, so this should be safe). There's a 'cuda' module to set the required environment variables (and a few others that I think will be useful).

The NVIDIA driver itself requires a bit more decision-making. The NVIDIA site says to install it via YaST, by adding NVIDIA as a package install source. This is against our policy of never mixing third-party RPMs with the SuSE ones. So I installed the kernel source package

zypper install kernel-source

and then ran the driver installation by hand. The -e flag is great as it shows you exactly what's going on. Unfortunately it seems you need a version of the driver which EXACTLY matches the CUDA toolkit. Mine is a few minor releases newer. All seems to be working, but the logs are full of whining from the kernel. I'll sort that out in a bit.
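Since the driver and toolkit versions have to agree, it helps to be able to read the loaded driver's version quickly. The following is a minimal sketch, not something that was run on these machines: it assumes the NVIDIA kernel module exposes /proc/driver/nvidia/version with a line of the form "NVRM version: ... Kernel Module <version> ...", and the function name nvidia_driver_version is made up for illustration.

```shell
#!/bin/sh
# nvidia_driver_version [FILE]
# Print the version of the loaded NVIDIA kernel module.
# Assumption: FILE (default /proc/driver/nvidia/version) contains a line
# like "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <date>".
nvidia_driver_version() {
    file=${1:-/proc/driver/nvidia/version}
    if [ ! -r "$file" ]; then
        echo "cannot read $file (is the nvidia module loaded?)" >&2
        return 1
    fi
    # Keep only the dotted number that follows "Kernel Module".
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' "$file"
}

# Report the version if the module is loaded; carry on quietly otherwise.
nvidia_driver_version || true
```

The number printed can then be compared by eye with the driver version named in the toolkit's release notes.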

The install backed up and replaced a few libraries and things. This is a worry, as any future SuSE update of those packages will overwrite the NVIDIA versions. I need a new version of the image without those SuSE libs to use on these machines. The kernel module got dropped into the running kernel's lib/modules directory tree, unsurprisingly. I am inclined to live with the need to update this by hand. The auto-patching script doesn't do kernels on 10.2, so a new kernel will always need a computer officer's intervention anyway.

The SDK is currently in /usr/local/shared/suse-10.2/x86_64/cuda/sdk and requires the 'cuda' module to be loaded.

Device special problems

Not sure if the nvidia.ko kernel module will actually get loaded on boot; for the moment I've modprobed it. Does it even need loading on boot? Apparently yes. Although it looks to me as if SuSE's modprobe.conf actually has the right hooks in it anyway. But the reason that wasn't working turned out to be the lack of device specials.

Unsurprisingly I had to do nasty things to permissions in /dev to get the stuff to work: /dev/nvidiactl and /dev/nvidia0 need to be read-write by the user. Also, weirdly, the device specials weren't there to start with, even though I had loaded the driver. I ran sax and they were made. Will this stick? Nope. You have to start X to get them created after a reboot, and then chmod them!

Surely it must be possible to arrange for them to be handled by udev in a sensible way? And possibly PAM. Apparently not. This is because udev relies on sysfs to tell it what devices to create. Non-GPLed drivers may not use sysfs, so the NVIDIA drivers can't do this. The NVIDIA X driver creates the device specials by hand on startup!

Well, I don't want to have to start X and stop it every time we reboot, so I need a better method. It seems that if you create the nodes you want under /lib/udev/devices, udev will make the corresponding things in /dev at boot.
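A quick way to tell whether the device specials are present and usable after a reboot is to test them directly. This is a minimal sketch: the helper name check_nvidia_nodes is made up, and the default paths are the two nodes named above.

```shell
#!/bin/sh
# check_nvidia_nodes [PATH...]
# Report whether each path is a character device that the current user
# can both read and write. With no arguments, check the two nodes named
# in the text. Returns non-zero if any node fails the check.
check_nvidia_nodes() {
    [ "$#" -gt 0 ] || set -- /dev/nvidiactl /dev/nvidia0
    status=0
    for node in "$@"; do
        if [ -c "$node" ] && [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok: $node"
        else
            echo "missing or not read-write: $node"
            status=1
        fi
    done
    return "$status"
}

check_nvidia_nodes || echo "device nodes need attention"
```

Run as an ordinary user rather than root, since root can read and write the nodes whatever their permissions are.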

For future reference:

crw-rw-rw- 1 root video 195, 255 2007-10-17 16:56 /dev/nvidiactl

crw-rw-rw- 1 root video 195, 0 2007-10-17 16:56 /dev/nvidia0
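The listing above gives everything needed to make static nodes by hand: character devices with major number 195, minor 255 for the control node and minor 0 for the first card, group video, mode 666. The sketch below only prints the commands so they can be reviewed before anything is run as root. The helper name nvidia_node_cmds is made up, and numbering further cards with minors 1, 2, ... is an assumption that was not tested here.

```shell
#!/bin/sh
# nvidia_node_cmds [TARGET_DIR] [NCARDS]
# Print (do not run) the mknod/chgrp commands for static NVIDIA nodes.
# Major 195, minor 255 (nvidiactl) and minor 0 (nvidia0) come from the
# listing above; minors for additional cards are an assumption.
nvidia_node_cmds() {
    target=${1:-/lib/udev/devices}   # udev copies nodes from here at boot
    ncards=${2:-1}                   # number of cards in the machine
    i=0
    while [ "$i" -lt "$ncards" ]; do
        echo "mknod -m 666 $target/nvidia$i c 195 $i"
        echo "chgrp video $target/nvidia$i"
        i=$((i + 1))
    done
    echo "mknod -m 666 $target/nvidiactl c 195 255"
    echo "chgrp video $target/nvidiactl"
}

nvidia_node_cmds /lib/udev/devices 1
```

Once the printed commands look right, they can be pasted into a root shell, or the output piped to sh as root.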

Tests

NVIDIA supply a set of test/example programs. Catherine has downloaded them all, so I just copied the cen1001:~/NVIDIA_CUDA_SDK directory to my home directory. For various reasons, the libglut library isn't installed, but I don't need it (nor do I anticipate it being of much use to others, unless they want to use OpenGL rather than CUDA(!)). It is used in some of the example codes, so make in NVIDIA_CUDA_SDK dies as soon as it gets to the first such code. I got round this by compiling the other codes one at a time. The cuda module needs to be loaded first. A useful command is:

  ~/NVIDIA_CUDA_SDK/projects> for dir in */; do if ! grep -q lglut "${dir}Makefile"; then (cd "$dir" && make); fi; done

The executables are placed in NVIDIA_CUDA_SDK/bin/linux/release/. All the non-glut tests pass.
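To see which projects the loop will build before running make, the same grep test can be used to list them. A minimal sketch, with the helper name list_non_glut_projects made up for illustration:

```shell
#!/bin/sh
# list_non_glut_projects DIR
# Print the name of each subdirectory of DIR that has a Makefile which
# does not mention lglut, i.e. the projects that should build without
# libglut installed.
list_non_glut_projects() {
    for mk in "$1"/*/Makefile; do
        [ -f "$mk" ] || continue      # glob matched nothing, or not a file
        if ! grep -q lglut "$mk"; then
            basename "$(dirname "$mk")"
        fi
    done
}

list_non_glut_projects "$HOME/NVIDIA_CUDA_SDK/projects"
```

Directories without a Makefile are skipped here, whereas the one-line loop would still try to run make in them.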

--james 13:04, 18 October 2007 (BST)
