GPU processing
For recording how we get on with the experimental NVIDIA cards
Hardware
The Quadro FX 5600 fits nicely into the WoC midi tower cases, but only after removing the hard drive bay and relocating the hard drive to the floppy bay. The drive is less than securely mounted there, so treat the hardware gently. We replaced the PSU with an Amtec 560W one to provide enough power for the card.
Software install
http://developer.nvidia.com/object/cuda.html
I installed the CUDA toolkit in my home directory and then copied it to the NFS server (the installer script does not do any configuration, so this should be safe). There's a 'cuda' module to set the required environment variables (and a few others that I think will be useful).
The NVIDIA driver itself requires a bit more decision-making. The NVIDIA site says to install it via YaST, by adding NVIDIA as a package installation source. This is against our policy of never mixing third-party RPMs with the SuSE ones. So I installed the kernel source package:
zypper install kernel-source
and then ran the driver installation by hand. The installer's -e flag is great as it shows you exactly what's going on. Unfortunately it seems you need a version of the driver which EXACTLY matches the CUDA release. Mine's a few minor releases newer. All seems to be working, but the logs are full of whining from the kernel. I'll sort that out in a bit.
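To see which driver version is actually loaded when chasing a mismatch like this, something along these lines might help. It is only a sketch: the function name nvidia_driver_version is made up for illustration, and it assumes the loaded module reports itself in /proc/driver/nvidia/version with the version as the first dotted number on the "NVRM version" line.

```shell
#!/bin/sh
# Sketch: print the version of the currently loaded NVIDIA kernel module.
# Assumption: the version is the first dotted-number token on the
# "NVRM version" line of /proc/driver/nvidia/version.
# An alternative file can be passed as $1 (handy for testing).
nvidia_driver_version() {
    file=${1:-/proc/driver/nvidia/version}
    grep 'NVRM version' "$file" | grep -o '[0-9][0-9]*\.[0-9][0-9.]*' | head -n 1
}
```

Calling nvidia_driver_version with no arguments prints the loaded driver's version; compare it by eye with the driver version the CUDA release asks for (nvcc --version, with the 'cuda' module loaded, shows the toolkit release).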
The install backed up and replaced a few libraries and things. This is a worry, as any future SuSE update of those packages will overwrite the NVIDIA versions. I need a new version of the image without those SuSE libs to use on these machines. The kernel module got dropped into the running kernel's lib/modules directory tree, unsurprisingly, so it will have to be rebuilt for each new kernel. I am inclined to live with the need to update this by hand. The auto-patching script doesn't do kernels on 10.2, so a new kernel will always need a computer officer's intervention anyway.
The SDK is currently in /usr/local/shared/suse-10.2/x86_64/cuda/sdk and requires the 'cuda' module to be loaded.
Device special problems
Not sure if the nvidia.ko kernel module will actually get loaded on boot; for the moment I've modprobed it. Does it even need loading on boot? Apparently yes. But that's not all...
Unsurprisingly I had to do nasty things to permissions in /dev to get the stuff to work: /dev/nvidiactl and /dev/nvidia0 need to be read-write by the user. Also, weirdly, the device specials weren't there to start with, even though I had loaded the driver. I ran sax and they were made. Will this stick? Nope. You have to start X to get them created after a reboot, and then chmod them!
Surely it must be possible to arrange for them to be handled by udev in a sensible way? And possibly PAM. Apparently not. This is because udev relies on sysfs to tell it what devices to create. Non-GPLed drivers may not use sysfs, so the NVIDIA drivers can't do this. The NVIDIA X driver creates the device specials by hand on startup!
Well, I don't want to have to start X and stop it every time we reboot, so I need a better method. It seems that if you create the nodes you want under /lib/udev/devices, udev will make the corresponding things in /dev at boot.
For future reference:
crw-rw-rw- 1 root video 195, 255 2007-10-17 16:56 /dev/nvidiactl
crw-rw-rw- 1 root video 195, 0 2007-10-17 16:56 /dev/nvidia0
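A rough sketch of how static nodes matching that listing could be created under /lib/udev/devices. The function name nvidia_node_cmds is made up for illustration. Major 195 and minors 0 and 255 are taken from the listing; using minor N for a card N beyond the first is an assumption. The function only prints the commands, so they can be checked before anything is run as root.

```shell
#!/bin/sh
# Sketch: print the commands that create static NVIDIA device nodes.
# $1 = target directory (default /lib/udev/devices, as described above)
# $2 = number of cards (default 1)
# Assumption: card N uses minor N; only minor 0 appears in the listing.
nvidia_node_cmds() {
    dir=${1:-/lib/udev/devices}
    count=${2:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # One character device per card, mode rw-rw-rw-
        echo "mknod -m 666 $dir/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    # The control device always has minor 255
    echo "mknod -m 666 $dir/nvidiactl c 195 255"
    # Match the group shown in the listing (glob is expanded when run)
    echo "chgrp video $dir/nvidia*"
}
```

Run nvidia_node_cmds to check the output, then pipe it to sh as root (nvidia_node_cmds | sh) to actually create the nodes.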