Not very long ago, I had a rather frightening experience that made me reconsider how I test the increasingly popular Kernel Modesetting (KMS) support for various ATI Radeon chipsets on Linux. While I couldn’t determine exactly what happened back then, I’ve now got another similar story of KMS-related bugs that can cause permanent damage to your hardware.
Espreon, my Wesnoth-UMC-Dev collaborator and personal friend, owns a Dell Inspiron e1705 laptop which shipped with an ATI Mobility Radeon x1400 graphics controller. This is in contrast to my HP Pavilion dv5-1132la notebook (bluecore), which has an ATI Radeon HD 3200 (RS780-based) controller.
Espreon’s laptop is now damaged and unusable after some minor testing of KMS + Gallium3D drivers. The screen simply doesn’t work anymore.
I feel the need to carefully and meticulously analyze our stories since the KMS-enabled Radeon drivers are slowly becoming a standard amongst X.org-based Unix distributions, including Debian GNU/Linux: Squeeze (6.0) is going to ship with a configuration that runs Radeon controllers in KMS mode without any user intervention. This is not unexpected, since the KMS stack is clearly superior in terms of security and stability to the XFree86/X.org-based device drivers: it doesn’t require things such as making the X server’s executable setuid root or allowing a userland application direct access to the host’s memory, video BIOS, etc.
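For anyone curious whether their own X server still runs setuid root, here is a minimal sketch in Python. The path /usr/bin/Xorg is an assumption on my part and varies between distributions, and the helper name has_setuid_bit is made up for illustration.

```python
import os
import stat


def has_setuid_bit(mode: int) -> bool:
    """Return True if a file mode (as found in os.stat().st_mode) has setuid set."""
    return bool(mode & stat.S_ISUID)


if __name__ == "__main__":
    # Assumed location of the X server binary; adjust for your distribution.
    xserver = "/usr/bin/Xorg"
    try:
        st = os.stat(xserver)
    except OSError as err:
        print(f"cannot inspect {xserver}: {err}")
    else:
        # setuid root means: setuid bit set AND the file is owned by uid 0.
        setuid_root = has_setuid_bit(st.st_mode) and st.st_uid == 0
        print(f"{xserver} setuid root: {setuid_root}")
```

Note that on some systems that path may be a wrapper around the real binary, so a negative result here is only a hint, not proof.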
But, is it really worth the risk? Is KMS really well-tested and safe enough to feature in stable mainline Linux kernels and in major general-purpose system distributions such as Debian? Let’s take a look at our personal experiences with the new graphics subsystem and drivers which are due to become mainstream around the end of this year.
11:15 PM.
Here we go, with a current version of mesa's source code from the git repository. Everything looking the same again. No performance improvements or extra losses. Again, I use eduke32 as a stress testing suite for lack of something better.
So...wait...WHOAH...THE COLORS MAN, THE COLORS!! IT'S SO BEAUTIFUL...IT'S...IT'S...
vsmlbhmnermernqvatmguvfmguramlbhmunirmabmyvsr*
*unexpected end of record*
In my case, I was trying out the KMS support provided by Linux 2.6.34 together with the then-current Mesa (classic DRI r600) and Radeon DDX from their respective Git repositories. The target application was the EDuke32 port of Duke Nukem 3D with an old version of the High Resolution Pack, which means large detailed textures, many animated models and an uncapped framerate. Espreon’s situation was different: he was just trying the final boss level of Frogatto, a game with simpler 2D rendering and a framerate capped at around 50 frames per second, albeit still a mildly demanding OpenGL client. Also, he was trying the newer Mesa Gallium3D drivers for ATI R3xx-based chipsets (r300g), since they are getting new (existing) features at a steady rate, faster than r600g, which would apply to my GPU.
In both cases, the immediate symptom was X.org seemingly locking up. In my case, the kernel also locked up and I could neither switch to a text terminal nor use any magic SysRq sequences to reboot the machine. Espreon’s case differs in that his kernel continued running normally and he could switch to a text terminal, in which he noticed vertical lines and screwed-up colors.
As soon as I figured that something was really wrong with my laptop’s GPU or screen, I closed its lid, unplugged the AC adapter and removed the battery as quickly as I could. Espreon used sudo reboot instead, thus keeping the laptop running despite this inconsistent state.
After cutting off all power to my laptop, I waited a few seconds, plugged the adapter back in, reinserted the battery and turned it on, only to watch the BIOS splash and the GNU GRUB text-mode menu blinking like hell. Fearing that I might have killed the GPU or the screen, and feeling the metaphorical bucket of water on my head, I decided to let bluecore cool down for a few minutes. Then I turned it on and everything was working normally again. In Espreon’s case, after completing the warm reboot from Linux, he saw an abnormally grainy BIOS splash screen, which led him to turn off the laptop. Ever since then, whenever he turns it on, the boot process appears to carry on as usual, but neither the screen’s backlight nor the LCD panel itself works.
I don’t know much about how the graphics controller and the related software work, or how they interact with a laptop’s display panel, but I suspect that a rare bug in the KMS code, most likely on the kernel side, could be forcing execution to jump to a bad location in such a way that it doesn’t immediately crash, and overwriting the GPU state in some dangerous way that pushes the display parameters outside the output device’s limits.
***
Now Espreon needs to either get his laptop repaired or buy a new computer. As for me, bluecore seems to be doing fine after that little accident. Naturally, now that I know these catastrophic bugs can be triggered by more than just stress-testing like I did, I’m not planning to give KMS much more testing, at least not until kernel version 2.6.40 or so. This does not mean I’ll stop tracking mainline kernels, because there are always improvements and additions in areas outside of the graphics stack, which is pretty motivating even for RC kernel testing.
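For those who want to keep tracking mainline kernels while staying away from KMS, it helps to confirm that the running kernel was actually told to leave it off. Here is a minimal sketch in Python, assuming KMS is turned off through the kernel command line with either the generic nomodeset option or the radeon.modeset=0 module parameter. The function names are made up for illustration, and the check will miss a setting placed in a modprobe configuration file instead.

```python
def kms_disabled(cmdline: str, driver: str = "radeon") -> bool:
    """Return True if the kernel command line asks for KMS to be turned off."""
    tokens = cmdline.split()
    # "nomodeset" asks every DRM driver that honors it to skip modesetting.
    if "nomodeset" in tokens:
        return True
    # "<driver>.modeset=0" turns it off for a single driver only.
    return f"{driver}.modeset=0" in tokens


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, encoding="ascii", errors="replace") as fh:
        return fh.read()


if __name__ == "__main__":
    try:
        disabled = kms_disabled(read_cmdline())
    except OSError as err:
        print(f"cannot read kernel command line: {err}")
    else:
        print("KMS disabled on command line:", disabled)
```

This only looks at what was requested at boot; it says nothing about which driver ended up driving the display.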
So, this is an inconspicuous issue that might be worth considering, especially for system distributors that are transitioning to KMS drivers for their newer releases. It’s a pity that we don’t have a reliable way to reproduce these bugs, at least not without investing enormous amounts of money. Maybe HP and AMD could team up and donate AMD ATI-powered laptops to the kernel DRM developers?