The last post (minus the hard disk driver crash during preparation for S3 issue) turns out to be a rather embarrassing case of not doing the research, so I deserve a punch from a drm/radeon dev for that. 
What I should have done, had I been connected to the Internet when I decided to start experimenting with KMS again, is read the instructions on the X.org wiki for properly setting up the drivers with KMS support. It wasn't as trivial as I had expected because of the missing firmware blobs I mentioned last time.
The problem with firmware
The required firmware blob, R600_rlc.bin, is not completely freely redistributable (more specifically, not modifiable), and isn't included in the vanilla kernel from kernel.org; instead, it must be acquired from the firmware repository.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/dwmw2/linux-firmware.git
Again, this X.org wiki page does a better job of explaining the rest than I could ever hope to.
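For the installation step itself, something like the following could work as a rough sketch. It assumes the clone above landed in ./linux-firmware, that the blobs sit under its radeon/ subdirectory, and that the kernel's firmware loader looks in /lib/firmware. The function name and its arguments are made up for illustration. If the firmware is built into the kernel image instead, follow the wiki rather than this.

```shell
#!/bin/sh
# Hypothetical helper: copy the radeon blobs from a linux-firmware checkout
# into a firmware directory (assumed default: /lib/firmware).
install_radeon_firmware() {
    src="${1:-./linux-firmware}"
    dest="${2:-/lib/firmware}"

    if [ ! -d "$src/radeon" ]; then
        echo "no radeon/ directory under $src" >&2
        return 1
    fi

    mkdir -p "$dest/radeon" || return 1
    cp "$src"/radeon/*.bin "$dest/radeon/" || return 1

    # The RLC blob is the one this post was missing; warn if it is absent.
    if [ ! -f "$dest/radeon/R600_rlc.bin" ]; then
        echo "warning: R600_rlc.bin not found in $dest/radeon" >&2
    fi
}
```

Run as root, this would be called as `install_radeon_firmware ./linux-firmware /lib/firmware`.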
After sorting out this issue with help from the guys at #radeon, freenode, I produced yet another kernel from my recipe (2.6.33-bluecore251-kms-4), recompiled the radeon DDX with KMS support again and rebooted to the new kernel.
Performance on a RS780
I've read varying results from people trying KMS with ATI R6xx/R7xx chipsets; some get performance just as good with KMS as with UMS, and some get slower rendering (maybe they also missed the firmware bit?). In my case, I noticed a mix of both.
EXA and Xv acceleration work about as well as with UMS, bar some corner cases such as Firefox 3.5's rendering and scrolling of this very page. Mesa is a different case, and seems to depend greatly on the clients. For example, kwin compositing in OpenGL mode is significantly slower than with UMS. While a build of eduke32 I have can achieve around 100 FPS with high-res textures and models enabled under UMS, it can barely manage around 40 FPS with KMS and OpenGL compositing from kwin, but it gets somewhat better (around 60 FPS) after disabling compositing.
Frogatto manages the maximum of 50 FPS with compositing enabled most of the time, but it occasionally drops performance for less than a second without the FPS counter changing — but as with eduke32, it feels mostly smooth again after disabling compositing.
XRender compositing with kwin isn't exactly better with KMS, and feels substantially slower than OpenGL compositing, just like with UMS. Still, compositing in general seems not to be good for this configuration with KMS.
Despite the slowdown, another consequence of using KMS here appears to be reduced CPU usage from the X server when its clients are relatively idle: CPU usage stays between 1 and 5% with or without compositing when using KMS, whereas with UMS it can vary widely, between 2% and 30%.
S3 issues?
Apparently, the delay after resuming the laptop from suspend-to-RAM mentioned in the previous post was caused solely by the driver requesting the missing firmware whenever it needed to reinitialize itself. There is no such delay after installing and linking those blobs, and S3 works just fine.
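One way to check for this sort of thing, as a sketch only, is to filter the kernel log for lines that mention firmware together with the radeon driver. The helper name is hypothetical, and since I'm not assuming the exact wording of the driver's messages, it matches loosely.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the lines
# that mention firmware together with radeon, drm or r600.
radeon_firmware_lines() {
    grep -i 'firmware' | grep -i -E 'radeon|drm|r600'
}
```

Typical use would be `dmesg | radeon_firmware_lines` right after a resume; no output suggests the driver had nothing to say about firmware.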
Overlapping windows and OpenGL
One of the things fixed by KMS this time is that Mesa clients can now be overlapped or minimized without an annoying “ghost” flickering on the screen where the window was supposed to be, when kwin compositing is enabled in XRender or OpenGL mode.
Production?
While KMS for R6xx/R7xx appears to be very stable at the moment, its performance doesn't seem to be on par with UMS in some configurations yet, although for all I know the bottleneck could be in libdrm or Mesa, the DDX, or even kwin, rather than the kernel driver. This was a very nice experiment, and I could be using this driver for production right now if it weren't for the missing Tux-on-Ice patch and the small degradation of DRI performance. But I think it's really nice to see where this is going (minus the firmware issues though!).
So, back to UMS and my 2.6.32.9-bluecore244-suspend2 kernel for now!