A couple of months ago I received a shiny new Sun SunFire T2000. It is a monster: 1 CPU with 8 cores, each capable of running 4 threads (that is 32 concurrent threads), 8G of RAM and 2x73.4G Seagate SAS HDDs. The 2U case gives little hint of the power hidden away inside. Once powered up it sounds like a jet engine, but that is ok, it is designed for the data center, not for use as an HTPC.
I obtained the box under the Sun Try n Buy Program for testing ubuntu 6.06LTS (aka dapper drake) and some PHP based web apps. I also wanted to play with Solaris and some other OSes on the box. I was also interested in Solaris Brands. I wanted to take Jonathan Schwartz up on his offer of running ubuntu on the box and getting to keep it. As I consider myself a Linux system admin of medium level competence I thought it should be easy enough. How wrong I was.
The first couple of times I tried to install dapper on the server I used a CD. I used both the 6.06LTS and the 6.06.1LTS update CDs and neither worked. It turns out there was a bug in the iso9660 support which shipped on these CD images. As of the time of writing no new official CD images have been released with the problem fixed, although the nightly build CDs have the fix included.
After some research I discovered that "netboot"ing was the preferred way to install ubuntu on these boxes. Again it seemed relatively straightforward: set up rarpd and tftpd, grab the image and away we go. Unfortunately this wasn't the case. After running ethereal (now known as wireshark) on the debian server, I discovered that the T2000 expects to pull the boot image via tftp using the broadcast address (255.255.255.255). I later found out that both tftp and tftp-hpa which ship with edubuntu 6.06LTS and Debian 3.1 (aka sarge) don't like requests being made this way. I tracked down the author of tftp-hpa, H Peter Anvin, and discussed the behaviour I was experiencing. He pointed me to a newer release of tftp-hpa which contains a fix for the problem. Peter considers the way the T2000s (and other Sun servers) handle tftp boot to be a bug in Sun's firmware and was rather unhappy about Sun's tftp client implementation. Peter stated "I still think Sun needs to be kicked in the ding-ding for not doing DHCP (or at least BOOTP, it's only a 20-year-old standard) and valid TFTP" [IRC on #syslinux on OFTC discussion 21-Oct-2006 14:17 AEST].
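For anyone setting up the tftp side, the boot image has to be available under the name the firmware asks for. The sketch below assumes the usual Sun netboot convention, where the client requests a file named after its own IP address written as 8 uppercase hex digits; check a packet capture to confirm what your box really requests. The IP address used here is made up for illustration.

```python
import ipaddress


def tftp_boot_name(ip: str) -> str:
    """Return an IPv4 address as 8 uppercase hex digits.

    Assumption: this is the file name a Sun client asks the
    tftp server for after it learns its address via RARP.
    """
    return format(int(ipaddress.IPv4Address(ip)), "08X")


if __name__ == "__main__":
    # Hypothetical address handed out by rarpd.
    print(tftp_boot_name("192.168.1.50"))
```

The usual approach is to symlink or copy the downloaded boot image to that name in the tftp root directory.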
After removing the stock Debian tftp-hpa deb on my sarge box, I downloaded tftp-hpa 0.43, compiled it and installed it using checkinstall. This was a painless process.
I thought I could see the light at the end of the tunnel. I had RARP and tftp working, the server was getting an IP address, requesting and receiving the dapper boot image. I later realised that the light was actually an oncoming freight train and the T2000, the duck and myself were all heading for a train wreck.
I tried following the official ubuntu on sparc instructions. I found that they were rather light on detail. The documentation seemed to be written for users with no Linux experience but some SPARC experience. I am well aware that this is community generated documentation and so I should be grateful someone has put something together. I plan to help improve the page a little when I have more time. I have already added a note about the rev2 and dapper kernel issues.
I did manage to get the installer running pretty easily. It is really no different to a normal kvm based install on i386/amd64 based servers, except there are no virtual consoles. This might sound like a minor thing, but in practice it can lead to a lot of frustration. Over the years I have installed various versions of Linux on various machines. From time to time the installer decides it is taking its bat and ball and going home. With virtual consoles this isn't a problem: [ctrl]-[alt]-[f2] (or whatever) and you get the install log or a shell so you can start poking around to see what is (or isn't) going on. On the T2000 you have to watch the HDD lights or try ESP to see if it is still alive. On my first few (3+) attempts I assumed that the installer had crashed while formatting the partitions. I assumed this as the screen would just be blue and the drive lights suggested very little disk activity. I now know that this assumption was wrong.
I had tried several times over the period of a month or so to try and get the install done. I had tried asking my good friend Google for help on getting it all working. I was not getting very far with it. By now Sun was starting to ask for their baby back.
One night I decided I was going to install dapper on the server at any cost. I was prepared. I had a stack of tabs open in firefox with the relevant documentation up. I updated to the latest firmware (again). I had connected to the server. I had cold beer in the fridge. I got very comfortable in the chair. On the first attempt I tried to partition the disk the way I wanted it, but this seemed to fail after creating the /boot partition. I looked into the partitioning more and discovered that the partition table spills into the first 512KB of the drive, and so you need to keep the first 512KB (1MB recommended) of the drive unused. On the next attempt I again partitioned the drive the way I wanted it, with 1MB (8.2MB was actually used) free at the start of the disk. I crossed my fingers and went to watch some tv. 20 mins later I came back and found the lovely blue screen back and no real signs of life. This time I decided to try with 1MB (8.2MB) free at the start of the disk and let ubuntu decide how to deal with the rest of the drive. swap and /boot both seemed to be ok about being formatted with the default EXT3 filesystem. Then as usual the screen went blue and everything seemed to have stopped. I took a few deep breaths and started abusing the box and Sun. I did some more poking around and couldn't find any more information.
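One possible explanation for asking for 1MB and getting 8.2MB, offered as an assumption rather than something confirmed on this box, is that the partitioner rounds the reserved space up to a whole cylinder. With the common 255 heads x 63 sectors geometry and 512 byte sectors (both assumed here, not read from the T2000), one cylinder works out to about 8.2MB. A minimal sketch of that arithmetic:

```python
SECTOR_BYTES = 512        # assumed sector size
HEADS = 255               # assumed geometry, not read from the T2000
SECTORS_PER_TRACK = 63    # assumed geometry, not read from the T2000


def cylinder_bytes() -> int:
    """Size of one cylinder under the assumed geometry."""
    return HEADS * SECTORS_PER_TRACK * SECTOR_BYTES


def round_up_to_cylinder(reserve_bytes: int) -> int:
    """Round a reserved region up to the next whole cylinder."""
    cyl = cylinder_bytes()
    whole_cylinders = -(-reserve_bytes // cyl)  # integer ceiling division
    return whole_cylinders * cyl


if __name__ == "__main__":
    reserved = round_up_to_cylinder(1024 * 1024)  # ask for 1MB
    print(reserved, "bytes, roughly", round(reserved / 1e6, 1), "MB")
```

If the geometry on your disk is different, the rounded figure will be different too.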
It was getting late, but I decided no piece of scrap metal was going to beat me. This time I grabbed a new install image, just in case that was the problem. Again I started the install process. Again I let ubuntu decide how to handle things after the first 8.2MB. I did a few other things while the installer was running, flicking back every minute or 2 to see what was going on. This time was the same as the previous attempts - it looked to me like it had failed. I tossed up between having a beer then going to bed or watching paint dry for the rest of the night. As I didn't have any paint, the beer and bed won. I was too annoyed with the T2000 to shut it down that evening.
The next morning I awoke to an ubuntu installer still running, very slowly but still running. It was wanting me to tell it which driver to use for Xorg. I didn't care as the box had no video card in it. I decided to go with fbdev. The installer continued to run, albeit slower than I remember RH6 installing on my 486 many years ago. I let it go. It asked a couple more questions about X config along the way, which I just left at the default values. I noticed that the 2 drive lights were always on, except that the light for the drive where I was installing dapper would flicker off for a split second every 3 to 5 seconds. I had read some stuff about slow i/o on these boxes, and assumed that maybe it was meant to be like this. I patiently waited, and waited and waited. Finally after 24 hours of waiting, I had managed to install ubuntu 6.06.1LTS on my SunFire T2000. I danced, I was happy - really happy. Then I thought to myself, they can't really expect people to wait this long for an install to work.
As I am a sucker for punishment, I grabbed a new image, checked my notes and started installing dapper onto the other drive too. This time I kept on checking the installer. It took about 5 hours and 30 mins to format a 70G EXT3 partition. All up it took around 24 hours to install on the second drive.
I finally decided that it wasn't me, it seemed like it was something hardware related. I logged a ticket with Sun. Then I decided to start digging for answers. Eventually I discovered that the T2000 rev2 uses a different SAS drive controller which isn't supported by the ubuntu 6.06LTS kernel. Fixes are available in newer kernels, but the ubuntu server team have indicated that they will not be backporting the fixes to 6.06LTS and that users should upgrade to 6.10 if they wish to run ubuntu on a SunFire T2000.
To check this was correct I tried installing 6.10 on the server. I was shocked. It flew. It took less than 15 seconds to format 70G as EXT3, and the whole install was done in less than 2 hours.
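To put rough numbers on the difference, here is a quick back-of-the-envelope sketch using the timings quoted in this post. The 6.10 figures are upper bounds ("less than"), so the real ratios are at least this large. The helper name is just for illustration.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Timings as quoted in the post.
dapper_format = to_seconds(hours=5, minutes=30)   # 6.06: 70G EXT3 format
edgy_format = to_seconds(seconds=15)              # 6.10: upper bound
dapper_install = to_seconds(hours=24)             # 6.06: whole install
edgy_install = to_seconds(hours=2)                # 6.10: upper bound

format_ratio = dapper_format / edgy_format
install_ratio = dapper_install / edgy_install

if __name__ == "__main__":
    print(f"format: at least {format_ratio:.0f}x faster on 6.10")
    print(f"install: at least {install_ratio:.0f}x faster on 6.10")
```

The format step alone comes out more than a thousand times faster, which fits the problem being in the disk controller driver rather than the hardware being slow.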
After all this, where does it leave people? As I see it you have 3 options if you want to run ubuntu linux on a SunFire T2000 rev2 box. The first is to install 6.06LTS and have it run slow, but this is a huge waste of money. You would be better off buying a cheap second hand PII from somewhere, so this option isn't very practical. Option 2 is to run the latest and greatest version on it, 6.10 (aka edgy). There are some major downsides with this option, most notably the lack of certification and the fact that support is only available for 18 months instead of 5 years. The 3rd option is to wait a while and, while you wait, encourage ubuntu, Sun and Canonical (the company that provides commercial backing to ubuntu) to work together to resolve this issue. As it stands at the moment all 3 players have made a big deal about ubuntu on Niagara and so all 3 players stand to face a customer backlash. Bad PR isn't good for anyone.
Update: [13-Apr-2007 23:00] I have just got off the phone to Barton George, Group Manager, GNU/Linux Strategy and Product Management at Sun. The phone call follows on from an email exchange that started earlier this week. It seems that Sun and Canonical both want the problem fixed, they just have to work out how best to do it. So they are meeting today (US time) to try and come up with a plan to resolve the issues.
Although not mentioned on their website, as a workaround Sun recommends using Ubuntu 6.10 (aka Edgy Eft).
I am awaiting a response from Sun about my request to be able to retest and submit an entry in the CoolThreads Performance Contest.
I will post any more info as I get it.
Disclosure: Sun is sending me a t-shirt, no strings attached.
Like the builder with the worst looking house in the street, I was the guy who did web based work with no website. Now that has all changed - I finally have a website and a blog.
The website is a showcase of the free/open source products and services offered by Dave Hall Consulting.
The blog will be a place for people to see what I am working on and playing with and to discuss current issues, and it will also give me a space to brain dump.
I hope you enjoy the site.