The minimum RAM requirement for the climate model was 2 GB, which is what came with the computer. However, I decided to get 8 GB more for the system. At first, I left the original 2 GB in the machine (arranged according to the docs). However, this caused an enormous performance hit: the system was 12% slower running the model. Dropping the original 2 GB brought the speed back up, more or less, to my original performance figures. I'm still not sure whether this was a memory-arrangement problem or a mixed-card-size problem…
After a slightly longer R15 run on the model, the performance stayed about the same: approximately 72 model years per day. I was hoping for more, but that's quite respectable.
Currently, I'm running my first T42 simulation. I guessed I would get 7 years per day for this run, using a fairly linear CPU comparison with Gondwana, my older cluster, which uses 16 2 GHz Opteron processors and comes in just under 10 years per day.
Well, initial performance results say that my calculations were way too low. So far, the model produces a month of data every 7-8 minutes, which translates to a range of 15-17 years per day. This result was far better than I anticipated!
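The conversion from minutes-per-model-month to model-years-per-day is simple arithmetic; here's a quick sketch, using 7.5 min/month as the midpoint of the 7-8 minute range:

```shell
# 1440 wallclock minutes per day, 12 model months per model year
awk 'BEGIN { printf "%.0f model years/day\n", 1440 / (7.5 * 12) }'
```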
We'll see if these results hold…
After a long struggle with various compilers, today I decided to try the Portland Group C/Fortran compilers, which I've had success with before. Fortunately for me, it worked! I actually got the model running at a low resolution! Tomorrow, I'll try a higher resolution.
As of tonight, I've only run a single year at the low resolution. I have 8 processors, but only 2 GB of RAM. The single year took about 20 minutes, which translates out to about 72 model years per full wallclock day. I have to admit I'm a little disappointed in the speed. However, being thin on RAM probably slows things down: CPU usage didn't seem to peak over 50%, so more RAM may translate to more speed.
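For reference, the wallclock math behind that figure (20 minutes per model year, assuming the rate holds for a full day):

```shell
# 1440 wallclock minutes per day / 20 minutes per model year
echo "$(( 1440 / 20 )) model years/day"
```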
Once I looked at some comparisons with other systems, I was a little happier. Take a look at this figure, modified from the FOAM homepage:
Rodinia is my machine.
The 72 years-per-day speed, if it holds, compares very well with existing, high-powered clusters. Of these systems, I've had some experience with JAZZ, running a higher-resolution model. Given that Rodinia's performance is twice as fast as JAZZ's, I expect the higher-resolution models to perform quite well on this system. It's also important to note that the other systems listed are not newer clusters, although many are still in operation. On the other hand, this system has a relatively low barrier to entry, it can double as a desktop Mac when not in operation, and, most importantly, there's no competition for CPU time.
I still have more configuration work to sort out. Getting the model to run via PBS on a user's account is the priority.
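For the PBS setup, I'm expecting the job script to look something like the sketch below. This is only a guess at this point — the job name, node/processor counts, and the mpirun invocation are placeholders until I actually get it working:

```shell
#!/bin/bash
#PBS -N foam_run          # job name (placeholder)
#PBS -l nodes=1:ppn=8     # one node, 8 processors
#PBS -j oe                # merge stdout and stderr into one log

cd "$PBS_O_WORKDIR"       # run from the submission directory
mpirun -np 8 ./foam       # model binary name is a placeholder
```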
More details soon!
The climate model I'm planning to run is compatible with the g95 compiler, not gfortran. So, the first thing was to download and install the g95 binaries.
The next step was to download and compile the NetCDF libraries from NCAR. Before configuring, I had to export the FC environment variable as g95 to get it to compile correctly with g95. Running make check also passed.
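The build boiled down to a few commands; roughly the following (the install prefix here is illustrative, not necessarily what you'd choose):

```shell
# point configure at g95 instead of the default Fortran compiler
export FC=g95
./configure --prefix=/usr/local/netcdf   # prefix is illustrative
make
make check     # all tests passed for me
make install
```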
Other tools that are needed (at least for me), but which I won't describe, are svn, git, and NCO.
Yesterday, after bringing the machine to my office, I ran into the first snag: it wouldn't boot! It would get to a “blue screen” and die. So, I headed back to the Apple store, where they tested the machine. Annoyingly, it worked fine for them. It turns out that I was using a cheap third-party flatscreen with a DVI-to-VGA converter. Sadly, this wasn't good enough for this machine.
Luckily, I had an older Apple display that used the old ADC connector. For $99, I picked up an adapter, and now the machine runs just fine. An annoyance, but at least the machine wasn't DOA.
ROCKS is an NSF (National Science Foundation) supported project to develop and maintain cluster environments, providing cluster installs for a wide variety of platforms. It contains the usual tools, like PBS, Ganglia, and many others.
To install, the first thing was to add a new SATA drive to the machine. I had a 250 GB drive lying around, so I added it to bay 2 on the Mac.
Installing ROCKS was relatively straightforward. One worry that turned out to be a non-issue: ROCKS said it would install itself on the first drive, which I thought would overwrite the macOS install on the first drive and leave the drive in bay 2 blank. As it happened, it installed on the blank disk. All is well.
On reboot, it appears all the critical elements of the system were running: Ganglia, PBS, etc.
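A couple of quick checks I used to convince myself things were up (assuming the standard PBS and Ganglia tooling that ROCKS installs):

```shell
qstat -q       # list PBS queues; errors out if the PBS server isn't running
pgrep gmond    # the Ganglia monitoring daemon should show a PID
```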
I've posted unboxing photos at my personal website.
Computer clusters can be expensive to buy and expensive to maintain. My current Linux cluster has 18 CPUs, takes up about 13U of rack space in a colocation facility, and draws 17-18 amps of power.
The current line of Apple Macintosh computers, on the other hand, offers up to 8 cores at 3.2 GHz per machine. While even at this configuration it technically is not as powerful as my large cluster, it is far cheaper to buy, costs much less to run, and can double as a powerful desktop. It also has the added benefit of not needing Ethernet to communicate between processors (a major bottleneck).
My back-of-the-envelope performance calculations are impressive. For the model I'm using, my cluster yields just under 10 model years per day. Based on CPU speed alone, this machine should produce 6-7 model years per day. This calculation could over- or underestimate the performance in key respects. First, the cluster uses Opteron processors, while the Mac uses Xeons; in theory, the Xeons should be a little slower at the same clock speed. Second, since the Mac does not have to use Ethernet to communicate, the system could be much faster, since Ethernet tends to be a bottleneck on performance. Time will tell, assuming the model runs on this platform.
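The scaling estimate itself, for the record: treating performance as linear in aggregate clock speed (18 CPUs × 2 GHz for the cluster versus 8 cores × 3.2 GHz for the Mac — a crude assumption that ignores per-clock differences between Opteron and Xeon):

```shell
# estimated years/day = cluster years/day * (Mac GHz-cores / cluster GHz-cores)
awk 'BEGIN { printf "%.1f model years/day\n", 10 * (8 * 3.2) / (18 * 2.0) }'
```

That lands a touch above the 6-7 figure; discounting the Xeons slightly at the same clock brings it back into that range.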
I'll be using this blog to document the setup and performance of this machine as the system is put together.