Here is the reply he sent me. It helps explain some of the odd results.
Louis,
I don't have a MacRumors account, so if you could pass this on:
Different tests will stress different areas of the system. Xbench is by no means balanced or completely fair, but it does strive to measure some of the major performance bottlenecks that typical applications will experience.
I'll try to list the factors that affect each test:
First, and most importantly, ALL tests can be impacted by open applications or by having little memory available. Before running the benchmark, you can use top or ProcessViewer to see whether any background processes or applications are using CPU time or large amounts of memory. If you hear the disk running during any test other than the disk test, the system may be starting to page to virtual memory (VM), which will lower your scores considerably.
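For anyone who wants to script part of that pre-flight check, here is a minimal Python sketch. It only looks at CPU load through the Unix load average (os.getloadavg); it does not check memory, so top or ProcessViewer is still the way to spot memory hogs. The function name and the 0.5-per-CPU threshold are hypothetical choices, not anything Xbench uses.

```python
import os


def system_looks_idle(max_load_per_cpu=0.5):
    """Return True if the 1-minute load average per CPU is below a threshold.

    Hypothetical helper: the 0.5 default is an arbitrary assumption,
    not an Xbench setting. Unix only; os.getloadavg raises OSError
    if the load average cannot be obtained.
    """
    one_minute, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    return one_minute / cpus < max_load_per_cpu


if __name__ == "__main__":
    if system_looks_idle():
        print("CPU looks quiet; memory still needs checking in top.")
    else:
        print("Background CPU load detected; check top before benchmarking.")
```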
1. CPU Test (single processor only - a test of one application doing single-threaded work)
A) GCD Recursion - almost entirely limited by the processor's register speed, L1 cache, and integer math
B) Floating Point Basic - measures single precision floating point operations (+, -, *, /)
C) AltiVec Basic - measures single precision floating point operations implemented with AltiVec instructions
D) Floating Point Library - measures double precision math library operations like sin, cos, sqrt, etc.
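To give a feel for what a "GCD Recursion" workload looks like, here is a minimal sketch using Euclid's algorithm written recursively. Whether Xbench uses this exact formulation is an assumption, and the operand pattern in gcd_benchmark is purely illustrative. In Python the interpreter's call overhead dominates, so this shows the shape of the work (small integer ops, deep call chains, tiny working set that lives in registers and L1), not native timings.

```python
import time


def gcd(a, b):
    """Euclid's algorithm, recursive form: gcd(a, b) = gcd(b, a mod b)."""
    return a if b == 0 else gcd(b, a % b)


def gcd_benchmark(iterations=100_000):
    """Time many recursive GCD calls; returns (checksum, elapsed seconds).

    The operands are a hypothetical pattern, not Xbench's. The checksum
    keeps the results "used" and lets a caller sanity-check the run.
    """
    checksum = 0
    start = time.perf_counter()
    for i in range(1, iterations + 1):
        checksum += gcd(i * 7919, i * 104729 + 1)
    elapsed = time.perf_counter() - start
    return checksum, elapsed
```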
Unfortunately, in b2, the Floating Point and AltiVec tests are somewhat flawed: they rely too heavily on memory bandwidth, and may make low-memory machines page to VM, producing extremely low results. This will be corrected soon. Their "FLOP" ratings may never be directly comparable to theoretical FLOPS, or to real-world FLOPS figures produced by other benchmarks.
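One way to see why a measured "FLOP" rating drifts from the theoretical figure (clock rate times floating point operations per cycle): a benchmark rating is simply operations counted divided by wall-clock time, so every load, store, loop overhead and cache miss lands in the denominator. The sketch below is an assumption-laden illustration, not Xbench's method: Python floats are double precision (unlike the single precision FP Basic test), and interpreter overhead swamps the arithmetic.

```python
import time


def flop_rating(iterations=200_000):
    """Return (ops per second, final accumulator) for a +, -, *, / loop.

    Each iteration performs exactly 4 floating point operations, and the
    rating is just that count divided by elapsed time. The constants are
    arbitrary; they keep the accumulator from overflowing.
    """
    x, y = 1.0001, 0.9999
    acc = 1.0
    start = time.perf_counter()
    for _ in range(iterations):
        acc = (acc + x) * y   # one add, one multiply
        acc = (acc - x) / y   # one subtract, one divide
    elapsed = max(time.perf_counter() - start, 1e-9)
    ops = 4 * iterations
    return ops / elapsed, acc
```

Returning the accumulator is deliberate: a result nobody looks at invites an optimizing compiler (in a C version) to delete the loop entirely.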
2. Thread Test (tests multiple processors - a test of multiple applications, or a single app doing multithreading)
A) Computation - measures 4 worker threads performing integer operations, plus some memory bandwidth use - always faster on MP machines
B) Memory Contention - measures 2 worker threads doing memory copies (same as Stream Copy) - usually faster on MP machines
Note that these 2 threads won't achieve the same throughput as a single thread, which doesn't have to compete for memory bandwidth.
C) Lock Contention - measures 4 threads quickly acquiring and releasing thread locks - usually faster on MP machines; locking performance may be important in certain types of multithreaded code
The thread tests aren't representative of all types of threaded code, but they do measure some of the factors that affect threaded applications.
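The Lock Contention idea can be sketched in a few lines of Python: 4 threads repeatedly acquiring and releasing one shared lock. The thread and iteration counts are assumptions. Keep in mind that CPython's global interpreter lock (GIL) runs Python bytecode on one core at a time, so this measures lock acquire/release overhead rather than true MP scaling; a C version using pthread mutexes would be closer to what a native benchmark does.

```python
import threading
import time


def lock_contention(num_threads=4, acquisitions_per_thread=10_000):
    """Threads repeatedly acquire and release one shared lock.

    Returns (final counter, elapsed seconds). The counter update inside
    the lock proves every acquisition actually happened.
    """
    lock = threading.Lock()
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(acquisitions_per_thread):
            with lock:          # acquire, do a tiny critical section, release
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return counter, elapsed
```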
3. Memory Test (tests ability to perform memory operations)
A) System
a) Allocate - measures the system's ability to allocate many varying-sized blocks of data, using standard system calls
b) Fill - measures the system's ability to fill a large block with data, using standard system calls
c) Copy - measures the system's ability to copy data from one block to another, using standard system calls
B) Stream (derived from the standard STREAM benchmark - http://www.cs.virginia.edu/stream/)
These use 64-bit doubles and AltiVec cache prefetching when appropriate
a) Copy - measures copying speed between 2 large buffers
b) Scale - measures load-multiply-store operations between 2 large buffers
c) Add - measures load-load-add-store operations between 3 large buffers
d) Triad - measures load-load-multiply-add-store operations between 3 large buffers (scale and add in one step)
These tests will do better with more memory bandwidth. I expect DDR Macs to perform better, as their memory subsystem is optimized for big sequential reads/writes. On a system with very little memory, these may perform slowly.
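For reference, here is a Python sketch of the four STREAM kernels in the order STREAM runs them (Copy c = a, Scale b = q*c, Add c = a+b, Triad a = b + q*c), with bandwidth computed the way STREAM counts bytes: 2 arrays touched per element for Copy and Scale, 3 for Add and Triad, 8 bytes per double. The default array size and q = 3.0 are assumptions, and there is no AltiVec prefetching here. Interpreter overhead dominates, so the absolute MB/s will be far below hardware bandwidth; a C build of STREAM itself is the right tool for real numbers.

```python
import time
from array import array


def stream_kernels(n=1_000_000, q=3.0):
    """Run the four STREAM kernels once over arrays of n 64-bit doubles.

    Returns ({kernel name: MB/s}, (a[0], b[0], c[0])).
    """
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n

    def copy():            # c = a
        for j in range(n):
            c[j] = a[j]

    def scale():           # b = q * c
        for j in range(n):
            b[j] = q * c[j]

    def add():             # c = a + b
        for j in range(n):
            c[j] = a[j] + b[j]

    def triad():           # a = b + q * c  (scale and add in one step)
        for j in range(n):
            a[j] = b[j] + q * c[j]

    kernels = [("Copy", copy, 2), ("Scale", scale, 2),
               ("Add", add, 3), ("Triad", triad, 3)]
    rates = {}
    for name, kernel, arrays_touched in kernels:
        start = time.perf_counter()
        kernel()
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates[name] = arrays_touched * 8 * n / 1e6 / elapsed  # 8 bytes/double
    return rates, (a[0], b[0], c[0])
```

Starting from a = 1, b = 2, c = 0, one pass leaves c = 1 after Copy, b = 3 after Scale, c = 4 after Add and a = 15 after Triad, which gives a cheap correctness check.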
I've seen Fill and Copy perform oddly on different machines - it appears that the OS or hardware is optimizing differently in some cases.
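The System Allocate/Fill/Copy tests go through standard calls rather than hand-written loops, so the OS and C library are free to pick their own strategy, which is one plausible source of the machine-to-machine oddities. A minimal Python sketch via ctypes, which wraps the C library's memset and memmove; the block counts and sizes are assumptions, and create_string_buffer stands in for a raw malloc (it also zero-fills).

```python
import ctypes
import random
import time


def allocate_blocks(count=10_000, max_size=4096, seed=0):
    """Allocate many varying-sized blocks; returns (blocks allocated, seconds)."""
    rng = random.Random(seed)
    sizes = [rng.randint(1, max_size) for _ in range(count)]  # chosen before timing
    start = time.perf_counter()
    blocks = [ctypes.create_string_buffer(size) for size in sizes]
    elapsed = time.perf_counter() - start
    return len(blocks), elapsed


def fill_and_copy(size=32 * 1024 * 1024, value=0xAA):
    """Fill one large block with memset, then copy it with memmove.

    Returns (fill seconds, copy seconds, destination buffer).
    """
    src = ctypes.create_string_buffer(size)
    dst = ctypes.create_string_buffer(size)
    t0 = time.perf_counter()
    ctypes.memset(src, value, size)     # Fill
    t1 = time.perf_counter()
    ctypes.memmove(dst, src, size)      # Copy
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, dst
```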
4. Quartz Graphics Test
A) Line - measures drawing lines of varying widths, colors and rotations at 50% alpha
B) Rectangle - measures drawing rects of varying widths, colors and rotations at 50% alpha
C) Circle - measures drawing circles of varying diameter, colors and rotations at 50% alpha
D) Bezier - measures drawing beziers of varying widths, colors and rotations at 50% alpha [same basic bezier curve repeated]
E) Text - measures drawing characters of varying font sizes and rotations at 100% alpha
This test does better with a better graphics card, more memory bandwidth, a faster CPU, AltiVec, etc. It's up to the system how to optimize the drawing.
5. Disk Test
A) Sequential
These tests measure typical throughput to the drive.
a) Uncached Write - measures writing in 256K blocks until a 100MB file is filled
b) Uncached Read - measures reading a 100MB file in 256K blocks
B) Random
These tests will be slightly impacted by the disk's seek time
a) Uncached Write - measures writing 256K blocks at random locations in a 100MB file
b) Uncached Read - measures reading 256K blocks at random locations in a 100MB file
Most disks have slower throughput as they become more full and/or fragmented, so an extremely full disk may perform worse than expected.
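Here is a Python sketch of the same four access patterns (sequential write, sequential read, random write, random read) with 256K blocks against a 100MB file. One big caveat: it does NOT bypass the OS file cache, so reads in particular may come from RAM; a truly uncached test needs a platform-specific mechanism (on Mac OS X, for example, the F_NOCACHE fcntl). Whether Xbench uses that mechanism is not stated here. Random offsets are block-aligned, and the seed and helper names are hypothetical.

```python
import os
import random
import tempfile
import time

BLOCK = 256 * 1024             # 256K blocks, as in the test description
FILE_SIZE = 100 * 1024 * 1024  # 100MB test file


def _mb_per_s(nbytes, seconds):
    return nbytes / 1e6 / max(seconds, 1e-9)


def disk_test(path, file_size=FILE_SIZE, block=BLOCK, seed=0):
    """Sequential and random block I/O against one file; returns MB/s per pattern."""
    blocks = file_size // block
    data = os.urandom(block)
    rng = random.Random(seed)
    offsets = [rng.randrange(blocks) * block for _ in range(blocks)]
    total = blocks * block
    results = {}

    # Sequential write: fill the file block by block, then force it to disk.
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    results["seq_write"] = _mb_per_s(total, time.perf_counter() - start)

    # Sequential read: read the whole file back in block-sized chunks.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block):
            pass
    results["seq_read"] = _mb_per_s(total, time.perf_counter() - start)

    # Random write: overwrite blocks at random block-aligned offsets.
    start = time.perf_counter()
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    results["rand_write"] = _mb_per_s(total, time.perf_counter() - start)

    # Random read: read blocks back from random block-aligned offsets.
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(block)
    results["rand_read"] = _mb_per_s(total, time.perf_counter() - start)

    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, rate in disk_test(os.path.join(d, "xbench_sketch.tmp")).items():
            print(f"{name}: {rate:.1f} MB/s")
```

The random passes are where seek time shows up; on a nearly full or fragmented disk even the "sequential" file may not be contiguous.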
Anyway, hope this helps out. Let me know if I can clarify any points.
I hope to continue to improve the accuracy, and provide comparison capabilities, so as to make Xbench more useful.
- Ladd