We've been running an Ultra 10 (Solaris 2.6, 1x UltraSparcII 440MHz
CPU, 2x 9GB EIDE, 512MB RAM) as a development platform for an
application that runs with Oracle 8.1.7 - performance is acceptable.
The test environment for the app is a Sun Blade 1000 (Solaris 2.8, 1x
UltraSparcIII 750MHz, 2x 36GB FC-AL, 1GB RAM), with exactly the same
DB configuration and load as the dev platform.
The intended production environment for the app will be a Sun Blade
1000 (Solaris 2.8, 2x UltraSparcIII 750MHz, 2x 36GB FC-AL, 1GB RAM).
All things being equal, can we assume that the Sun Blade configuration
will outperform the Ultra 10 ... ?
... well here's a tale:
A fairly disk intensive SQL query (run via sqlplus) to remove
duplicate postcodes from a table on the Ultra 10 takes approximately 2
minutes to return its record set.
On the test environment on the Sun Blade, with the UFS filesystems and
database tablespaces layout exactly the same as the Ultra 10, the same
query against the same volume of data takes 7 minutes.
top, sar, iostat, vmstat and mpstat all point to a massive amount of
waiting for I/O on the Sun Blade - generally 95-100% wio. On the
Ultra 10, wio is maybe 50% for the duration of the query.
Our sysadmins have applied every patch cluster under the sun [sic] for
Solaris 8 on the Sun Blade, even HDD firmware patches, switched UFS
logging on & off, run low-level disk scans with format, etc, etc, while
the DBAs have been through both init.ora configs with a fine-tooth
comb, set a huge sort_area_size in Oracle, sized the SGA, and set the
/etc/system shared memory and semaphore settings accordingly. Even the
64-bit version of Oracle was installed on the Sun Blade.
All to no avail: the Ultra 10 still seems to cut through the query so
much quicker than the Blade. Any config change that improves things on
the Sun Blade, when applied to the Ultra 10, just makes that quicker
too.
The problem doesn't appear to be restricted to Oracle DB access
though: the Sun Blade seems to have a 'general sluggishness' about it.
If I run the following command on the Ultra 10 in a ufs file
system:-
$ time tar xf lmbench-patch1.tar
real 0m1.451s
user 0m0.080s
sys 0m0.290s
This unpacks 360 files into a tree under a directory LMbench. Then
deleting it:
$ time /bin/rm -rf LMbench/
real 0m0.548s
user 0m0.010s
sys 0m0.090s
Running the same on the test Sun Blade, again the same tar file onto a
ufs file system:-
$ time tar xf lmbench-2.0-patch1.tar
real 0m4.670s
user 0m0.020s
sys 0m0.170s
and removing the resultant directory:-
$ time /bin/rm -rf LMbench
real 0m4.179s
user 0m0.020s
sys 0m0.070s
[The reason I use /bin/rm is that I have alias rm='rm -i' in my
profile.]
I know these are fairly crude tests, but they do illustrate that there
are real differences in running the same commands on the same files on
the two machines: in both cases the real time taken is a lot longer on
the Sun Blade than on the Ultra 10, while the system and user times
are about the same or less. wio stats jump to >95% ... Why should this
be? At the time of testing there was no load on either machine.
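
To put a rough number on that gap: whatever part of the real time is
not accounted for by user or sys time was presumably spent waiting,
mostly on I/O given the wio figures. A minimal sketch of the
arithmetic, using only the figures from the runs above. wait_time is
just a helper name made up here, and it assumes an awk that accepts -v
(nawk or /usr/xpg4/bin/awk on Solaris):

```shell
# wait_time REAL USER SYS
# Prints the seconds of wall-clock time not spent on the CPU,
# i.e. real - user - sys, to three decimal places.
wait_time() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.3f\n", r - u - s }'
}

wait_time 1.451 0.080 0.290   # Ultra 10,  tar xf  -> 1.081
wait_time 0.548 0.010 0.090   # Ultra 10,  rm -rf  -> 0.448
wait_time 4.670 0.020 0.170   # Sun Blade, tar xf  -> 4.480
wait_time 4.179 0.020 0.070   # Sun Blade, rm -rf  -> 4.089
```

By that measure the Blade spends about 4.5s of the 4.7s tar run and
about 4.1s of the 4.2s rm run off the CPU, against roughly 1.1s and
0.4s on the Ultra 10.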
Unpacking & removing in a tmpfs filesystem (/tmp) is a lot quicker,
obviously because tmpfs is memory-based. Apart from having logging on
ufs in 2.8 (which, whether switched on or off, seems to make no
difference in the above tests), what are the major differences between
ufs on 2.6 and 2.8?
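
For anyone wanting to repeat the comparison, here is a minimal sketch
of a wrapper that runs the same unpack and remove in several
directories in turn, e.g. one on ufs and one on tmpfs. fs_compare is a
name made up here, the paths in the usage line are placeholders, and
it assumes a shell where time is available (ksh or bash) and an
absolute path to the tar file:

```shell
# fs_compare TARFILE DIR...
# Times the unpack and the remove in each DIR, one after the other.
# TARFILE must be an absolute path, because the function changes directory.
fs_compare() {
    tarfile=$1
    shift
    for dir in "$@"; do
        echo "== $dir"
        (
            cd "$dir" || exit 1
            time tar xf "$tarfile"
            time /bin/rm -rf LMbench
        )
    done
}

# Placeholder paths: substitute a real ufs directory and the real tar file.
# fs_compare /var/tmp/lmbench-2.0-patch1.tar /export/home/scratch /tmp
```

The timings go to stderr, one pair per directory, so the real, user
and sys figures can be read off side by side.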
These results have been passed on to Sun through our 3rd party
support agency, who have apparently set up an Ultra 10 and a Sun Blade
and have confirmed that on their setup the Blade performs better, and
that our issue is obviously down to the configuration of, or load on,
the machine.
So anyway, we've now taken delivery of the first Sun Blade to be used
in production, unpacked it, switched it on, installed 2.8 from scratch,
applied patches and run the same tar & rm tests... same preliminary
results as the test Sun Blade. Oracle has not been installed yet, so I
haven't run the SQL queries to see what it's like.
The obvious difference between the Ultra 10 and the Blade 1000 is the
OS used, 2.6 vs 2.8. I guess what I'm asking is: is there anything we
have glaringly missed in the config of the Sun Blade? Is there such a
massive difference between 2.6 and 2.8 as to cause this, or is the Sun
Blade's performance normal, and we have an (ahem) exceptionally fast
Ultra 10?
If anyone can come up with a plausible explanation for this, or at
least a lead to work on, I'd be very grateful.
TIA
Rich
[And just to throw a spanner in the works, our apps have been ported
to run on a Linux box (kernel 2.4.5, dual PIII 933MHz, 2x 9GB SCSI 15K
rpm, 1GB RAM, Oracle 8.1.7) and the app and all the tests run way, way
faster than on the Ultra 10 and Sun Blade. Of course, all the usual
corporate reasons prevent us from using Linux...]