UltraSPARC and SPARCcompilers 4.0 Improve Scientific Performance

Post by David G. Hough at valid » Sat, 27 Jan 1996 04:00:00



                A B S T R A C T

Over many publicly available scientific applications in Fortran-77 and C,
UltraSPARC systems provide significant performance improvements over previous
Sun systems, due to synergistic interaction of UltraSPARC hardware and
SPARCcompilers 4.0.  Various SC4.0 optimization technologies provide these
typical performance improvements for UltraSPARC hardware, relative to the
previous SC3.0.1 compilers optimizing for SuperSPARC:

           Technology                             Added Performance

           General SPARC V8                                   8-14%
           UltraSPARC-specific, V8-compatible                 9-19%
           UltraSPARC-specific, V8-incompatible                1-9%

           All three combined                                31-45%

     All comparisons reveal interesting performance anomalies well outside the
"typical" ranges shown above, which makes it challenging to choose the best
hardware and compiler options for a particular problem.

     The ASCII report posted to USENET is a condensed version of the complete
report, for which "tbl | troff -ms" source is available from d...@validgh.com.

                R E P O R T

Scope of Report

     SPARCcompilers 4.0 provides considerable run-time performance advantages
over previous SPARCcompilers releases when compiling for UltraSPARC systems.

     The results reported here compare run times for executables produced
under SunOS 5.5 with the following SPARC C and Fortran compilers:

       sc40    SPARCcompilers 4.0 C and Fortran
       sc301   SPARCcompilers 3.0.1 C and Fortran
       sc201   SPARCcompilers 2.0.1 C and Fortran
       s1-10   SPARCcompilers 1.0 C and Fortran compiled on SunOS 4.1.4

using various compilation options.

     Additional results compare run times with the same executables running on
UltraSPARC, HyperSPARC, and SuperSPARC-II.

     The work for this report was performed at and for Sun Microsystems, Inc.

Compilers and Options

     Test programs were compiled in the following ways:

-g   Universal debugging option.

-O   Universal optimizing option.

-fast  Sun's macro for "fast" compilation and execution.

max  Maximum optimization for sequential execution on one processor.

     "max" options used were as follows. Note that for any particular program,
these  combinations may not provide the best possible run times, although they
are often better than those obtained with "-O" or "-fast".

Compilation       SPARC Compilation options

SC1.0 acc max      -O2  -cg89 -dalign  -libmil -Bstatic -u _fix_libc_
SC1.0 f77 max      -O4  -cg89 -dalign  -libmil -Bstatic -u _fix_libc_

SC2.0.1 cc max    -xO4 -xcg92 -dalign -xlibmil -Bstatic
                  -fsingle -xlibmieee
SC2.0.1 f77 max    -O4  -cg92 -dalign  -libmil -Bstatic

SC3.0.1 cc max    -xO4 -xcg92 -dalign -xlibmil -fsimple -Bstatic
                  -fsingle -xlibmieee -lsunmath -lmopt -lcopt
SC3.0.1 f77 max   -xO4 -xcg92 -dalign -xlibmil -fsimple -Bstatic
                  -depend -xlibmopt

SC4.0 max301      same options as SC3.0.1 max

SC4.0 cc max      -xO5 -xtarget=ultra -dalign -xlibmil -fsimple=1 -Bstatic
                  -fsingle -xlibmieee -xdepend -lsunmath -lmvec -lmopt -lcx -lco
SC4.0 f77 max     -xO5 -xtarget=ultra -dalign -xlibmil -fsimple=1 -Bstatic
                  -depend -xlibmopt -stackvar -lsunperf -lmvec -lcx

SC4.0 maxv8p      same options as SC4.0 max plus -xarch=v8plus

SC4.0 maxv9       same options as SC4.0 max plus -xarch=v8plusa -xsafe=mem -fsim

     So

sc40.max301
      represents the same compiler options as sc301.max, but using the new
     compiler;

sc40.max
      represents new UltraSPARC options still permitting the executable to run
     on a V8 SPARC;

sc40.maxv8p and sc40.maxv9
     represent additional new options depending on V9 SPARC features - such
     executables won't run on V8 systems.

     Each of these steps in compile-time complexity provides an additional
increment of typical run-time performance.  But performance improvements are
never available universally or uniformly.

SC4.0 Compiler Performance Summary

     Measurements on the same UltraSPARC hardware, made in order to isolate
the performance improvements of the SPARCcompilers 4.0 release, demonstrate

*)   better general and SPARC-V8 optimization technology: sc40.max301 vs.
     sc301.max is typically 8-14% faster, and in rare cases as much as 2.5X
     faster or 1.1X slower.

*)   specific SPARC V8-compatible optimizations for UltraSPARC: sc40.max vs.
     sc40.max301 is typically 9-19% faster, and in rare cases as much as 4.2X
     faster or 1.2X slower.

*)   specific non-V8-compatible optimizations for UltraSPARC: sc40.maxv8p vs.
     sc40.max is typically 1-9% faster, and in rare cases as much as 1.5X
     faster or 1.7X slower.  Thus the "v8plus" optimizations can't be
     recommended in general, but are worth considering in special cases.

*)   the overall effect of the three foregoing optimizations: sc40.maxv8p vs.
     sc301.max is typically 31-45% faster, and in rare cases as much as 3.9X
     faster or 1.1X slower.

*)   For SC4.0, -fast is typically 11-22% faster than -O, and max
     optimizations are typically 6-30% faster than -fast.  So these options
     are worth teaching to performance-oriented ISVs.

Test machines

     SPARC executables were run under SunOS 5.5 on a 143MHz UltraSPARC system
identified by uname -a as:

Ultra = SunOS ultrafoo 5.5 Generic sun4u sparc SUNW,Ultra-1

For comparison, additional "sc40.max301" results were obtained for an SS20
2x150MHz HyperSPARC system:

Hyper = SunOS ohdear 5.5 Generic sun4m sparc SUNW,SPARCstation-20

and for an SS20 2x85MHz SuperSPARC-II system (not a supported product):

Super = SunOS sigh 5.5 Generic sun4m sparc SUNW,SPARCstation-20

The extra HyperSPARC and SuperSPARC processors did not affect any of these
tests except 026.compress.

UltraSPARC Hardware Performance Summary

     Running the same executables, optimized for SuperSPARC, on each system
in order to isolate the hardware improvements of UltraSPARC chips demonstrates
that

*)   compared to 150MHz HyperSPARC, 143MHz UltraSPARC is typically 7-15%
     faster, and in rare cases as much as 2.4X faster or 1.8X slower.

*)   compared to 85MHz SuperSPARC-II, 143MHz UltraSPARC is typically 1.3-1.6X
     faster, and in rare cases as much as 2.2X faster or 4X slower.  The
     dramatic slowdowns are usually in programs that underflow frequently.
     For programs that can run acceptably with abrupt underflow to zero
     rather than IEEE 754 gradual underflow, Sun's -fnonstd compiler option
     improves performance on UltraSPARC and HyperSPARC.

*)   compared to 85MHz SuperSPARC-II, 150MHz HyperSPARC is typically 15-50%
     faster, and in rare cases as much as 2.2X faster or 3X slower.  The
     dramatic slowdowns are usually in programs that underflow frequently.
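
     The distinction between gradual and abrupt underflow can be seen without
any SPARC hardware.  The following is a minimal sketch in Python, assuming an
interpreter whose floats are IEEE 754 doubles with gradual underflow enabled;
the function name is hypothetical, and the sketch says nothing about how
expensive such values are on any particular processor.

```python
import sys

def underflow_demo():
    """Return (smallest normal double, half of it) to show gradual underflow."""
    smallest_normal = sys.float_info.min  # smallest positive normalized double
    # With gradual underflow this quotient is a nonzero subnormal number;
    # with abrupt underflow (flush to zero) it would be exactly 0.0.
    half = smallest_normal / 2.0
    return smallest_normal, half

if __name__ == "__main__":
    normal, sub = underflow_demo()
    kind = "subnormal" if 0.0 < sub < normal else "flushed to zero"
    print("smallest normal:", normal)
    print("half of it     :", sub, "(" + kind + ")")
```

A program whose intermediate results frequently land in this subnormal range
is the kind of program the slowdowns above refer to.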

     As shown earlier, 10-30% additional UltraSPARC performance is available
by compiling specifically for UltraSPARC.  Although not evident in the kinds
of programs measured for this report, Sun's UltraSPARC-based systems also
offer considerable I/O and graphics performance enhancements over its
HyperSPARC and SuperSPARC systems.  And Sun has announced 200MHz UltraSPARC
systems for shipment in 1996 that will typically perform 1.4X faster than the
143MHz system measured for this report.  Sun has announced no faster
HyperSPARC or SuperSPARC systems than those measured for this report.

     To put these performance data into financial perspective, consider the
currently available hardware upgrade paths for my SS10/41, with normalized
prices and specrate_fp92 from the 11/28/95 Hardware Pricing Summary Guide:

               Upgrade To   Price   specrate_fp92   rate/price

               20/71        1                2875      2900
               20/151       1.07             3734      3500
               Ultra 140    1.21             7175      5900
               20/712MP     1.24             5439      4400
               Ultra 170    1.47             8323      5700
               20/152MP     1.54             8758      5700

Summary Performance Tables

     There are many ways to compare the performance of two systems.  This
report is based upon ratios of performance, summarized as SPEC ratios or as
medians over a variety of test programs.  Looking at different subsets of
data, or at the same data in different ways, leads to somewhat different
numerical conclusions, a common hazard of performance analysis.

     The following two tables summarize the detailed performance tables later
in this report, for specint92, specfp92, the median of "best-relative"
performance, the median of "relative" performance, the minimum "relative"
performance, and the maximum "relative" performance.  ("relative" means
comparing two systems directly; "best-relative" means comparing them to the
best encountered in this study.)  All comparisons are expressed as
percentages, so 100 indicates equal performance, and 109 indicates that one
system is 9% faster than the other.

     The "typical" performance ranges cited earlier are created from the
following table by extracting the worst and the best among the specfp92,
best-relative median, and relative median columns; thus {109, 105, 101}
translates to "1-9%".
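
     The extraction rule just described can be sketched in a few lines of
Python.  The function name typical_range is hypothetical; the input values
are the specfp92, best-relative median, and relative median entries of the
compiler-options table.

```python
def typical_range(specfp92, best_rel_median, rel_median):
    """Format the 'typical' improvement from three percentage ratios.

    Each argument is a percentage where 100 means equal performance, so the
    improvement is the value minus 100; the range runs from worst to best.
    """
    gains = [value - 100 for value in (specfp92, best_rel_median, rel_median)]
    return f"{min(gains)}-{max(gains)}%"

if __name__ == "__main__":
    print(typical_range(114, 110, 108))  # sc40.max301 vs. sc301.max
    print(typical_range(116, 119, 109))  # sc40.max    vs. sc40.max301
    print(typical_range(109, 105, 101))  # sc40.maxv8p vs. sc40.max
    print(typical_range(145, 138, 131))  # sc40.maxv8p vs. sc301.max
```

The specint92 column is deliberately left out, matching the rule as stated.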

        % Performance Ratios between Compiler Options with UltraSPARC

   Faster        Slower        spec    spec   best-rel    rel     rel   rel
                               int92   fp92    median    median   min   max

   sc40.max301   sc301.max       106    114     110       108     89    251
   sc40.max      sc40.max301      98    116     119       109     81    424
   sc40.maxv8p   sc40.max        101    109     105       101     60    151

   sc40.maxv8p   sc301.max       105    145     138       131     91    389

   sc40.max      sc40.fast       110    130     121       106     53    217
   sc40.fast     sc40.O          101    117     122       111     85    210

       % Performance Ratios between Systems with sc40.max301 Executables

       Faster   Slower    spec    spec   best-rel     rel     rel   rel
                          int92   fp92    median     median   min   max

       Ultra    Hyper       115    107      113       107      57    236
       Ultra    Super       132    160      139       135      25    223
       Hyper    Super       115    150      123       118      32    223

     In general, average performance increases about as expected in the
sequences

        -g => -O => -fast => max301 => max => maxv8p
and
        85MHz SuperSPARC-II => 150MHz HyperSPARC => 143MHz UltraSPARC

but there are plenty of individual anomalies evident in the extreme-statistics
tables, reflecting different hardware capabilities such as underflow handling
in hardware or software, and different sizes and organizations of external
caches.

The Purpose of Performance Analysis

     Many people fondly remember when upgrading their PC from 5 MHz to 8MHz
resulted in a uniform 1.6X improvement in every program they ran.   High
performance systems have never been like that.  As Jack Dongarra wrote in a
footnote to early versions of his Linpack benchmark compilation:

     The major difference between the CRAY 1-M and CRAY 1-S is in the memory
     speed, the CRAY 1-M having slower memory. The timings show the CRAY 1-M
     to be faster than the CRAY 1-S. After much discussion and examination of
     the generated assembly language code it was determined that, in fact, the
     CRAY 1-M was faster for this program. The code generated by the compiler
     causes the CRAY 1-S to miss a chain-slot. On the CRAY 1-M, because of
     slower memory, the chain-slot is not missed, thus the faster execution
     time.

     The median performance differences summarized above are of relatively
limited interest, because they depend so much on which programs are measured.
It's not widely appreciated, but the real goal of performance analysis is to
expose, and where possible explain, the anomalies of extremely good and
extremely poor performance between systems, because these are most helpful in
identifying the situations where each system is best deployed.  When
performance comparisons of dissimilar alleged high-performance systems contain
no anomalies or surprises, the reader may well wonder whether the data was
insufficiently gathered or overly filtered in analysis, and whether any useful
conclusions may be drawn from the results presented.

Some Specific Run-Time Performance Anomalies

     In the "maxv9" compilations, using -xsafe=mem without run-time profile
feedback may cause speculative loads of unneeded cache lines and even unneeded
pages.    That may penalize typical integer programs with more random memory
access patterns.

     An unfortunate interaction with the SC1.0 base conversion, which is
always used by SC1.0 f77 and is optional with SC1.0 acc, renders the 4.x
dynamic library binary compatibility mode incorrect on SunOS 5.5 when hardware
integer division is available.  This was worked around by compiling all
s1-10 f77 programs with -Bstatic.

     026.compress: Poor sc40.maxv9 performance is probably due to random data
access patterns when speculative loads are used via -xsafe=mem without
profiling data.

     eig{s.S,c.C}NEPT: These single-precision LAPACK timing programs appear to
generate many page faults, so SuperSPARC and HyperSPARC have an advantage over
UltraSPARC's software memory management.  SuperSPARC accrues additional
advantage over HyperSPARC from its 1MB external cache.

     intmc1000: Poor sc40.max* performance is due to "-stackvar" which is
suboptimal for this program.

     x[sd]huge,lin[sd].4: Poor sc40.maxv8p performance compared to sc40.max is
due to "-lsunperf" for which, on these programs, the v8plus implementation
provides inferior performance to the v8 implementation.

Detailed Performance Table Format

     Ratios like specfp92 and specint92 are reported for each
compiler/option/host.

     Otherwise program compilations or executions are compared when both
comparison times are ten seconds or longer.  The comparison time is min(real
time, user time + system time), since real time was a function of the variable
loading of the test hardware.  Normally a uniprocessor program will show real
time >= user time + system time.
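
     The selection rule above can be written down directly.  This is a
minimal Python sketch; the function names and the sample timings in the main
block are hypothetical, while the ten-second threshold and the min() formula
come from the text.

```python
MIN_SECONDS = 10.0  # comparisons require at least ten seconds on both sides

def comparison_time(real, user, system):
    """min(real time, user time + system time), all in seconds."""
    return min(real, user + system)

def comparable(times_a, times_b):
    """True when both (real, user, system) triples give a comparison time
    of ten seconds or longer."""
    return (comparison_time(*times_a) >= MIN_SECONDS
            and comparison_time(*times_b) >= MIN_SECONDS)

if __name__ == "__main__":
    run_a = (14.0, 12.5, 0.5)   # hypothetical timings on a loaded machine
    run_b = (9.0, 8.0, 0.5)     # hypothetical timings, too short to compare
    print(comparison_time(*run_a), comparable(run_a, run_b))
```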

     Comparisons are based on relative performance, either between two
systems, or between one system and the best performance observed on all
systems tested.  Relative performance is expressed as percentages; 100 means
identical performance, 50 represents one system being twice as fast as
another, 200 represents the other system being twice as fast.

     Performance tables labeled simply "relative," such as sc301.O-sc40.O,
compare two compilers against each other without reference to any other
results.  Relative performance percentages are not bounded by 100%.  In the
case of "sc301.O-sc40.O", a test on which sc301 was 3X faster would be
reported as 33% while a test on which sc40 was 3X faster would be reported as
300%.  Thus these results are graphed on a log scale so that 33% and 300% are
equally distant from 100%.
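
     As a sketch of the convention just described, the Python below computes
the percentage for a table labeled "first-second" and checks that 33% and
300% lie equally far from 100% on a log scale.  The function names and the
sample run times are hypothetical.

```python
import math

def relative_percent(time_first, time_second):
    """Percentage for a table labeled 'first-second'.

    Values above 100 mean the second system ran faster; values below 100
    mean the first system ran faster.
    """
    return 100.0 * time_first / time_second

def log_distance(percent):
    """Distance of a percentage from 100% on a logarithmic axis."""
    return abs(math.log(percent / 100.0))

if __name__ == "__main__":
    sc40_faster = relative_percent(30.0, 10.0)    # sc40 3X faster
    sc301_faster = relative_percent(10.0, 30.0)   # sc301 3X faster
    print(round(sc40_faster), round(sc301_faster))
    print(math.isclose(log_distance(sc40_faster), log_distance(sc301_faster)))
```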

     Relative example:

%    %    %    %    %  % 1 1  1    1    2    3    4
2    3    4    5    7  9 0 1  3    7    3    0    0
5    3    4    8    6  0 0 2  2    5    0    0    0       #    group

                      86-105--------------------425      109   sc301.O-sc40.O

     Performance tables labeled "best-relative" are relative to the best
correct performance measured on a particular test within this study.  The best
possible result is 100%. A test that ran 3X slower than the best on record
achieves 1/3 = 33% performance.  The absolute ideal best relative performance
would be the fastest possible time a particular computer could correctly solve
a particular problem with an ideal fully optimal compiler.  Since no such
compiler exists, the best time recorded so far on that test, with any
compilation options, approximates the time produced by that ideal compiler.
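
     A minimal Python sketch of the best-relative calculation for a single
test follows.  The function name and the run times are hypothetical, and the
input is assumed to contain only runs that produced correct results.

```python
def best_relative(times):
    """Map each configuration's run time to 100 * best_time / its_time.

    `times` maps a configuration label to its run time in seconds for one
    test; only correct runs are assumed to be present.
    """
    best = min(times.values())
    return {config: 100.0 * best / seconds for config, seconds in times.items()}

if __name__ == "__main__":
    one_test = {"sc40.max": 10.0, "sc301.O": 20.0, "sc40.O": 30.0}  # hypothetical
    for config, percent in best_relative(one_test).items():
        print(f"{config:<10} {percent:5.1f}%")
```

By construction no configuration can exceed 100%, which is why a large
anomaly in a two-way "relative" comparison can vanish in this view.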

     All performance percentage quartile graphs indicate the zeroth, first,
second, third, and fourth quartile performance percentages, the number of
tests involved, and the compiler(s).  The minimum performance percentage,
median performance percentage, and maximum performance percentage are listed.
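
     The five quartile values behind such a graph can be computed as in the
Python sketch below.  The report does not say which quartile convention its
tools used, so the "inclusive" interpolation method of the standard
statistics module is an assumption, as are the function name and the sample
data.

```python
import statistics

def quartile_summary(percentages):
    """Return [min, Q1, median, Q3, max] for a list of performance percentages.

    The quartile convention (method="inclusive") is an assumption; other
    conventions give slightly different Q1 and Q3 on small samples.
    """
    q1, q2, q3 = statistics.quantiles(percentages, n=4, method="inclusive")
    return [min(percentages), q1, q2, q3, max(percentages)]

if __name__ == "__main__":
    sample = [92, 20, 64, 52, 74]  # hypothetical performance percentages
    print(quartile_summary(sample))
```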

     Best-relative example:

  %    %    %    %    %    %    %    %    %    %    1
  0    1    2    3    4    5    6    7    8    9    0
  0    0    0    0    0    0    0    0    0    0    0       #     comp  opt

            20--------------=====64=====-------92          100   sc40    O
           17------------======59=====--------90           100   sc301   O

These two lines report the "best-relative" run-time performance of the SC4.0
and SC3.0.1 compilers at -O on 100 tests.  Among these 100 for SC4.0, the
median performance percentage was 64%, meaning that the median program
compiled with SC4.0 at -O obtained 64% of the best performance ever recorded,
i.e. ran 1.6X slower.  The worst performance percentage was 20%, while the
best was 92%.  One fourth of the results were contained in each of the
intervals 20-52%, 52-64%, 64-74%, and 74-92%.  Note that the anomalous test
with 425% relative performance in the "relative" comparison is completely
concealed in this "best-relative" comparison.

     The corresponding best-relative histogram shows:

sc40 O
  0 <= % <=   9 :   0
 10 <= % <=  19 :   0
 20 <= % <=  29 :   5 XXXXX
 30 <= % <=  39 :   5 XXXXX
 40 <= % <=  49 :  11 XXXXXXXXXXX
 50 <= % <=  59 :  13 XXXXXXXXXXXXX
 60 <= % <=  69 :  27 XXXXXXXXXXXXXXXXXXXXXXXXXXX
 70 <= % <=  79 :  21 XXXXXXXXXXXXXXXXXXXXX
 80 <= % <=  89 :  14 XXXXXXXXXXXXXX
 90 <= % <=  99 :   4 XXXX
100 <= %        :   0

The histogram marks an X for each test that falls in the indicated performance
percentage bracket.  Not surprisingly there is a clustering around the median.
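
     A histogram in this layout can be produced with a short Python sketch
such as the following.  The function names and the sample percentages are
hypothetical; the ten-point brackets and the final "100 and above" bracket
follow the table shown above.

```python
def histogram(percentages):
    """Count values per bracket: 0-9, 10-19, ..., 90-99, then 100 and above.

    Percentages are assumed to be non-negative; fractions are truncated
    before bracketing.
    """
    counts = [0] * 11
    for p in percentages:
        counts[min(int(p) // 10, 10)] += 1
    return counts

def render(counts):
    """Format the counts as text lines with one X per test."""
    lines = []
    for i, count in enumerate(counts):
        if i < 10:
            label = f"{i * 10:3d} <= % <= {i * 10 + 9:3d}"
        else:
            label = f"{'100 <= %':<15}"
        lines.append(f"{label} : {count:3d} {'X' * count}".rstrip())
    return lines

if __name__ == "__main__":
    sample = [20, 25, 64, 64.9, 99, 100, 135]  # hypothetical percentages
    print("\n".join(render(histogram(sample))))
```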

     The corresponding best-relative extremes table lists the worst and best
case tests:

20   xdhuge          22   eigc.CNEPT          25   lind.4       sc40    O
91   f2c             91   jetset74            92   013          sc40    O

In this case the bad news is xdhuge, a large linear equation problem, and eigc
and lind, LAPACK timing programs, which benefit 4X-5X from UltraSPARC-specific
optimization.  In contrast, the good news is f2c and 013.spice2g6, mostly
integer-oriented, and jetset74, a Monte Carlo simulation, which achieved
within 9% of best recorded performance.

Performance Tables

     The following tables were produced mechanically.  SPEC-like ratios are
comparable to each other but not to those reported or obtained elsewhere.

                               ~SPEC~92~ratios

                  Comp     Opt         fp      int    Host

                  sc40      maxv9      274     119     Ultra
                  sc40      maxv8p     268     142     Ultra
                  sc40      max        245     140     Ultra
                  sc40      max301     211     143     Ultra
                  sc40      fast       188     127     Ultra
                  sc301     max        185     135     Ultra
                  sc301     fast       169     121     Ultra
                  sc201     max        163     122     Ultra
                  sc40      O          161     126     Ultra
                  sc201     O          147     110     Ultra
                  s1-10     max        146      90     Ultra
                  sc201     fast       142     109     Ultra
                  sc301     O          141     122     Ultra
                  s1-10     fast       130      85     Ultra
                  s1-10     O          119      86     Ultra
                  sc201     g           52      63     Ultra
                  s1-10     g           51      55     Ultra
                  sc301     g           43      57     Ultra
                  sc40      g           43      57     Ultra
                  sc40      max301     132     108     Super
                  sc40      g           32      49     Super
                  sc40      max301     198     124     Hyper

Relative run performance - two groups

%    %    %    %    %  % 1 1  1    1    2    3    4
2    3    4    5    7  9 0 1  3    7    3    0    0
5    3    4    8    6  0 0 2  2    5    0    0    0       #    group

                              135-------=======3671748    82   sc40.g-sc40.O
                         102-===157===--------352        102   s1-10.max-sc40.ma
                       88--====147==--------331          102   sc201.max-sc40.ma
                       89---==139===------------426      104   s1-10.max-sc40.ma
25-------------------------==135=----223                 106   sc40.max301.Super
                       91--==131====-----------389       102   sc301.max-sc40.ma
                       92--==129===-------------431      104   sc201.max-sc40.ma
                      83---=123===--------------412      104   sc301.max-sc40.ma
                     79---==122==---------------698      107   sc201.fast-sc40.f
                      83--==122==---------------887      107   s1-10.fast-sc40.f
    32--------------------=118=------223                 106   sc40.max301.Super
        40---------------=111=-------------300           107   s1-10.max-sc301.m
                     81--=111=------------------562      107   sc301.fast-sc40.f
                      85-=111=------210                  107   sc40.O-sc40.fast
                     81--=110------------272             107   s1-10.fast-sc301.
                     81--=109==-----------------424      104   sc40.max301-sc40.
                       89=109=------------------454      109   s1-10.O-sc40.O
                       89108=-----------251              107   sc301.max-sc40.ma
               57-------=107==--------236                106   sc40.max301.Hyper
                     79--107=---------230                107   sc201.fast-sc301.
              53--------=106---------217                 106   sc40.fast-sc40.ma
         42--------------105--------209                  107   sc201.max-sc301.m
                     79--105-------200                   110   sc201.O-sc40.O
                      86-105--------------------425      109   sc301.O-sc40.O
              55---------104--------------------448      110   s1-10.O-sc201.O
                  68-----104--------------292            110   s1-10.max-sc201.m
 26---------------------103==-------------------419      109   s1-10.O-sc301.O
                    78-=103=-------192                   110   s1-10.fast-sc201.
                60-------101--151                        102   sc40.max-sc40.max
                61-------100113                          102   sc40.maxv8p-sc40.
                   71--=100==-----182                     87   s1-10.g-sc201.g
                      86-100-----171                      89   sc301.g-sc40.g
25---------------------=99=------178                     109   sc201.O-sc301.O
        38---------===89==------166                       87   s1-10.g-sc40.g
        38---------==87==-------162                       87   s1-10.g-sc301.g
              53---==85==113                              87   sc201.g-sc40.g
        39---------==83=--118                             87   sc201.g-sc301.g

                      Relative performance extremes - two groups

 %     test            %     test           %     test          group

  55   spec77           79   eigc.CSEPT      79   xchuge        s1-10.O-sc201.O
 186   tc8             205   bdna           448   048.ora       s1-10.O-sc201.O
  26   trfd             70   spec77          78   ocean         s1-10.O-sc301.O
 182   seis.medium2    189   tc8            419   048.ora       s1-10.O-sc301.O
  89   spec77           91   xzhuge          91   zhuge         s1-10.O-sc40.O
 204   tc8             211   bdna           454   048.ora       s1-10.O-sc40.O
  78   xdhuge           79   spec77          79   eigc.CSEPT    s1-10.fast-sc201
 150   intmc1000       172   cslalom14      192   tc8           s1-10.fast-sc201
  81   lind.DT          82   xdhuge          84   lind.4        s1-10.fast-sc301
 169   ocean           213   tc8            272   mg3b          s1-10.fast-sc301
  83   xzhuge           84   lind.4          84   147           s1-10.fast-sc40.
 394   mg3b            408   ocean          887   trfd          s1-10.fast-sc40.
  71   intmc1000        74   gol4.m33        75   gol4.m34      s1-10.g-sc201.g
 140   seis.medium2    171   cslalom14      182   048.ora       s1-10.g-sc201.g
  38   intmc1000        41   dyfesm          42   gol4.m33      s1-10.g-sc301.g
 129   chuge           143   048.ora        162   cslalom14     s1-10.g-sc301.g
  38   intmc1000        42   gol4.m33        44   gol4.m34      s1-10.g-sc40.g
 126   seis.medium2    143   048.ora        166   cslalom14     s1-10.g-sc40.g
  68   spec77           77   eigc.CNEPT      92   huge8_SP      s1-10.max-sc201.
 152   094.fpppp       171   herwig57       292   tc8           s1-10.max-sc201.
  40   huge8_DP         43   huge16_DP       45   huge8_SP      s1-10.max-sc301.
 179   herwig57        206   #vpenta        300   tc8           s1-10.max-sc301.
  89   eigc.CNEPT      100   124            102   spec77        s1-10.max-sc40.m
 330   xshuge          333   lind.4         426   xdhuge        s1-10.max-sc40.m
 102   eigc.CNEPT      103   cslalom14      103   eigs.SGEPT    s1-10.max-sc40.m
 307   #vpenta         328   lind.4         352   tc8           s1-10.max-sc40.m
  25   trfd             67   mg3b            76   dnacompare    sc201.O-sc301.O
 141   linc.CT         146   intmc1000      178   eigc.CSEPT    sc201.O-sc301.O
  79   shuge            79   xshuge          83   hugeroll_SP   sc201.O-sc40.O
 163   spec77          163   linc.CT        200   eigc.CSEPT    sc201.O-sc40.O
  79   dnacompare       79   xshuge          83   photon100     sc201.fast-sc301
 159   ocean           178   eigc.CSEPT     230   mg3b          sc201.fast-sc301
  79   xshuge           83   147             86   dnacompare    sc201.fast-sc40.
 333   mg3b            384   ocean          698   trfd          sc201.fast-sc40.
  39   dyfesm           54   intmc1000       55   shuge         sc201.g-sc301.g
 103   #cfft2d         112   090.hydro2d    118   chuge         sc201.g-sc301.g
  53   intmc1000        55   shuge           57   gol4.m33      sc201.g-sc40.g
 106   reweight        112   090.hydro2d    113   chuge         sc201.g-sc40.g
  42   huge16_DP        42   huge8_DP        48   trfd          sc201.max-sc301.
 149   ocean           157   spec77         209   #vpenta       sc201.max-sc301.
  92   124              96   gol4.m35        97   3e2.jscc0     sc201.max-sc40.m
 335   lind.4          340   xshuge         431   xdhuge        sc201.max-sc40.m
  88   gol4.m35         90   gol4.m36        98   026           sc201.max-sc40.m
 263   xdhuge          312   #vpenta        331   lind.4        sc201.max-sc40.m
  86   intmc1000        94   zhuge           94   eigd.DGEPT    sc301.O-sc40.O
 143   ocean           175   mg3b           425   trfd          sc301.O-sc40.O
  81   147              82   094.fpppp       93   huge16_DP     sc301.fast-sc40.
 194   shuge           241   ocean          562   trfd          sc301.fast-sc40.
  86   seis.4           95   #cfft2d         96   094.fpppp     sc301.g-sc40.g
 112   056.ear         129   reweight       171   dyfesm        sc301.g-sc40.g
  83   eigc.CNEPT       89   124             90   intmc1000     sc301.max-sc40.m
 323   xshuge          353   huge8_SP       412   xdhuge        sc301.max-sc40.m
  89   hugeroll_DP      91   spec77          91   gamteb1000    sc301.max-sc40.m
 231   huge16_SP       240   huge8_DP       251   huge16_DP     sc301.max-sc40.m
  91   gol4.m36         92   gol4.m35        95   seis.4        sc301.max-sc40.m
 355   huge8_SP        373   huge8_DP       389   huge16_DP     sc301.max-sc40.m
  85   147              92   099.null        93   048.ora       sc40.O-sc40.fast
 195   shuge           205   dhuge          210   ocean         sc40.O-sc40.fast
  53   shuge            57   dhuge           62   hugeroll_SP   sc40.fast-sc40.m
 164   093.nasa7       190   052.alvinn     217   #vpenta       sc40.fast-sc40.m
 135   reweight        141   3e2.grey       157   ray.coin      sc40.g-sc40.O
1480   mg3b           1735   shuge         1748   #mxm          sc40.g-sc40.O
  60   xshuge           61   xdhuge          80   lins.4        sc40.max-sc40.ma
 142   chuge           149   lind.DT2       151   huge16_DP     sc40.max-sc40.ma
  81   eigc.CNEPT       86   3e2.jscc0       86   124           sc40.max301-sc40
 309   lind.4          339   xshuge         424   xdhuge        sc40.max301-sc40
  57   052.alvinn       75   eigs.SNEPT      78   reweight      sc40.max301.Hype
 206   047             213   arc2d          236   #vpenta       sc40.max301.Hype
  32   eigc.CNEPT       33   eigs.SNEPT      67   m300DP        sc40.max301.Supe
 205   090.hydro2d     217   034.mdljdp2    223   077.mdljsp2   sc40.max301.Supe
  25   eigs.SNEPT       26   eigc.CNEPT      59   eigc.4        sc40.max301.Supe
 188   047             204   034.mdljdp2    223   077.mdljsp2   sc40.max301.Supe
  61   026              71   f2c             80   huge4_SP      sc40.maxv8p-sc40
 108   077.mdljsp2     112   #cfft2d        113   eigs.SGEPT    sc40.maxv8p-sc40

Best-relative run performance

%    %    %    %    %    %    %    %    %    %    1
0    1    2    3    4    5    6    7    8    9    0
0    0    0    0    0    0    0    0    0    0    0       #    comp    opt      

            24-------------------------------====99100   100   sc40    maxv9    
            24---------------------------------==99100   100   sc40    maxv8p  
           21------------------------------===94100      100   sc40    max      
            24------------------=======79=====--100      100   sc40    max301  
          20------------------========78=======-100      100   sc40    fast    
            24-------------========72=====------99       100   sc301   max      
            24-------------=======70========----100      100   sc40    max301  
         17--------------=======65=====------93          100   sc301   fast    
         18----------------=====65========------100      100   sc201   max      
          20--------------=====64=====-------92          100   sc40    O        
            23---------=======61======---------96        100   s1-10   max      
         18-----------=======60======--------92          100   sc201   O        
         17------------======59=====--------90           100   sc301   O        
      11--------------======57=====---------90           100   s1-10   fast    
       14-------------======57=======-----------99       100   sc201   fast    
              27-------=====57=====-------------100      100   sc40    max301  
         18----------=====53=======--------89            100   s1-10   O        
   5--=====23========-----------67                        72   sc201   g        
  3--===18=========----------60                           72   sc301   g        
  3--===18=========-----------62                          72   sc40    g        
 2--===15========-----------------70                      98   s1-10   g        
 2--==14========--------50                                72   sc40    g        

                          Best-relative performance extremes

 %    test            %    test                %    test         comp    opt    

 18   048.ora         20   xdhuge              23   eigc.CNEPT   s1-10   O      
 82   147             84   3e2.grey            89   f2c          s1-10   O      
 18   xdhuge          19   eigc.CNEPT          22   eigs.SNEPT   sc201   O      
 90   f2c             90   013                 92   026          sc201   O      
 17   trfd            19   xdhuge              20   eigc.CNEPT   sc301   O      
 86   026             88   f2c                 90   seis.4       sc301   O      
 20   xdhuge          22   eigc.CNEPT          25   lind.4       sc40    O      
 91   f2c             91   jetset74            92   013          sc40    O      
 24   eigc.CNEPT      27   eigs.SNEPT          59   026          sc40    maxv9  
100   eigd.DNEPT     100   gamess.4           100   linz.ZT      sc40    maxv9  
 24   eigc.CNEPT      27   eigs.SNEPT          60   xshuge       sc40    maxv8p
100   linc.CT        100   linz.ZT            100   eigz.4       sc40    maxv8p
 24   xdhuge          25   eigs.SNEPT          26   eigc.CNEPT   sc40    max301
100   022.li         100   tc8                100   147          sc40    max301
 27   xdhuge          28   lind.4              32   xshuge       sc40    max301
100   eigs.SGEPT     100   eigc.CNEPT         100   eigs.SNEPT   sc40    max301
 24   xdhuge          29   #vpenta             31   lind.4       sc40    max301
100   ray.coin       100   seis.medium2       100   seis.4       sc40    max301
 23   eigc.CNEPT      23   xdhuge              24   eigs.SNEPT   s1-10   max    
 87   3e2.grey        95   f2c                 96   cslalom14    s1-10   max    
 18   eigc.CNEPT      23   xdhuge              25   eigs.SNEPT   sc201   max    
 99   026            100   gol4.m35           100   gol4.m36     sc201   max    
 24   xdhuge          25   huge16_DP           25   eigc.CNEPT   sc301   max    
 99   026             99   gol4.m36            99   cslalom14    sc301   max    
 21   eigc.CNEPT      26   eigs.SNEPT          65   eigc.4       sc40    max    
100   lins.4         100   geodetic8          100   lind.4       sc40    max    
  2   lins.4           2   xshuge               3   lind.4       s1-10   g      
 51   013             57   ray.coin            70   3e2.grey     s1-10   g      
  5   shuge            7   mg3b                 8   flo52        sc201   g      
 53   ray.coin        55   reweight            67   3e2.grey     sc201   g      
  3   shuge            5   flo52                5   dhuge        sc301   g      
 45   reweight        50   ray.coin            60   3e2.grey     sc301   g      
  3   shuge            5   flo52                5   mg3b         sc40    g      
 50   ray.coin        59   reweight            62   3e2.grey     sc40    g      
  2   shuge            3   mg3b                 3   dhuge        sc40    g      
 41   3e2.grey        49   026                 50   reweight     sc40    g      
 11   trfd            21   eigs.SNEPT          23   eigc.CNEPT   s1-10   fast  
 85   ray.coin        87   3e2.grey            90   f2c          s1-10   fast  
 14   trfd            18   xdhuge              19   eigc.CNEPT   sc201   fast  
 91   3e2.grey        92   026                 99   cslalom14    sc201   fast  
 17   trfd            19   xdhuge              20   eigc.CNEPT   sc301   fast  
 90   seis.medium1    91   seis.4              93   cslalom14    sc301   fast  
 20   xdhuge          24   eigc.CNEPT          25   lind.4       sc40    fast  
100   mg3b           100   gamess.thymine_t   100   intmc1000    sc40    fast  
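
     As a rough illustration of how a table like the one above could be
derived, the sketch below assumes that "best-relative performance" for a test
means the best run time over all compiler/option configurations divided by
the run time of the configuration in question, expressed as a percentage.
That definition, the function names, and the run times are assumptions made
for illustration; they are not taken from the report's data or scripts.

```python
def best_relative(times):
    """Map {test: {config: seconds}} to {config: {test: percent of best}}.

    Assumed definition: 100 * (best time for the test) / (this config's time).
    """
    result = {}
    for test, by_config in times.items():
        best = min(by_config.values())
        for config, seconds in by_config.items():
            result.setdefault(config, {})[test] = round(100 * best / seconds)
    return result


def extremes(percent_by_test, n=3):
    """Return the n lowest and n highest (test, percent) pairs."""
    ranked = sorted(percent_by_test.items(), key=lambda kv: kv[1])
    return ranked[:n], ranked[-n:]


# Hypothetical run times in seconds; not measurements from the report.
times = {
    "testA": {"cfg1": 10.0, "cfg2": 40.0},
    "testB": {"cfg1": 8.0, "cfg2": 8.0},
    "testC": {"cfg1": 30.0, "cfg2": 20.0},
    "testD": {"cfg1": 5.0, "cfg2": 10.0},
}

rel = best_relative(times)
lowest, highest = extremes(rel["cfg2"], n=2)
print("cfg2 lowest: ", lowest)
print("cfg2 highest:", highest)
```

     Under that assumption, each compiler/option pair contributes two rows to
the table: its lowest percentages (tests where it falls furthest behind the
best configuration) and its highest (tests where it is at or near the best).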

Test Programs

     The tests used for the comparison were those from SPEC89 and SPEC92;
perfect 1; and a number of others listed below.  Most are Fortran; (C)
indicates programs in C.  All test programs have been somewhat modified from
their original form.  Thus the results reported here are comparable with each
other but not with those obtained or reported elsewhere.  The modified
versions of those programs listed that are freely distributable are available
from d...@validgh.com.

Abbreviation   Description

               available from ste...@yoda.physics.unc.edu:
geodetic8      (C) geodesic distance in spacetime by Christensen and Fulling
               with PROBLEM_SIZE = 8

               available from BE...@SCIENCE.UTAH.EDU:
goliath4.*     exact rational system analyzer by Alfeld with various inputs
tc8            group theory computations by Chamberlin

               available from web...@hep.phy.cam.ac.uk:
herwig57       herwig 5.7 Monte Carlo hadron emission reactions physics

               available from TOR...@CERNVM.cern.ch:
jetset74       jetset 7.4 Monte Carlo jet fragmentation physics

               available from rayshade-requ...@cs.yale.edu:
ray.*          (C) rayshade 4.06 graphics rendering program with various inputs

               available from seym...@NPL.NPL.WASHINGTON.EDU:
reweight       Monte Carlo simulation of particles in detector by Prindle

               available from net...@ornl.gov:

f2c            (C) Fortran to C translator
lf2c           (C) Fortran to C run time library
blas           basic linear algebra subroutines library
lapack         LAPACK library
               xeig[sdcz], xlin[sdcz] - lapack timing programs
               eig[sdcz],lin[sdcz] - lapack timing programs using -lsunperf

cslalom14      (C) slalom benchmark code applied to 1399x1399 problem

               available from Los Alamos benchmarking group:

gamteb         LABMK21 - Monte Carlo gamma ray transport in carbon cylinder
hydro          2D Lagrangian Hydrodynamics
intmc          LABMK1 - Monte Carlo electronic lock - integer arithmetic
photon         photon transport through carbon cylinder

               available from d...@validgh.com:

chuge          32MB complex*8 linear equation using lapack
dhuge          32MB real*8 linear equation using lapack
shuge          32MB real*4 linear equation using lapack
zhuge          32MB complex*16 linear equation using lapack
<#various>     kernels from double precision NAS kernels
dnacompare     (C) compare DNA sequences by Huang and Miller

               available from UCB ERL; license required:

lib3e2         (C) SPICE 3E2 library
3e2.*          (C) spice3e2 run with various inputs

               available from PERFECT club; license required:

adm
arc2d
bdna
dyfesm
flo52
mdg
mg3b           perfect mg3d with modified I/O to intermediate file
ocean
qcd2
spec77
track
trfd

               available from SPEC; license required:

               SPEC 89:
m300           matrix300

               SPEC 92:
008            (C) 008.espresso
013            013.spice2g6 with greycode input
015            015.doduc
022            (C) 022.li
023            (C) 023.eqntott
026            (C) 026.compress
034            034.mdljdp2
039            039.wave5
047            047.tomcatv
048            048.ora
052            (C) 052.alvinn
056            (C) 056.ear
072            (C) 072.sc
077            077.mdljsp2
078            078.swm256
085            (C) 085.gcc
089            089.su2cor
090            090.hydro2d
093            093.nasa7
094            094.fpppp

               SPEC 95:
099            (C) 099.go
124            (C) 124.m88ksim
147            (C) 147.vortex

               SPEC HPC:
gamess
seis

Advertisement

     Performance reports like this one, comparing two compilers on the same
hardware or two hardware implementations with the same compilers, can be
produced on a contract basis.   Please contact d...@validgh.com for a business
announcement, project proposal, or copies of previously published reports.
--

David Hough                             d...@validgh.com
Consultant on system correctness, performance evaluation, and
IEEE 754 binary floating-point arithmetic --- Send for business announcement

 
 
 

1. Probs w/ GDB on SparcCompiler 4.0 binaries

     I've had no luck getting GDB v4.16 to work with SparcCompiler 4.0
binaries.  The debugging symbols are not pulled into the executable when
the original .c files are compiled with SC4.0.  If I compile them with
anything else (e.g. GCC, Centerline), the debugging symbols are pulled
into the binary (e.g. dump -c output).

     .stabstr:
        <offset>     Name
        ...
        <93>         char:t(0,1)=bsc1;0;8;
        <115>        short:t(0,2)=bs2;0;16;
        <138>        int:t(0,3)=bs4;0;32;
        <159>        long:t(0,4)=bs4;0;32;
        ...

     Can GDB be made to work on SparcCompiler binaries?  Does gdb support
reading debug info by tracing down component object files (like the
SparcWorks debugger does), or is there a switch I missed on the compiler or
linker to cause it to generate debug info such that it'll be pulled into
the executable by ld?

--
_____________________________________________________________________________
    Randall Hopper  (AA8VB)   |  Picker International, CT Visualization

            *** WINDOWS, from the folks who brought you EDLIN ***
_____________________________________________________________________________
