Not exactly the most exciting topic in the world but here it is. I was very
happy with gcc version 2.8.1 on my Sparc20 at home. For reasons unknown and
better left unsaid I thought that I could install 2.95.3 to see what improvement
in optimization I would get in some of my more numerically intensive code. I
was surprised to see that the same source code produced a different run time on
my Sparc20 when compiled with gcc 2.95.3. Considerably slower. I wonder what
causes that? Well, could be a lot of little things so I wrote a cute little
program that computes pi by using the most inefficient method that I know of.
Essentially (pi^2)/6 is equal to the infinite sum of 1/(n^2) for n > 0. It
converges very, very slowly, and thus an estimate for pi accurate to about seven
digits after the decimal may be achieved with n=1073741823. That's a lot of
iterations through the central loop. Well, I compiled this program using gcc
2.95.3 with various optimization options but never got anything close to the
performance produced by gcc 2.8.1. Here is the source:
$ cat pi.c
/* Standard PI calculation using an infinite series - Dennis Clarke */
/********************************************************************/
/* $ uname -a */
/* SunOS yay 5.7 Generic_106541-15 sun4m sparc SUNW,SPARCstation-20 */
/* */
/* $ psrinfo -v */
/* Status of processor 0 as of: 05/25/01 21:24:41 */
/* Processor has been on-line since 05/21/01 05:37:39. */
/* The sparc processor operates at 60 MHz, */
/* and has a sparc floating point processor. */
/* Status of processor 2 as of: 05/25/01 21:24:41 */
/* Processor has been on-line since 05/21/01 05:37:43. */
/* The sparc processor operates at 60 MHz, */
/* and has a sparc floating point processor. */
/********************************************************************/
#include <locale.h>
#include <stdio.h>
#include <sys/time.h>
#include <math.h>
int main(int argc, char *argv[]) {
    double pi = (double) 0.0;
    unsigned long i;
    /*****************************************************/
    /** sum the series 1/(x^2) **/
    /*****************************************************/
    fprintf ( stdout, "\n\n" );
    for (i = 1; i < 1073741823; i++) {
        pi = pi + (double)1.0/( (double)i * (double)i );
    }
    fprintf(stdout, " pi at n=%9u is %.12g \n", i, sqrt( pi * (double)6.0 ));
    exit(1);
}

Well, I was going to print the start and stop time in the code but decided to
simply use the Solaris time program instead. In any case, here are the results
of my test :
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Run 1

Using gcc 2.8.1 thus :

gcc -Wall -v -O3 -c -o pi.o pi.c
gcc -v -o pi pi.o -lm

results in a file 24664 bytes in length and of type :

ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped

The run time is

$ time -p ./pi
pi at n=1073741823 is 3.14159264498
real 305.58
user 305.49
sys 0.02

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Run 2

Using gcc 2.95.3 20010315 (release) thus :

gcc -Wall -v -O3 -c -msupersparc -mcpu=supersparc -mtune=supersparc -o pi.o pi.c
gcc -v -o pi pi.o -lm

results in a file 7076 bytes in length and of type :

ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped

The run time is

$ time -p ./pi
pi at n=1073741823 is 3.14159264498
real 377.32
user 377.25
sys 0.02

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Run 3

Compile with gcc 2.95.3 20010315 (release) thus :

gcc -Wall -v -O3 -c -mcpu=v8 -mtune=v8 -o pi.o pi.c
gcc -v -o pi pi.o -lm

$ file pi
pi: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
$ time -p ./pi
pi at n=1073741823 is 3.14159264498
real 377.38
user 377.35
sys 0.02

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Run 4

$ gcc -Wall -O3 -c -mcpu=v8 -o pi.o pi.c
pi.c: In function `main':
pi.c:26: warning: unsigned int format, long unsigned int arg (arg 3)
$ gcc -o pi pi.o -lm
$ ls -lap pi
-rwxr-xr-x 1 dclarke staff 7068 May 25 19:09 pi
$ strip pi
$ ls -lap pi
-rwxr-xr-x 1 dclarke staff 4664 May 25 19:09 pi
$ time -p ./pi
pi at n=1073741823 is 3.14159264498
real 377.31
user 377.26
sys 0.03

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Run 5

$ gcc -Wall -O3 -c -o pi.o pi.c
pi.c: In function `main':
pi.c:26: warning: unsigned int format, long unsigned int arg (arg 3)
$ gcc -o pi pi.o -lm
$ ls -lap pi
-rwxr-xr-x 1 dclarke staff 7068 May 25 19:27 pi
$ strip pi
$ ls -lap pi
-rwxr-xr-x 1 dclarke staff 4664 May 25 19:27 pi
$ time -p ./pi
pi at n=1073741823 is 3.14159264498
real 377.40
user 377.38
sys 0.01

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Run 6

$ gcc -Wall -O0 -c -o pi.o pi.c
pi.c: In function `main':
pi.c:26: warning: unsigned int format, long unsigned int arg (arg 3)
$ gcc -o pi pi.o -lm
$ ls -lap pi
-rwxr-xr-x 1 dclarke staff 7172 May 25 19:50 pi
$ strip pi
$ ls -lap pi
-rwxr-xr-x 1 dclarke staff 4768 May 25 19:51 pi
$ time -p ./pi
pi at n=1073741823 is 3.14159264498
real 557.21
user 557.13
sys 0.02

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Well, so there it is. The results seem to show that gcc 2.8.1 will produce a
faster binary with the same source with no special optimizations. Geez I wonder
why. Just how different can the machine code be? Maybe I should disassemble the
two and have a look see.

Dennis

ps: I'll try the same thing on an Ultra2 using the SparcV9 cpu optimization
option but I don't expect much to be different. Then again, who knows.