Benchmarks for Linux multi-processor.

Post by eak » Wed, 30 Dec 1998 04:00:00



Greetings,

    I have my eye on a new Linux server to join my ever growing cluster.
I'm looking at the VArStation XMP from VA Research among others and
wondered if anybody has done any performance comparisons?

Specifically, with the 0.5/1/2 MB cache options on the Xeon processor, how
much of a performance difference does the cache make under Linux?
Also, with the included RH 5.2, does the second processor really come
into play?  What if you upgrade yourself to a 2.1.x kernel?

This box is destined for scientific number crunching, database
functions, and some graphics, so performance is definitely appreciated.
The question is whether it's worth $5K to jump from 0.5 to 1 MB of cache.

Thanks,

Eric A. Kihn

 
 
 

Post by Gary Momariso » Wed, 30 Dec 1998 04:00:00



> Greetings,

>     I have my eye on a new Linux server to join my ever growing cluster.
> I'm looking at the VArStation XMP from VA Research among others and
> wondered if anybody has done any performance comparisons?

VA Research has some benchmark results on their site.

Find it via Gary's Encyclopedia at

http://www.aa.net/~swear/pedia/benchmarks.html

 
 
 

Post by Frankie Eas » Sat, 09 Jan 1999 04:00:00



> Greetings,

>     I have my eye on a new Linux server to join my ever growing cluster.
> I'm looking at the VArStation XMP from VA Research among others and
> wondered if anybody has done any performance comparisons?

> Specifically, with the 0.5/1/2 MB cache options on the Xeon processor, how
> much of a performance difference does the cache make under Linux?
> Also, with the included RH 5.2, does the second processor really come
> into play?  What if you upgrade yourself to a 2.1.x kernel?

> This box is destined for scientific number crunching, database
> functions, and some graphics, so performance is definitely appreciated.
> The question is whether it's worth $5K to jump from 0.5 to 1 MB of cache.

> Thanks,

> Eric A. Kihn


Hi Eric,
    Is your cluster a "cluster," or do you mean your "collection"?  I ask
because for serious number crunching we use PVM on a Beowulf cluster.  The
communications overhead of a typical network of workstations means the
processors needn't be as high-end as the Xeons.  We balance ours with a
Fast Ethernet switch and 32 Pentium Pro 200 boxes.  We do, however, use the
1 MB cache Pentium Pros.  The cache issue depends on the calculations you
specifically intend to do and your data set sizes.

    My experience with SMP on Red Hat 5.2 is nil, but I did it on 5.0.
Both install a single-processor kernel and require a kernel recompile
(after editing the makefile to uncomment the SMP declaration).
My results (using both the Byte benchmark code and timing a
series of 1000 fast Fourier transforms) were not stellar.  The SMP kernel
is better at increasing total throughput between processes than at
increasing specific throughput, as in the FFT calculation itself.  It *did*
show some speedup, just not a lot.  With the SMP kernel you are primarily
going to use multi-threaded programming.  Our results were greatly
improved by the addition of the Portland Group compilers to that box.  (We
got a kit that has C/C++/HPF and a graphic analysis tool for just under 700
bucks; these are the same Portland compilers used on our Crays, which makes
moving code around simple.)  They allow OpenMP-style programming, so you
can specify exactly which processor will get which processes and then
pass data between them.  Thus you have the ability to "connect" your
processors in various parallel architectures.  Though these connections are
defined in software rather than in hardware, this lets you set up
much quicker algorithms and define channels of communication between
processes.  This was *very* fast because you have a bus rather than a
network.  I would like to get my hands on a 16-processor machine to try
hypercube FFTs on internally.  That would probably bang 'em out nicely.

As for the 2.1.x kernel, I don't know; if SMP has been significantly
improved, you may want to do it.  But if not, then what's the difference?

Now, back to cache... cache hits are the most significant source of
speedup in your computer next to clock speed.  They are orders of
magnitude better than memory fetches.  So if your algorithm is one that
will use a specific chunk of data again and again, then the larger cache is
worth its weight in gold.  That is why Intel charges you so much for
it.  Increasing cache size also increases the probability of cache hits
because you can cache so much more data.  Example: I do an image
transformation, then apply a filter to the result.  I can spread portions
of that image amongst the CPUs I have using OpenMP and thus farm out the
job.  Since each processor is going to apply iterative techniques across
the same piece of data twice, I can expect to deal purely at cache speed
for the second calculation if my cache size is roughly equal to or larger
than the image size I farmed out to that processor.  If the cache size is
smaller, then I am stuck reading everything twice.  Say my cache is 2/3 my
set size.  Then on iteration number two I have the *last* 2/3 of the set
cached.  Unless I code my algorithm to turn around on the last piece of
data, I will wind up needing the first third of the set, which isn't
cached.  As I read that first third in, I am overwriting the second third.
By the time I get to the second third it is no longer cached, so as I read
it from memory I overwrite the last third of my data.  Thus I must fetch it
all from memory.  This hurts, and a well-chosen algorithm will avoid about
2/3 of the memory reads in that example.  But that still leaves a lot of
fetching to be done.

So far we've covered graphics and scientific computing.  Now, as for
databases, I am no expert.  But since you have a situation where the speeds
still differ by orders of magnitude (cache hit versus memory fetch
versus RAID cache fetch versus disk medium access), I'd say the same...
cache is golden, especially if you have a set of data you often query.

...Frankie

BTW: I've done work for the Department of Commerce in the Emerging
Technologies Center (ETC lab) at the USPTO (and I believe NOAA is one of
the four parts of DOC).  Is this for NOAA?

 
 
 

Post by ekih » Sat, 09 Jan 1999 04:00:00


Frankie,

        Thanks for a very thorough answer. You raise some interesting points.



> > Greetings,

> >     I have my eye on a new Linux server to join my ever growing cluster.
> > I'm looking at the VArStation XMP from VA Research among others and
> > wondered if anybody has done any performance comparisons?

> > Specifically, with the 0.5/1/2 MB cache options on the Xeon processor, how
> > much of a performance difference does the cache make under Linux?
> > Also, with the included RH 5.2, does the second processor really come
> > into play?  What if you upgrade yourself to a 2.1.x kernel?

> > This box is destined for scientific number crunching, database
> > functions, and some graphics, so performance is definitely appreciated.
> > The question is whether it's worth $5K to jump from 0.5 to 1 MB of cache.

> > Thanks,

> > Eric A. Kihn

> Hi Eric,
>     Is your cluster a "cluster" or do you mean your "collection?"  I ask
> because for serious number crunching we use PVM on a Beowulf cluster.

I must admit it is getting hard to tell these days, but I guess I mean
collection.  I currently have three 233 MHz boxes that act as data servers
to three 350 MHz boxes, which have loads of RAM and act as the calculation
servers.  The main server is running a CORBA ORB, and the boxes each make
various functions available through it.  I also run a home-brew "Global
Data Server" which lets me run SQL queries and retrieval through a single
point.

I've looked briefly at Beowulf and decided it was too complex for my
needs.  Was I mistaken?  How does your cluster perform when crunching
serious numbers?

> 1 MB cache Pentium Pros.  The cache issue depends on the calculations you
> specifically intend to do and your data set sizes.

The meat of our calculations lies in spherical harmonic analysis.
Typically this is ~50 GB of total data for a run.

>     My experience with SMP on Red Hat 5.2 is nil, but I did it on 5.0.
> Both install a single-processor kernel and require a kernel recompile
> (after editing the makefile to uncomment the SMP declaration).
> My results (using both the Byte benchmark code and timing a
> series of 1000 fast Fourier transforms) were not stellar.  The SMP kernel
> is better at increasing total throughput between processes than at
> increasing specific throughput, as in the FFT calculation itself.  It *did*
> show some speedup, just not a lot.
> As for the 2.1.x kernel, I don't know; if SMP has been significantly
> improved, you may want to do it.  But if not, then what's the difference?

The rumor I've heard is that the 2.2 kernel will make multi-processing
fly.  I'm a numbers man by trade, so I was hoping to see some real data
on this.  I guess it may come later.

> Our results were greatly
> improved by the addition of the Portland Group compilers to that box.  (We
> got a kit that has C/C++/HPF and a graphic analysis tool for just under 700
> bucks; these are the same Portland compilers used on our Crays, which makes
> moving code around simple.)

That's the second plug I've gotten for the Portland Group.  Looks like
they'll be getting a K or so of my software budget :).

> Now, back to cache... cache hits are the most significant source of
> speedup in your computer next to clock speed.  They are orders of
> magnitude better than memory fetches.

Still curious why Intel wants so much for 0.5 MB of cache.  I'll have to
get more serious about my analysis of our needs.  It will be hard to buy
0.5 MB of cache instead of GBs of RAM if the time comes, though :).
I liked your cache discussion because it points out that no hardware
makes up for bad programming.

> So far we've covered graphics and scientific computing.  Now, as for
> databases, I am no expert.  But since you have a situation where the speeds
> still differ by orders of magnitude (cache hit versus memory fetch
> versus RAID cache fetch versus disk medium access), I'd say the same...
> cache is golden, especially if you have a set of data you often query.

The data I'm working with is time-series oriented, which means I load a
lot of stations for a time step, run the model, then dump the data, seldom
to be heard from again.  Of course, in the post-analysis we run vis
software a great deal.

> BTW: I've done work for the Department of Commerce in the Emerging
> Technologies Center (ETC lab) at the USPTO (and I believe NOAA is one of
> the four parts of DOC).  Is this for NOAA?

Yep, I'm a fed, but try not to hold that against me.  I work for the
DOC - NOAA - National Geophysical Data Center - Solar Terrestrial
Physics division.

Eric Kihn

 
 
 

Post by strat.. » Sun, 10 Jan 1999 04:00:00



>Still curious why Intel wants so much for 0.5 MB of cache.  I'll have to
>get more serious about my analysis of our needs.  It will be hard to buy
>0.5 MB of cache instead of GBs of RAM if the time comes, though :).
>I liked your cache discussion because it points out that no hardware
>makes up for bad programming.

About two years ago there was an article in PC Week comparing quad-processor
servers with PPro 200s with 256K and 512K caches.  The OSes used were NT and
NetWare, being hit by about 20+ clients.  The systems were identical in each
case.  The 512K cache system had a ~40% performance increase because it
didn't have to flush the cache as much when it switched processes.  They
used their server bench test.  Obviously your performance will vary
depending on the software you are running and the number of
clients/processes.

So you can see why Intel wants ~$3600 for a Xeon 450 with 2 MB of cache.

Paul

 
 
 

Post by Stephen E. Halp » Mon, 11 Jan 1999 04:00:00




>>Still curious why Intel wants so much for 0.5 MB of cache.  I'll have to
>>get more serious about my analysis of our needs.  It will be hard to buy
>>0.5 MB of cache instead of GBs of RAM if the time comes, though :).
>>I liked your cache discussion because it points out that no hardware
>>makes up for bad programming.

>About two years ago there was an article in PC Week comparing quad-processor
>servers with PPro 200s with 256K and 512K caches.  The OSes used were NT and
>NetWare, being hit by about 20+ clients.  The systems were identical in each
>case.  The 512K cache system had a ~40% performance increase because it
>didn't have to flush the cache as much when it switched processes.  They
>used their server bench test.  Obviously your performance will vary
>depending on the software you are running and the number of
>clients/processes.

>So you can see why Intel wants ~$3600 for a Xeon 450 with 2 MB of cache.

It's also interesting to look at the workstation vendors such as SGI
and Sun, who were charging $10,000 for a module with a CPU and a large
cache for their workstations.  The fact of the matter is that mainstream
PC CPUs are cheap because of volume.  The cache on the first
Pentium Pro (256K) was a 15.5M-transistor chip.  Going to a 1 MB cache
in the same form factor meant producing a significantly more complex
chip in far lower quantity, which likely had a lower yield.  You also
had to amortize extra engineering costs to design a far more complex
chip, along with managing all the thermal problems of dissipating 50%
more heat from the same carrier.  Low volume and high complexity
result in high costs, and the same rules apply to the RISC chips as
well as the higher-end Xeons.  As some would say, "it's the cost of
doing business.."

>Paul

-Steve
 
 
 

Post by strat.. » Mon, 11 Jan 1999 04:00:00




>>About two years ago there was an article in PC Week comparing quad-processor
>>servers with PPro 200s with 256K and 512K caches.  The OSes used were NT and
>>NetWare, being hit by about 20+ clients.  The systems were identical in each
>>case.  The 512K cache system had a ~40% performance increase because it
>>didn't have to flush the cache as much when it switched processes.  They
>>used their server bench test.  Obviously your performance will vary
>>depending on the software you are running and the number of
>>clients/processes.

>>So you can see why Intel wants ~$3600 for a Xeon 450 with 2 MB of cache.

>It's also interesting to look at the workstation vendors such as SGI
>and Sun, who were charging $10,000 for a module with a CPU and a large
>cache for their workstations.  The fact of the matter is that mainstream
>PC CPUs are cheap because of volume.  The cache on the first
>Pentium Pro (256K) was a 15.5M-transistor chip.  Going to a 1 MB cache
>in the same form factor meant producing a significantly more complex
>chip in far lower quantity, which likely had a lower yield.  You also
>had to amortize extra engineering costs to design a far more complex
>chip, along with managing all the thermal problems of dissipating 50%
>more heat from the same carrier.  Low volume and high complexity
>result in high costs, and the same rules apply to the RISC chips as
>well as the higher-end Xeons.  As some would say, "it's the cost of
>doing business.."

Definitely.  I remember how hard it was to get 256K PPro 200s when they
first came out; 166s were easy.  Those $10,000 SGI and Sun CPU modules
typically had approximately 2 MB of cache, I'd guess.  The DEC ones had
anywhere from 1 to 2 MB on them.

Back to Intel: I know the problems they had getting the PPro 512K and 1 MB
cache versions out, so the Xeon was probably equally tough.  So maybe you
can answer this question: do chips like the Xeon have their yield enhanced
by having chunks of the cache switchable off if there is a defect?  That
is, if I start with a Xeon having a 2 MB cache with a few defects, can
Intel change the microcode to make it into a Xeon that works with 1 MB of
cache?

Paul


Post by Stephen E. Halp » Mon, 11 Jan 1999 04:00:00





>>>About two years ago there was an article in PC Week comparing quad-processor
>>>servers with PPro 200s with 256K and 512K caches.  The OSes used were NT and
>>>NetWare, being hit by about 20+ clients.  The systems were identical in each
>>>case.  The 512K cache system had a ~40% performance increase because it
>>>didn't have to flush the cache as much when it switched processes.  They
>>>used their server bench test.  Obviously your performance will vary
>>>depending on the software you are running and the number of
>>>clients/processes.

>>>So you can see why Intel wants ~$3600 for a Xeon 450 with 2 MB of cache.

>>It's also interesting to look at the workstation vendors such as SGI
>>and Sun, who were charging $10,000 for a module with a CPU and a large
>>cache for their workstations.  The fact of the matter is that mainstream
>>PC CPUs are cheap because of volume.  The cache on the first
>>Pentium Pro (256K) was a 15.5M-transistor chip.  Going to a 1 MB cache
>>in the same form factor meant producing a significantly more complex
>>chip in far lower quantity, which likely had a lower yield.  You also
>>had to amortize extra engineering costs to design a far more complex
>>chip, along with managing all the thermal problems of dissipating 50%
>>more heat from the same carrier.  Low volume and high complexity
>>result in high costs, and the same rules apply to the RISC chips as
>>well as the higher-end Xeons.  As some would say, "it's the cost of
>>doing business.."

>Definitely.  I remember how hard it was to get 256K PPro 200s when they
>first came out; 166s were easy.  Those $10,000 SGI and Sun CPU modules
>typically had approximately 2 MB of cache, I'd guess.  The DEC ones had
>anywhere from 1 to 2 MB on them.

Sun recently moved up to 4 MB with the 360 MHz module for the Ultra 60.
Either way, for integer work the Xeon with 2 MB of cache is still a
cost-effective chip for certain applications.

>Back to Intel: I know the problems they had getting the PPro 512K and 1 MB
>cache versions out, so the Xeon was probably equally tough.  So maybe you
>can answer this question: do chips like the Xeon have their yield enhanced
>by having chunks of the cache switchable off if there is a defect?  That
>is, if I start with a Xeon having a 2 MB cache with a few defects, can
>Intel change the microcode to make it into a Xeon that works with 1 MB of
>cache?

I wouldn't be surprised if they did have a way to recover partially
functional chips.  I know another manufacturer that recovered chips
by disabling faulty sections, because I used to work with those chips.
I also remember reading that the first 486SXs were 486DXs with the
floating-point unit disabled, to be able to sell the chips that didn't
have working floating-point units.  Supposedly, when the 486DX yields
went up, a separate die was produced for the 486SX.

>Paul

-Steve
 
 
 

Upgrading A Single Processor To A Multi-Processor

I have a Sun Ultra 2 running Solaris 2.5.1 with a single 200 MHz
processor and 128 MB of RAM.  I now have a spare 200 MHz processor for an
Ultra 2 that I want to add to it.
My question is this:
do I have to recompile the kernel for its new role as a multi-processor,
does it recognize this role automatically and take advantage of the 2nd
processor (not likely), or does the system need to be reinstalled?
I have installed the 2nd processor and the system recognizes that it is
there, but there is no performance difference, and it is using the same
kernel as before.  I HOPE that I can just recompile the kernel and not
have to reinstall the system.
I have not messed with the kernel on a Solaris system, only Linux and
SunOS, so I am not sure where to start.

Thanks!
Stephen

