Cluster: what type to use?

Post by Morris Ebbet » Tue, 11 Oct 2005 04:00:28



I work in a scientific computation group.  In the next year or so we want to
migrate from our Sun servers (one v1280 and one e4500; they're old hence
slow) to a set of Linux workstations.

To share resources, we'd like to cluster the workstations.  The cluster
should have the following properties:
* Load balancing
* Each node should be able to access our RAID without having to use a slow
connection like NFS
* Software for cluster should be relatively easy to install and maintain,
and shouldn't be inordinately expensive (say <~ $30 K)

We _don't_ need the cluster to be capable of true parallelism, because the
computations we run don't need that kind of power (and the software we use,
primarily matlab and some freeware stuff, isn't really set up for parallel
computation anyway).  (Argument here is that we don't need Beowulf, and I'm
supposing that because it offers true parallel computing, there is likely
some higher "cost" in terms of installation/maintenance, but I could be dead
wrong about that.)

Anyone have any suggestions for cluster packages that would meet our needs?

TIA,

S

 
 
 

Post by Juha Laih » Wed, 12 Oct 2005 03:30:26



>I work in a scientific computation group.  In the next year or so we want to
>migrate from our Sun servers (one v1280 and one e4500; they're old hence
>slow) to a set of Linux workstations.

>To share resources, we'd like to cluster the workstations.  The cluster
>should have the following properties:
>* Load balancing
>* Each node should be able to access our RAID without having to use a slow
>connection like NFS
>* Software for cluster should be relatively easy to install and maintain,
>and shouldn't be inordinately expensive (say <~ $30 K)

I'm not saying I have (any kind of) answer for you, but I have a couple
of questions which may help others in finding the answer.

- looks like you're not looking for failover/fault-tolerance, right?
- please elaborate on load balancing -- you said in your other text that
  "true parallelism" is not needed -- so, please explain what it is you
  mean by load balancing; what it should do?
- "each node should be able to access our RAID" -- should there be a file
  system shared across the nodes (each node having simultaneous and equal
  read/write access to a set of files), or do you just have a bunch of
  disk space you want to provide to the machines, with no need for sharing?
  Do you have the disk server already, or should this be part of the
  specification you're looking for -- if you have the disk server/disk
  subsystem already, it would help to know what it is.

Then, NFS isn't necessarily that slow -- esp. if you can provide it
a switched segment of its own (say, a 1Gbit/s segment with jumbo frames
enabled). Just make sure that flooding protections in the switch don't
kick in. Perhaps trunk more than one connection to the server so that
the server has more bandwidth than any single client.
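
To put rough numbers on the 1Gbit/s suggestion, here is a back-of-envelope
sketch in Python. The overhead figures are assumptions for illustration:
plain TCP over IPv4 with no header options, standard Ethernet framing, and
no allowance for NFS/RPC overhead, disks, or CPU load, so real NFS
throughput will be lower than what this prints.

```python
# Rough upper bound on TCP payload throughput over Ethernet.
# Assumed per-frame costs (not measured on any real network):
#   38 bytes on the wire: preamble 8 + Ethernet header 14 + FCS 4 + gap 12
#   40 bytes inside the frame: IPv4 header 20 + TCP header 20 (no options)
WIRE_OVERHEAD = 38
IP_TCP_HEADERS = 40


def payload_mb_per_s(link_bits_per_s, mtu):
    """Return the best-case payload rate in MB/s (1 MB = 10**6 bytes)."""
    efficiency = (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)
    return link_bits_per_s / 8 * efficiency / 1e6


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard frames vs. common jumbo frame size
        rate = payload_mb_per_s(1e9, mtu)
        print(f"MTU {mtu}: about {rate:.0f} MB/s at best")
```

Under these assumptions the wire-level gain from jumbo frames is only a few
percent; the larger frames also mean fewer packets for each host to handle.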
--
Wolf  a.k.a.  Juha Laiho     Espoo, Finland

         PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)

 
 
 

Post by Morris Ebbet » Thu, 13 Oct 2005 03:38:41




> >I work in a scientific computation group.  In the next year or so we want to
> >migrate from our Sun servers (one v1280 and one e4500; they're old hence
> >slow) to a set of Linux workstations.

> >To share resources, we'd like to cluster the workstations.  The cluster
> >should have the following properties:
> >* Load balancing
> >* Each node should be able to access our RAID without having to use a slow
> >connection like NFS
> >* Software for cluster should be relatively easy to install and maintain,
> >and shouldn't be inordinately expensive (say <~ $30 K)

> I'm not saying I have (any kind of) answer for you, but I have a couple
> of questions which may help others in finding the answer.

Thanks for your kind, informative reply.

> - looks like you're not looking for failover/fault-tolerance, right?

Correct.

> - please elaborate on load balancing -- you said in your other text that
>   "true parallelism" is not needed -- so, please explain what it is you
>   mean by load balancing; what it should do?

Just that it wouldn't be efficient if one of the CPUs got many heavy jobs
and the others were free.

If we had separate workstations and users could log into any workstation
they liked, by random chance it might occur that an unreasonable fraction of
the heavy jobs were placed on one CPU.
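
To illustrate what I mean, here is a minimal Python sketch comparing users
picking a machine at random with a dispatcher that always hands the next job
to the least busy machine. The job costs, the host count, and both function
names are made up for illustration; this is not how any particular cluster
package works.

```python
import random


def assign_randomly(job_costs, n_hosts, rng):
    """Each job lands on a randomly chosen host (users picking for themselves)."""
    load = [0] * n_hosts
    for cost in job_costs:
        load[rng.randrange(n_hosts)] += cost
    return load


def assign_least_loaded(job_costs, n_hosts):
    """Each job goes to whichever host currently carries the least work."""
    load = [0] * n_hosts
    for cost in job_costs:
        target = load.index(min(load))  # first host with the smallest load
        load[target] += cost
    return load


if __name__ == "__main__":
    jobs = [1] * 12              # twelve equally heavy jobs (hypothetical)
    rng = random.Random(2005)    # fixed seed so the run is repeatable
    print("random      :", assign_randomly(jobs, 4, rng))
    print("least loaded:", assign_least_loaded(jobs, 4))
```

With equal-sized jobs the least-loaded rule spreads the work evenly, whereas
random choice can easily pile several jobs onto one machine.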

> - "each node should be able to access our RAID" -- should there be a file
>   system shared across the nodes (each node having simultaneous and equal
>   read/write access to a set of files), or do you just have a bunch of
>   disk space you want to provide to the machines, with no need for sharing?
>   Do you have the disk server already, or should this be part of the
>   specification you're looking for -- if you have the disk server/disk
>   subsystem already, it would help to know what it is.

Hmm...don't know enough about disk storage to answer in detail.  What I mean
is the following.
* We already have a RAID, equipped with a Veritas FS.  I think it's being
run by the Sun v1280 right now, but am not sure.
* There's a lot of data.
* To my admittedly naive eye, it doesn't make sense to assign particular
disk space to particular machines, because that would tie particular CPUs to
particular disk space.

> Then, NFS isn't necessarily that slow -- esp. if you can provide it
> a switched segment of its own (say, 1Gbit/s segment with jumbo frames
> enabled). Just make sure that flooding protections in the switch don't
> kick in. Perhaps more than one connection trunked to the server so that
> server has more bandwidth than any single client.

OK.  Don't know much about that stuff either, but it's very informative to
know that NFS isn't necessarily a terrible bottleneck, whatever else its
faults might be.

Thanks,

S


 
 
 

Post by Juha Laih » Fri, 14 Oct 2005 01:14:22






>> >I work in a scientific computation group. In the next year or so
>> >we want to migrate from our Sun servers (one v1280 and one e4500;
>> >they're old hence slow) to a set of Linux workstations.

>> >To share resources, we'd like to cluster the workstations.
...
>> - please elaborate on load balancing -- you said in your other text that
>>   "true parallelism" is not needed -- so, please explain what it is you
>>   mean by load balancing; what it should do?

>Just that it wouldn't be efficient if one of the CPUs got many heavy jobs
>and the others were free.

>If we had separate workstations and users could log into any workstation
>they liked, by random chance it might occur that an unreasonable fraction of
>the heavy jobs were placed on one CPU.

Ok - I think I have the picture now. At least some years ago there was
a product called LSF from Platform Computing, Inc., which might fit your
needs. I think there could be some open source alternatives available as
well, but I haven't been following the situation.
--
Wolf  a.k.a.  Juha Laiho     Espoo, Finland

         PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
 
 
 

Post by Morris Ebbet » Fri, 14 Oct 2005 04:34:42







> >> >I work in a scientific computation group. In the next year or so
> >> >we want to migrate from our Sun servers (one v1280 and one e4500;
> >> >they're old hence slow) to a set of Linux workstations.

> >> >To share resources, we'd like to cluster the workstations.
> ...
> >> - please elaborate on load balancing -- you said in your other text that
> >>   "true parallelism" is not needed -- so, please explain what it is you
> >>   mean by load balancing; what it should do?

> >Just that it wouldn't be efficient if one of the CPUs got many heavy jobs
> >and the others were free.

> >If we had separate workstations and users could log into any workstation
> >they liked, by random chance it might occur that an unreasonable fraction of
> >the heavy jobs were placed on one CPU.

> Ok - I think I have the picture now. At least some years ago there was
> a product called LSF from Platform Computing, Inc., which might fit your
> needs. I think there could be some open source alternatives available as
> well, but haven't been following the situation.

OK; thanks for the tips!


 
 
 

Post by Douglas O'Nea » Sat, 15 Oct 2005 02:28:27




> <snip>

> Ok - I think I have the picture now. At least some years ago there was
> a product called LSF from Platform Computing, Inc., which might fit your
> needs. I think there could be some open source alternatives available as
> well, but haven't been following the situation.

One open-source alternative to LSF is Grid Engine (formerly Sun Grid Engine
and also available commercially from Sun as N1 Grid Engine).  Look at
http://gridengine.sunsource.net for more info.
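
As a rough idea of what handing a job to Grid Engine looks like, here is a
small Python sketch that only builds (and prints) a qsub command line for a
single MATLAB batch run. The job name "fit_run" and the script name
"fit_model" are hypothetical, and the options are just a commonly used
minimal set; check the qsub man page for your version before relying on them.

```python
import shlex


def qsub_command(job_name, matlab_script):
    """Build (but do not run) a qsub command for one MATLAB batch job."""
    return [
        "qsub",
        "-N", job_name,  # job name shown in the queue listing
        "-cwd",          # run in the directory the job was submitted from
        "-j", "y",       # merge stderr into the stdout file
        "-b", "y",       # treat the command as a binary, not a job script
        "matlab", "-nodisplay", "-r", f"{matlab_script}; exit",
    ]


if __name__ == "__main__":
    # Hypothetical names; nothing is submitted, the command is only printed.
    print(shlex.join(qsub_command("fit_run", "fit_model")))
```

The scheduler then decides which machine runs the job, so users no longer
pick a workstation themselves.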

Doug

 
 
 
