I'm looking for a solution (and I'm not afraid of developing one if
necessary) to load-balance SSH logins over several mostly identical
machines.
I administer a cluster with multiple interactive head nodes. These
head nodes are used by users to edit and compile their code and submit
jobs to the cluster's login-restricted compute nodes. The head nodes
are for all intents and purposes identical in terms of home directories
(shared via NFS), software tools, connectivity, and of course OS.
Currently users choose a head node themselves and log in to head-1,
head-2, etc. This is fine but as more such head nodes come online it
would be nice to provide a means by which users don't have to select
the head node themselves.
We have considered and rejected the use of simple round-robin over the
available machines. We have some users who tax systems more than
others, and our ideal solution would take into account system load by
various metrics (CPU, VM and network load, probably).
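
To make the "various metrics" idea concrete, here is a rough sketch of how per-node readings might be folded into a single score. The NodeLoad fields, the weights and the helper names are all placeholders I made up for illustration; nothing here is tuned or measured, and the real metrics would need to be normalised to comparable ranges first.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Hypothetical per-node readings, each normalised so that lower is better."""
    name: str
    cpu: float      # e.g. 1-minute load average divided by core count
    memory: float   # fraction of memory in use, 0.0 to 1.0
    network: float  # fraction of link capacity in use, 0.0 to 1.0


# Illustrative weights only (an assumption, not a recommendation).
WEIGHTS = {"cpu": 0.5, "memory": 0.3, "network": 0.2}


def load_score(node: NodeLoad) -> float:
    """Combine the readings into one weighted score; lower means less loaded."""
    return (WEIGHTS["cpu"] * node.cpu
            + WEIGHTS["memory"] * node.memory
            + WEIGHTS["network"] * node.network)


def least_loaded(nodes: list[NodeLoad]) -> NodeLoad:
    """Return the node with the lowest weighted score."""
    return min(nodes, key=load_score)


if __name__ == "__main__":
    nodes = [
        NodeLoad("head-1", cpu=0.9, memory=0.5, network=0.1),
        NodeLoad("head-2", cpu=0.2, memory=0.6, network=0.3),
    ]
    print(least_loaded(nodes).name)
```

Unlike round-robin, this picks on current load rather than on turn order, so a node occupied by a heavy user is skipped until its score recovers.
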
I can't think I'm the first genius to try something like this, but I
haven't found any relevant resources on the net.
One way to achieve this would be to define a 'director' box (or service
on one of the existing boxes), set up with an appropriate alias--say
'cluster'. The director would maintain a NAT rule redirecting traffic
for 'cluster' to a particular head node. The director would query each
of the head nodes periodically to find out their system load (some of
this is available over SNMP by default, and SNMP is easily extended with
new metrics). The results would be weighed and, if appropriate, the NAT
rule would be replaced with one directing new traffic towards the
least-loaded head node.
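
A minimal sketch of the director's decision step, assuming it already holds one load score per head node (lower is better). The function names, the switching margin and the assumption that the SSH redirect is rule number 1 in the PREROUTING chain are all mine, purely for illustration. The code only builds the iptables argument list; it does not run it.

```python
def choose_target(scores, current, margin=0.1):
    """Pick the head node that new logins should be sent to.

    scores maps node name to load score (lower is better). The current
    target is kept unless another node beats it by more than `margin`,
    so the rule does not flap between nodes with similar load.
    """
    best = min(scores, key=scores.get)
    if current in scores and scores[current] - scores[best] <= margin:
        return current
    return best


def dnat_rule(cluster_ip, target_ip, port=22):
    """Build (but do not execute) an iptables command that replaces rule 1
    of the nat PREROUTING chain with a DNAT to the chosen head node.
    Assumes the SSH redirect is the first rule in that chain."""
    return [
        "iptables", "-t", "nat", "-R", "PREROUTING", "1",
        "-d", cluster_ip, "-p", "tcp", "--dport", str(port),
        "-j", "DNAT", "--to-destination", f"{target_ip}:{port}",
    ]


if __name__ == "__main__":
    # Hypothetical addresses and scores.
    addresses = {"head-1": "10.0.0.1", "head-2": "10.0.0.2"}
    scores = {"head-1": 0.9, "head-2": 0.3}
    target = choose_target(scores, current="head-1")
    print(" ".join(dnat_rule("10.0.0.10", addresses[target])))
```

As far as I understand netfilter, the nat table is only consulted for the first packet of a connection, so swapping the rule should affect new logins only and leave established SSH sessions where they are.
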
Any thoughts? Am I missing something obvious? Any help greatly
appreciated.