SUMMARY: To "nice" or not to "nice" [LONG]

Post by Sheryl Coppeng » Fri, 31 May 1991 00:04:51



My original posting was:

>Historically (since before I worked here), there was a policy for
>users running large jobs which ran along these lines:
>    1)  Run them in the background
>    2)  Start them up with "nice" to lower priority
>    3)  Only one such job per machine
>I have a user challenging that policy on the grounds that UNIX will
>take care of it automatically.  I am aware that some systems have
>that capability built in to the kernel, but I am not sure to what
>extent ours do or how efficient they are.  I have looked
>in the manuals for both of our systems (Sun and HP) and in the
>Nemeth book, but they are pretty sketchy.

I went on to ask if other sites had such a policy and if anyone
had information specific to the machines we used (SunOS 4.1/4.1.1, HP-UX 7.0).
I waited until after the holiday weekend to summarize, in case other sites
have short expires on their news articles.

Response was mostly from administrators and overwhelmingly FOR the policy.
Details varied depending upon the type of machine, the environment and other
factors (including, perhaps, the temperament of the administrator).  

People who said the policy was unnecessary seemed to be, like the user
challenging the policy here, quoting the general texts about what SHOULD
happen in the UNIX operating system (the Maurice Bach book or the BSD Daemon
Book).  The administrators were more likely to quote O'Reilly & Associates'
_System Performance Tuning_.  One respondent read me a statement over the
phone along the lines of "Users will tell you that 'nice' doesn't have any
effect -- don't believe them".  Some of those responding either had or were writing
software to automatically renice or start and stop processes.  I have a copy of
one package and will try to get others and experiment here.

Some replies contained assumptions about the type of programs being run and
pointed out that programs which were I/O bound or doing a lot of paging would
not be affected by nicing.  Something along those lines may be what's
happening when ksh or finger processes run wild and take over the CPU.

Unfortunately, I haven't had a chance to run experiments here.  Blair Houghton
was kind enough to do so and post the results here, but since they were for
Ultrix I doubt I will get the same results on our systems.

Some interesting statements taken out of context:

        "nohup" automatically nices jobs

                (True on SunOS but not HP, and part of the problem is that
                users are running from the shell and not backgrounding jobs)

        Kernels that do renice default to 4, which is insufficient.  On SunOS,
        a "nice +4" will still allow large jobs to interfere with nfsd and
        inhibit file server function.

                (NFS interference was noticed here, and often we found large
                jobs on file servers because users called in to complain
                that they couldn't login to a workstation or got NFS "not
                responding" errors.)

        SunOS won't renice processes but HP-UX will.  However, HP-UX will
        pop the priority up high again after a time.

                (I heard about the automatic renicing first in an HP context.
                HP-UX handles realtime priorities, and I think you have the
                option of loading daemons as realtime processes in order
                to improve NFS, etc. Large processes have been less of a
                problem on our HPs, but our graphics users notice a difference
                when they're trying to run animations on an HP9000/835 and
                jobs are running in the background.  We also have a problem
                with runaway ksh processes and the kernel never seems to
                detect those and lower the priority enough to allow interactive
                users to get their work done).

        Ksh and csh do NOT change the priority of background jobs, but the
        Bourne shell will.

                (We run ksh mostly, occasionally csh or bash).

        Users should be required to use the "batch" command instead of the
        "nice" command, because "batch" lowers priority.

                (I can find no evidence in the man pages that batch does
                this.  In the most recent version of the policy, we require
                users to batch AND nice jobs.  According to the manual,
                batch schedules jobs based on system load.  It also lets
                users run multiple jobs serially).

Many thanks to all who responded (and to those who probably will respond
to this posting too).

Below I include edited copies of the replies I received by mail.  If anyone
didn't see the follow-up postings, I will be glad to mail them a copy (I
have 4, I think).

===============================================================================

From n...@fwi.uva.nl Thu May 16 05:18:23 1991

  The nice value is only used by the scheduler to give each process a
  priority.  The evaluation of priority takes into account recent CPU usage,
  and is weighted heavily in favour of interactive processes that require
  CPU time in short bursts.  Typically a 'renice 19' on big processes has
  little effect, since they will be constantly paging, which is unaffected
  by nice values (it's in the bottom half of the kernel, I believe).  A
  `renice -19` will, however, quite possibly stall your system by giving
  other processes virtually no CPU time.

    One of the best sources of info on this side of UN*X is the BSD Daemon
  book by Leffler, McKusick and Karels.  If you don't have access to the
  book I can find out the full name and ISBN if you need it.  The book is
  _Mega_, a sort of BSD bible.
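The effect of the positive direction can be tried safely on a live process (a sketch; old BSD renice takes the value bare, while newer versions want -n):

```shell
# Push a stand-in long-running process to the lowest priority; a
# niced job still gets the whole CPU when the machine is idle.
sleep 60 &
pid=$!
renice -n 19 -p "$pid"      # older syntax: renice 19 $pid
ps -o ni= -p "$pid"         # confirm the new nice value
kill "$pid"
```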

From onward@freefall Thu May 16 09:49:33 1991

  There is no definite answer to the question you have.
  As much of it is a matter of etiquette as it is OS specific,
  plus, it depends very much on what the jobs do.

  However, here are some points to think about:

  0.  Unix does not take care of it automatically.  It only tries.

  1.  nice only modifies the Scheduling priority, not the execution/cpu
      priority. (Internals).

  2.  processes lose priority if they are constantly runnable (i.e. more
      or less cpu bound).  When they get an I/O interrupt, their priorities
      jump up so that they can complete their I/O call, but if they then
      hang on the cpu, the priority drops again quickly. (Job Type).

  3.  multiple large jobs do drag down a machine.  Statistically, this is
      NOT due to the cpu resource being exhausted, but due to the amount
      of paging involved with large processes.  (Job Size)
      Suns do not do context switching too well when the number of runnable
      processes goes past 8 (or was it 16 -- there was much discussion
      about this in comp.arch 2 months ago)

  3a. if the machine is diskless and 8 MB in real memory, even one large
      job is noticeable if someone is also working on the workstation.

  4.  Policy suggestion: on hp9000 s300, s400, s3 and s4, don't worry
      too much if they are functioning as single user workstations and not
      multiuser servers.  s800 machines were designed to be multiuser
      servers, except perhaps for the 815, so you may not want more than
      3 or 4 large jobs running on it simultaneously.

  5.  Try to make a balance between:

        a) online user response time
        b) turnaround time for large jobs
        c) machine resources (machines with lots of real memory tend
           to run large processes much better)
        d) machine dedication
        e) do your users really need to run that many large jobs ? or
           are they just letting the computer do their thinking for
           them ?  Remember the old days when resources were REALLY
           expensive, and people tried out their models by hand before
           putting them through the system.

From ke...@remus.ee.byu.edu Thu May 16 10:43:23 1991

        We have about 50 hp 9000/300 and we always ask users to start long
  jobs niced as much as they can (19).  We have enough machines that one can
  usually be found with not much running.  This does not hurt the owner
  because as you probably know if nothing else is going on they will still
  get all of the cycles even if they are niced.

        Interactive users pay the price if the job occupies a lot of memory
  as the process swaps in and out.  We have solved this problem for most
  users with a program called real-nice.  It monitors keystrokes about every
  minute and will not swap a large job in unless the keyboard has been idle
  for more than a minute.

        I have had no problems convincing users to use these tools, and the
  users pretty much police each other.  Once in a while I get a complaint,
  so we have written a renice for hp-ux, and that solves those complaints.

From c...@pender.ee.upenn.edu Thu May 16 12:31:51 1991

  I administrate a Sun 4/280.  During my tenure it ran everything from
  4.0 to 4.1.  It is used for CPU intensive processes that last from
  seconds to weeks.  It is also our mail and news server for the
  department, so we have to keep interactive performance up.

  After a year of hand-nicing processes in various combinations, I have
  settled on a policy, and written a program to implement it.  This policy
  was designed to meet the following criteria:

  1) Interactive use should not be significantly degraded by system
  load.

  2) Since people frequently run CPU intensive processes in foreground,
  some other way must be used to distinguish interactive from
  non-interactive use.

  3) Since people frequently will use screenlock rather than logout on
  their personal workstations, interactive processes may accumulate
  significant total CPU usage.

  4) The implementation of this policy should not require users to do
  anything.  

  5) People running CPU-intensive jobs should each get an equal portion
  of the CPU.  Specifically, someone running two processes should not
  get twice as much CPU as someone running one process.

  Here's the procedure I use now, with comments about what I'd like to
  improve.  I am planning on rewriting this program over the summer so
  that I can use it on all of the machines that I administrate, and so
  that I can distribute it to other sysadmins.

        <some garbage was inserted here.  I'm not sure how much was lost>

  The specific values were determined empirically.

  I have a program that runs every six minutes.  It creates a list of
  all processes that have used more than 2.5 CPU minutes.  In my
  environment this excludes two week old emacs sessions while catching
  most CPU intensive processes in the first 6 or 12 minutes.  

  This list is then sorted by user, and each process is niced according
  to the total number of processes owned by that user.  I.e., each user's
  processes all run at the same nice value.  Remember that we are
  ignoring all interactive processes, so they are not reniced.

  Nice values are assigned according to the following table:

  Number of jobs    1     2     3     4     5     6     7
  "nice" value      5     9    12    14    15    16    17

  In general, I have had great success with this system.  The users
  prefer it to getting yelled at when they forget (or didn't know to)
  renice their jobs.  The users who did renice their jobs like the fact
  that they don't have to bother, and no one else can "cheat".  The
  interactive users like the fact that system performance is pretty
  stable.
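A rough sketch of what such a renicer could look like (my reconstruction, not the poster's actual program; it assumes a ps(1) that accepts -eo with user/pid/time columns and cumulative TIME formatted as [[hh:]mm:]ss):

```shell
#!/bin/sh
# Sketch of the per-user renicer described above (a reconstruction).
# Meant to run periodically from cron; pass --run to actually renice.

# Table from the posting: number of jobs owned -> nice value for all
# of that user's long-running jobs.
pick_nice() {
    case $1 in
        1) echo 5 ;;  2) echo 9 ;;  3) echo 12 ;; 4) echo 14 ;;
        5) echo 15 ;; 6) echo 16 ;; *) echo 17 ;;
    esac
}

# stdin: "user pid time comm" lines; stdout: "pid user" for every
# process with more than 2.5 accumulated CPU minutes.
over_threshold() {
    awk '{
        n = split($3, t, ":"); secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + t[i]
        if (secs > 150) print $2, $1
    }'
}

if [ "${1-}" = "--run" ]; then
    # Group the long-running jobs by owner, then renice each owner's
    # jobs to the value dictated by how many such jobs they have.
    ps -eo user=,pid=,time=,comm= | over_threshold |
    awk '{ pids[$2] = pids[$2] " " $1; n[$2]++ }
         END { for (u in n) print n[u] pids[u] }' |
    while read count pids; do
        renice "$(pick_nice "$count")" $pids 2>/dev/null
    done
fi
```

The interactive exceptions the poster mentions (shells, editors) would need an additional filter on the command name.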

  Here are the things I'd like to improve:

  1) These values "encourage" people to run one job at a time.  If two
  people are running one job each, and two people are running two jobs
  each, and another person is running three jobs, the last person's jobs
  are effectively stopped.  A better scheme would be to renice the first
  job to 4 and all other jobs to some high nice value.  When the first
  job finished, the next job would be reniced to 4, etc.  I am
  concerned, though, about the possibility of someone running a long,
  CPU-intensive pipeline of commands.  I haven't come up with a better
  way to handle this while still maintaining "fairness".  

  When I notice "stopped" jobs of this sort, I send a form letter to the
  user explaining that while running multiple jobs is not forbidden,
  they would run much faster if done sequentially.  This letter explains
  how to use "batch" to run jobs sequentially.  Most of my users were
  not specifically choosing to run in parallel, but were simply trying
  to run all three jobs overnight.

  2) I'd like to add a test for the size of the jobs, so that one user
  cannot use up the entire virtual memory of the machine.  I am
  considering killing single jobs if they use more than 48 Meg, and
  multiple jobs if they total more than 32 Meg.  

  The reasoning is that while generally I don't want people using more
  than 32 Meg, I understand that some jobs legitimately need more.  But
  if you are running a huge job that requires over 32 Meg, you
  shouldn't be running other jobs at the same time.

  I realize that this information is somewhat disorganized.  Please feel
  free to write me for further explanation or more information.

From k...@ee.eng.ohio-state.edu Thu May 16 12:48:16 1991

  Most BSD derived systems will do that, but the reduction is
  insignificant.  They will nice the long running background job to
  level 2 by default (SunOS is an example of this), but that still is
  high enough to seriously interfere with a multi-user system.

     What are other system administrators doing about this issue?

  We have a policy that varies with the type of machine.  On our Sparc2s
  we allow more long running jobs, but ask people to keep the load below
  8.  In general, our policy is that anything that's going to take more
  than 1/2 hour should be niced to level 20, a user can run one job on a
  machine and no more than 2 longterm jobs on the network at once.
  Further, for most of our machines (SLCs, Sun3s) we require that no
  more than 2 longterm jobs be running, but we allow up to 8 on our
  Sparc2s.  Penalties for violating policy include a warning the first
  time, a conference with the advisor and offender the second time and
  the death penalty the third time (with no appeal possible).

     If there are good reasons for the policy, I want to be able to
     justify it as well as enforce it.

  The reason is that we want people to be able to get work done.  We
  have had grad students who submitted 8 jobs to one machine and
  brought everybody else's work to a halt.  They didn't last long ;-)

  -rich

  ps.  One thing you'll notice is that "nice" has no effect on I/O
  limited programs.  That's a small trouble around here, though, since
  much of our work is numerically intensive.

From bernh...@qtp.ufl.edu Thu May 16 12:52:19 1991

  We use a similar policy around here, on our network of Sun 3/50s and
  4/380 file servers, 4/490, FPS-500, and IBM RS/6000-530 compute
  servers.  The number of jobs depends on the machine, and is subject to
  revision, since we are trying to find the best balance.  On the 3/50s
  we don't care how many jobs -- they are all on desktops and allocated
  to individuals.  On the file servers, we currently allow two at a time
  - one long-running and one less than 1 hr.  On the compute servers,
  the limits are somewhat higher, but we try to avoid having too many
  jobs at once so that there is ample virtual memory for the running
  jobs.

  The Suns, running SunOS 4.1.1, perform much better with the jobs
  niced.  Otherwise they are competing against nfsds on basically an
  equal level, which impairs the file server function.  Running them
  "nice" shifts the balance towards the file server capability --
  basically the batch jobs run in the "holes".  SunOS's scheduling
  algorithm doesn't seem to do this "automatically" -- at least not to
  the extent we want.

  The RS/6000 and FPS-500 are run as compute servers, so we're less
  concerned about niceness on them, though on the FPS, we are using
  different levels of niceness to give priority to the group that paid
  for the machine over those who get a free ride.

From appmag!curly...@hub.ucsb.edu Thu May 16 14:04:00 1991

  Don't know about HPUX's.  Back at Carnegie Mellon, our bsd 4.[23] would
  renice anything to +4 that had accumulated more than 5-10 minutes of
  CPU time.  And it wasn't enough.  Empirically, `nice +8' (csh syntax)
  was better, i.e. it would preserve interactive response.  The
  interactive users would get all the cycles they needed, and the
  background jobs would compete for the rest.  That's for CPU cycles.
  Now if a memory hog came in, the machine could very well start to
  thrash or run out of paging space.  In this case only, it would be
  important to limit the number of jobs.  This was on a VAX 785.

  I repeat, the system's handling was inadequate.  I had to periodically
  post instructions on the local bboard, because too many users didn't
  know how to lower their priority manually.

  On workstations AIX and DGUX (both SysV derivatives) I never noticed
  any attempt by the system to change priorities.  If I want to keep my
  interactive response, I have to nice jobs to 12.  If I had lots of
  naive users, I would probably write a little renicing daemon...

From octela!octelb.octel.com!...@mips.com Thu May 16 14:25:28 1991

  I run SunOS (mostly 4.0.3, but some 4.1.1) so can't speak for HP-UX.
  I don't believe that SunOS "automagically" prioritizes jobs for you
  (oh that it were true!). I have users fire up multiple large jobs that
  beat the heck out of the machine. Nice'ing these makes a world of difference,
  particularly for interactive response (my servers are CPU & NFS servers, and
  handle 10-20 logins). Multiple large jobs really kill performance, quickly
  putting the machine into thrashing mode (particularly compiles on the same
  spindle). Of course these are 3/480's, my 4/490 does a little better :)

  The kernel may handle prioritizing things like NFS service vs. local I/O, but not
  multiple user jobs. Unless you count multi-tasking, which means you can run
  multiple jobs, and they should get near equal time (depending on a lot of
  factors).  But what you really want is to prioritize interactive vs. long-
  term CPU jobs such that big compiles don't affect Josephine User's rn session :)

  But it all boils down to politics: what the local "policies" are, what
  management can/will support, and how creative the users get at submitting jobs.
  My experience is once you get management to agree to specific policies, stick
  to them, once you allow an exception you open the floodgates. But, it is a
  good idea to have the policy specify exceptions, and when/how they are allowed.
  So, at the end of the fiscal year, with deadlines looming, we can say "Yes, you
  can run multiple jobs, but it requires xxx permission". The neat trick is to
  get xxx to understand when to give permission.

  Are you running sps/ps/vmstat to look at what the system is doing? This might
  help "prove" the OS isn't scheduling intelligently. I also found "System
  Performance Tuning" (O'Reilly & Assoc.) useful.

From jmatt...@UCSD.EDU Thu May 16 14:35:47 1991

  Here, we have about 40 Sun-3's and about 35 Sun-4's running 4.1.1, along
  with a few other oddballs (HP, Vaxen, etc.).  We are responsible for
  faculty, staff, and graduate student machines in offices and labs.  Faculty
  and staff are rarely a problem, but the graduate students are working with
  insufficient computing resources, and there have been several problems with
  people being inconsiderate of others.

  On the primary graduate student machine (a sun 4/370 w/32mb of memory), we
  don't allow long-running jobs at all.  We run a daemon that enforces nice 19
  on all jobs with over 5 minutes accumulated CPU time (with the exception of
  shells, editors, etc.).  Furthermore, if the system performance gets really
  bad for interactive use, we will look for any long-running jobs that are in
  violation of the usage policy and ask their owners to kill them.

  We do provide one machine explicitly for long-running jobs (a sun 4/280 with
  56mb of memory and LOTS of swap).  The same daemon enforces nice 4 on all
  jobs with over 5 minutes accumulated CPU time here (same exceptions).

  We also have problems with a graduate student lab of 12 Sparcstations.
  People tend to leave long-running jobs on these machines which can really
  degrade interactive performance (especially since these machines only have
  16mb of memory).  Here, the same daemon will STOP any job with over 5
  minutes accumulated CPU time if there is an active (idle less than 5
  minutes) console user, and if the job doesn't belong to the console user.
  Stopped jobs will be continued when the console user logs off or goes idle
  for more than 5 minutes.

  We have found that the biggest problem with long-running jobs is not their
  CPU usage, but their memory usage.  On a machine like a Sparcstation with
  SCSI drives, paging is just too slow.  Once the physical memory of these
  machines is exhausted, paging starts and performance drops by an absolutely
  incredible amount.  (The machine can be idle 75% of the time waiting on disk
  pages, even with several jobs in the run queue.)  This is the primary reason
  for not allowing long-running jobs when there are interactive users on the
  machines.

  SunOS 4.1.1 uses the same priority scheme that 4.2 BSD used.  Jobs with the
  best priority are scheduled in round-robin fashion.  Every second,
  priorities are recalculated, so that jobs which have not obtained much CPU
  "recently" will get better priorities.  This ensures that no one starves.
  The nice value is used in the priority calculation, to reduce the demand
  that a particular job makes on the CPU.  However, even a very nice job will
  get some CPU every now and then--even on a heavily loaded system.  The
  problem is that if there are high demands on physical memory, the nice job
  will probably have lost all of its pages while waiting, and it will
  immediately page fault when it gets scheduled to run.  With enough big jobs
  running in the background, your machine will start to thrash.  Nothing in
  SunOS checks for this or attempts to do anything to alleviate it.
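The recalculation described above can be sketched numerically (a toy model using the textbook 4.3BSD formulas, not SunOS source; PUSER = 50 and the decay factor 2*load/(2*load + 1) are the published values):

```shell
# Each second, p_cpu decays and user priority is recomputed as
# PUSER + p_cpu/4 + 2*nice (a larger number means a worse priority),
# so a job that has burned CPU recently sorts behind one that waited.
awk 'BEGIN {
    PUSER = 50; load = 1.0; nice = 19; p_cpu = 100
    for (t = 1; t <= 3; t++) {
        p_cpu = (2 * load / (2 * load + 1)) * p_cpu + nice
        printf "after %ds: p_cpu=%.1f priority=%.1f\n",
               t, p_cpu, PUSER + p_cpu / 4 + 2 * nice
    }
}'
```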

From chs!d...@jetson.UUCP Thu May 16 15:14:29 1991

  Well, experiment will probably quickly convince you that for instance
  running multiple troff jobs at once will be slower than running the
  same jobs sequentially.  Unix will attempt to be fair about running
  several large jobs, in the sense that it will attempt to give them all
  equal parts of the cpu over relatively short periods of time
  (seconds).  Because of the time spent context switching between jobs,
  and (if they're large enough) the swapping resulting from using more
  than available physical memory, several large jobs at once will take
  longer to run than the same jobs in sequential order.

  The scheduler may have some bias toward interactive jobs built in, but
  it is quite easy for large jobs on Sun's to make life miserable for the
  interactive user.  Nicing these jobs will help.  Running only one at a
  time will help.  Running large jobs at odd hours (using at) will help.

  The scheduler does not have the smarts to do these things itself.  I
  don't think any internal knowledge about Unix is necessary; the effects
  of running large jobs are immediately evident in slower response time.
  If you don't see slower response time, then it's probably not worth
  worrying about.  If you do, experiment with renicing the job(s) in
  question.

  The "top" command (available from the sun archives at Rice) is helpful
  in telling what jobs are actually eating up the cpu.  You might want to
  try running that.
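Where top isn't installed, a rough equivalent with stock tools (GNU-style ps shown; the -eo and --sort flags are later additions that won't exist on a 1991-era ps):

```shell
# List the five biggest CPU consumers, busiest first.
ps -eo pid,pcpu,user,comm --sort=-pcpu | head -6
```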

From ch...@suntan.ncsl.nist.gov Thu May 16 15:28:52 1991

  Someone already posted about specific systems that automatically
  renice a cpu-bound process.  Most don't, however.  It's a good
  solution for those who don't follow the policies you've outlined.

  I would omit the third policy though.  If a process is niced, I
  haven't seen any significant performance degradation if there are one
  or five of them.  That is, processes sitting in the ready-to-run queue
  (but not running due to low priority) have little effect on system
  performance on SunOS systems I've worked with.  You should perform the
  same experiment on your system.  Yes, the load average WILL go up (all
  that tells you is the # of processes *ready* to run, not actually
  running), but interactive response should be more than adequate.

  However, I've taken this problem and cut it off at the head.  All our
  users run "tcsh" which executes /etc/Login if it exists.  All
  workstations have this file; my personal Sun 386i workstation running
  Sunos 4.0.2 is called "suntan":

      # tcsh file exec-ed by all users before ~/.cshrc
      #
      if ( $HOST == suntan && $USER != chris && $USER != root ) then
            /etc/renice +20 $$ >& /dev/null
            echo "System response may seem a bit sluggish..."
      endif

      # Stan is a special case.  On *all* systems he gets niced.
      #
      if ( $USER == stan ) /etc/renice +15 $$ >& /dev/null

  A little confusing, but basically if it's not me (Chris) or root
  logging into my system, their login shell gets reniced severely and
  all their subprocesses inherit the login shell's nice level.  On other
  machines they don't get reniced at all, since I don't use those.  :-)

  This has the unfortunate side-effect that although their cpu-intensive
  processes don't interfere with me, all their processes run at the same
  priority.  Thus, if they launch something into the background, a current
  editing session will run at the same (low) priority.  Now that I think
  about it, I should nice them in their login shell to 15 so they can
  nice their background jobs to 20 should they desire.

  Surprisingly, this setup works REALLY well!  Most users don't even
  notice the subtle login message when they get reniced.

  You might want to run your shell's executable through "strings" to see
  if it executes any files prior to the user's home .login/.cshrc/.profile.

From rev...@uunet.uu.net Thu May 16 18:46:03 1991

  I think your user may have been talking about "nohup"'ed jobs.  Nohup
  increments the priority by 5.  I don't know of any systems that alter
  the priority just because the job is changed to the background.

From @jhereg.osa.com:nightowl!...@tcnet.uucp Fri May 17 04:57:21 1991

  What shell(s) are your users using?  Ksh and Csh do not change the
  priority of jobs in the background, while Sh will automatically "nice"
  the job by four.  Here are the results of the command "sleep 300 &"
  under the shells ksh, sh, and csh, respectively:

   F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME COMD
  10 S  1001 13787 13784  0  39 20   4026b4     11 e0000000 ttyF01   0:00 sleep
  10 S  1001 13792     1  0  39 24   4026b4     11 e0000000 ttyF01   0:00 sleep
  10 S  1001 13814 13813  0  39 20   4026b4     11 e0000000 ttyF01   0:00 sleep
                              ^^
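  This experiment is easy to repeat on your own system (a sketch; $! is
  not available in csh, so this covers only the sh-family shells):

```shell
# Background a sleep under the given shell and report the nice value
# its background job received.
bg_nice() {
    "$1" -c 'sleep 5 & ps -o ni= -p $! ; kill $!' 2>/dev/null
}
bg_nice sh
bg_nice ksh     # if installed
```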

From can...@nrccsb3.di.nrc.ca Fri May 17 18:02:44 1991

  I insist my users use "batch" instead of nice.  I forbid them to use &
  because it penalizes interactive users too much.  "batch" runs at a
  lower priority AND sends any generated output to the user via mail.

  You can also limit the number of "batch" jobs running on the system by
  modifying /usr/spool/cron/queuedefs.  This way, users can send
  many jobs for execution, but if the maximum limit is reached, those jobs
  will simply be queued to run when others complete.
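  For reference, such a queuedefs entry might look like this (my
  reconstruction from the System V cron documentation; the field letters
  are njobs "j", nice "n", and retry wait "w" -- check queuedefs(4) on
  your own system):

```
# /usr/spool/cron/queuedefs: run at most 2 jobs from the batch ("b")
# queue at once, niced by 4, retrying every 60 seconds when full.
b.2j4n60w
```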

From mcorri...@UCSD.EDU Fri May 17 23:32:34 1991
  >
  >  1)  Run them in the background
  >
                Yes,

  >  2)  Start them up with "nice" to lower priority
  >
                Yes.

  >  3)  Only one such job on a machine
  >
             Depends on how *big* since 2 mediums could make one big

  >I have a user challenging that policy on the grounds that UNIX
  >will take care of it automatically.  I am aware that some systems
  Not so. Some UNIXes ( BSD ) will automatically renice a job to
  a nice of 4 after a certain number of minutes, but HP-UX does
  not do this.  It is true that the UNIX scheduling algorithm
  lowers the priority of a job based on how much time it has gotten
  recently, but then the priority pops back up after it has been low
  for a while.  The algorithms are sophisticated, with at least 2
  regimes of scheduling, but they are all intended to keep interactive
  response at an acceptable level, or to give a fair share at all
  times to ALL jobs.  When a job lasts for 50 hours, it just doesn't
  make sense to allow it to get a fair share alongside the interactive
  users.  If you lower the priority to as low as it can go (nice ==
  19), then I find that the job may get no time for part of the day,
  but whenever the system is idle the job gets right back in there
  for 100% of the cpu (like from midnight to 9 am).

  For canned software packages I do the renicing myself, by writing
  a C program that sits in the path in place of the real package,
  nices itself, and then calls the real package with all the same
  args, so it runs at low priority from the start.

From bach!chu...@ncr-mpd.ftcollinsco.NCR.COM Mon May 20 19:22:32 1991

  see batch(1).  It's part of SVr[34] and included on Suns.  Don't know about
  the others.

From c...@hawkwind.utcs.toronto.edu Tue May 21 22:58:08 1991

   Blair Houghton has already posted some nice numbers on this subject
  (and some formulae). My local experience has been that a nice value of
  between 10 and 15 will keep the interactive users from feeling the
  extra load, even if the niced jobs are thrashing around on the disk a
  fair bit. I've run parallel kernel builds at +10 to +19 on not too
  studly Vaxen and only had people monitoring the load notice (load
  averages of around 20+). However, even one or two processes grinding
  away at nice 4 (the default 'renice' value on the few kernels that do
  this to processes) will be easily noticed by the users.

   You might want to see if you can get some sort of job batching
  system; there are a number of nice ones floating around. The better
  ones do things like stop the running job(s) once the load average
  climbs too high, or stop the running job(s) when N people are logged
  on, and so on.  Better yet, you get the source, so you can put in
  custom hacks if necessary to adapt them to local conventions.

--

Sheryl Coppenger    SEAS Computing Facility Staff       she...@seas.gwu.edu
                    The George Washington University    (202) 994-6853          

 
 
 

The original posting: "Nice" or not to "nice" large jobs

Historically (since before I worked here), there was a policy for
users running large jobs which ran along these lines:

        1)  Run them in the background

        2)  Start them up with "nice" to lower priority

        3)  Only one such job on a machine

I have a user challenging that policy on the grounds that UNIX
will take care of it automatically.  I am aware that some systems
have that capability built in to the kernel, but I am not sure
to what extent ours do or how efficient they are.  I have looked
in the manuals for both of our systems (Sun and HP) and in the Nemeth
book, but they are pretty sketchy.  In my previous job, I was doing
support for realtime systems written on HP 9000/800-series and I
am fairly sure about what happens in realtime but that doesn't
help in this case.  

What are other system administrators doing about this issue?  Can
any internals experts point me to something definitive about my
particular OSs?  We have

        HP 9000s - 300, 400 and 800 series, running HPUX 7.0
        SUN 3s running 4.1 and SUN 4s running 4.1.1

I'm looking for the specific, not the general.

If there are good reasons for the policy, I want to be able to
justify it as well as enforce it.

Thanks in advance

--

  Sheryl L. Coppenger        

  (202) 994-6853          
