ISC V3.01 system(network) hang with no activities on the system

Post by Eam » Fri, 19 Mar 1993 11:46:44



Software: Interactive UNIX System V/386 Release 3.2.2.1
          TCP/IP V1.2 with UPDATE SSU.4a
          NFS V2.1 with UPDATE SSU.4b
          STREAMS facilities Version 2.2

Symptom:
  The hang was intermittent. When it happened, the kernel was still
  alive (I connected a diagnostic terminal to the system running
  u386mon, a kernel monitor utility, and during the hang the utility
  showed that the kernel was still active), but the network was
  down: I could not ping the system, NFS mounts timed out, and
  telnet and ftp both failed.

  To follow up on this problem, I put a script in cron to
  report on system activity before each hang. Nothing in
  particular stood out except the "netstat -m" report. Here is
  part of a "netstat -m" report from before the system hung.
  At some point the dblocks fail count rises sharply, and it keeps
  increasing by about 4500 every 10 minutes until the network
  hangs completely.

                 alloc   inuse     total     max    fail
streams:           256      99       923     102       0
queues:           2048     548      5406     566       0
mblocks:          2170     568  16725347     632       0
dblocks:          1736     568  14161660     632  218936
dblock class:
    0 (   4)       256       2     42905       6       0
    1 (  16)       256      49     26162      69       0
    2 (  64)       256     223   8266563     230  218936
    3 ( 128)       512     286   3310483     344       0
    4 ( 256)       128       0   1798171      17       0
    5 ( 512)       128       8    544497      20       0
    6 (1024)        64       0    158976      12       0
    7 (2048)        80       0     13621      14       0
    8 (4096)        56       0       282       3       0

The last "netstat -m" report before it hung:

dblocks:          1736     807  24255181     813  4651552
dblock class:
    0 (   4)       256       3     50422       6       0
    1 (  16)       256      49     29413      69       0
    2 (  64)       256     230  10398470     231  4163671
    3 ( 128)       512     460   9045030     460  487881
    4 ( 256)       128      32   3443772      38       0
    5 ( 512)       128      32    996438      34       0
    6 (1024)        64       0    274679      12       0
    7 (2048)        80       1     16533      14       0
    8 (4096)        56       0       424       3       0
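
To see which dblock classes account for the failures, the fail column of two reports can be compared. The following is a minimal sketch in Python, assuming the "netstat -m" layout shown above. The function names and the trimmed sample strings are illustrative only and are not part of any ISC tool.

```python
import re

# One dblock class row: class number, (size), alloc, inuse, total, max, fail.
CLASS_ROW = re.compile(
    r"^\s*(\d+)\s*\(\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_dblock_fails(report):
    """Return {block size in bytes: fail count} for each dblock class row."""
    fails = {}
    for line in report.splitlines():
        match = CLASS_ROW.match(line)
        if match:
            fails[int(match.group(2))] = int(match.group(7))
    return fails


def fail_growth(earlier, later):
    """Return {block size: increase in fail count} for classes that got worse."""
    before = parse_dblock_fails(earlier)
    after = parse_dblock_fails(later)
    growth = {}
    for size, count in after.items():
        delta = count - before.get(size, 0)
        if delta > 0:
            growth[size] = delta
    return growth


# Rows copied from the two reports above (only the classes with failures).
EARLIER = """\
    2 (  64)       256     223   8266563     230  218936
    3 ( 128)       512     286   3310483     344       0
"""
LATER = """\
    2 (  64)       256     230  10398470     231  4163671
    3 ( 128)       512     460   9045030     460  487881
"""

if __name__ == "__main__":
    for size, delta in sorted(fail_growth(EARLIER, LATER).items()):
        print(size, delta)
```

For the two reports above, only the 64-byte and 128-byte classes grow (by 3944735 and 487881 failures), and together they match the increase in the overall dblocks fail count.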

  I tuned the network-related kernel parameters (for example,
  NBLK64, NBLK128, ...) to much larger values. That did not
  solve the problem, but it helped: the system no longer hangs
  every other day.

My question:
  I understand that the hang occurs because kernel resources
  run out, as shown in "netstat -m". What I do not understand
  is why this happened even when there was no system or
  network activity. What could trigger the failures?
  The hang also seems related to network traffic. I say this
  because this system only needs to talk to one other system
  on the net. If I connected it to that system over a simple
  local network instead of the company's backbone, the system
  never hung. I do not understand this. I thought the network
  board would only pick up packets on the backbone that carry
  its physical network address. In other words, there should be
  no difference between putting the system on the backbone and
  putting it on the local net. Correct me if I am wrong.
  Thanks!
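
The assumption about which packets the network board picks up can be written out as a small receive filter. This is only a sketch in Python, assuming standard Ethernet addressing; the function names and the MAC addresses are made up for illustration and do not describe any particular board or driver.

```python
BROADCAST = b"\xff" * 6


def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 6:
        raise ValueError("expected 6 octets: " + text)
    return raw


def accepts(own, dest, groups=(), promiscuous=False):
    """Return True if a board with address `own` would pass a frame sent to `dest`."""
    if promiscuous:
        return True                      # promiscuous mode passes every frame
    dest_raw = parse_mac(dest)
    if dest_raw == BROADCAST:
        return True                      # broadcast reaches every station
    if dest_raw[0] & 0x01:               # group bit set: multicast address
        return dest_raw in [parse_mac(g) for g in groups]
    return dest_raw == parse_mac(own)    # unicast: must match our own address


if __name__ == "__main__":
    me = "00:00:c0:12:34:56"             # hypothetical address of this system
    print(accepts(me, me))                    # frame addressed to us
    print(accepts(me, "00:00:c0:aa:bb:cc"))   # unicast to another host
    print(accepts(me, "ff:ff:ff:ff:ff:ff"))   # broadcast
```

Under this model, unicast frames for other hosts are dropped by the board, but broadcast frames are passed up on every station, so a backbone can deliver traffic that a private link to one other system would not.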

  HELP! This problem is killing me, because the system will just
  sit there and die. Any information is welcome.