nfsroot, Bus error, comm. error, 3c503+9

Post by joost witteve » Thu, 19 Sep 1996 04:00:00



I frequently use an nfsroot filesystem.
However, sometimes (this seems to happen more often when there
is other traffic on the network) a random programme started
on a system using an nfsroot filesystem will abort with a
"bus error", or signal 7. When the programme is restarted, it usually
runs successfully.

As the nfsroot systems usually start 50-something processes during
boot, this results in a boot failure rate of approx. 50%, and it thus
becomes a real problem.
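
To put a number on how often a single programme must fail to give that boot failure rate, here is a rough back-of-the-envelope sketch. It assumes (and this is only an assumption, not something measured) that each of the roughly 50 processes fails independently with the same probability p, and that one failure is enough to spoil the boot, so that 1 - (1 - p)^50 = 0.5. The function name is made up for illustration.

```python
def per_process_failure(boot_failure_rate: float, n_processes: int) -> float:
    """Solve 1 - (1 - p)**n = boot_failure_rate for p.

    Assumes independent, identically likely failures per process.
    """
    return 1.0 - (1.0 - boot_failure_rate) ** (1.0 / n_processes)


# Figures from the post: about 50 processes, about 50% of boots fail.
p = per_process_failure(0.5, 50)
print(f"per-process failure probability: {p:.4f}")  # prints 0.0138
```

Under those assumptions, a bus error in only about 1.4% of programme starts would already be enough to break every second boot.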

When looking at the boot process with tcpdump, the first thing
that differs between a successful boot and an unsuccessful one seems
to be: "ERROR: Communication error on send [|nfs]". It was my understanding
that such errors are in principle harmless, and should only result
in the client trying to re-read the file. This indeed happens, but
apparently with some error.

I would really appreciate any info -- either how I can get better
diagnostics, or other things I may try.

System:
  server
    Linux-{2.0.14,2.0.20},       ethernetcard: 3c503
  client  
    Linux-{2.0.15,2.0.18,2.0.20} ethernetcard: 3c509, 3c503
   (I tried most permutations with server/client kernel, no diff)

$ /usr/sbin/rpc.nfsd -v
Universal NFS Server 2.2beta16

A slightly longer tcpdump (mail me for more, or for any other info):

rulvsc is the nfsroot client, rulcmc is the server.

tcpdump -s 200 host rulvsc
[..]
22:52:05.723068 rulvsc.LeidenUniv.nl.1c0d18ba > rulcmc.leidenuniv.nl.nfs: 116 lookup fh Unknown/1 "lib"
22:52:05.723068 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18ba: reply ok 128 lookup fh Unknown/1
22:52:05.723068 rulvsc.LeidenUniv.nl.1c0d18bb > rulcmc.leidenuniv.nl.nfs: 128 lookup fh Unknown/1 "ld-linux.so.1"
22:52:05.753069 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bb: reply ok 128 lookup fh Unknown/1
22:52:05.753069 rulvsc.LeidenUniv.nl.1c0d18bc > rulcmc.leidenuniv.nl.nfs: 108 getattr fh Unknown/1
22:52:05.773069 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bc: reply ok 96 getattr REG 100755 ids 0/0 sz 99308

22:52:05.803070 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bd: reply ok 1124 read

22:52:05.813070 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18be: reply ok 1124 read

22:52:05.833070 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bf: reply ok 1124 read

22:52:05.843071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c0: reply ok 1124 read

22:52:05.863071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c1: reply ok 1124 read

22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c2: reply ok 28 read ERROR: Communication error on send [|nfs]

22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c3: reply ok 1124 read

22:52:05.883071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c4: reply ok 1124 read

22:52:05.883071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c5: reply ok 1124 read

22:52:05.893072 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c6: reply ok 720 read

(this is the end, here all communication ends)
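
For longer captures, picking out the failing replies by eye gets tedious. The following is a minimal sketch of a filter for that, assuming the tcpdump text output keeps the line layout shown above (timestamp, source, ">", destination ending in the hex transaction id, then the decoded NFS message). The function name and the regular expression are my own and have only been checked against lines like those in this post.

```python
import re

# timestamp, source, '>', destination (ends in the hex xid), ':', decoded message
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+>\s+(\S+):\s+(.*)$")


def find_error_replies(lines):
    """Return (timestamp, xid, message) for each reply line containing 'ERROR:'."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines, "[..]" markers, etc.
        timestamp, _src, dst, rest = m.groups()
        if not rest.startswith("reply") or "ERROR:" not in rest:
            continue
        xid = dst.rsplit(".", 1)[-1]  # last dotted component is the xid
        message = rest.split("ERROR:", 1)[1].strip()
        hits.append((timestamp, xid, message))
    return hits


# Two lines copied from the dump above.
sample = [
    "22:52:05.863071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c1: reply ok 1124 read",
    "22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c2: "
    "reply ok 28 read ERROR: Communication error on send [|nfs]",
]
for ts, xid, msg in find_error_replies(sample):
    print(ts, xid, msg)
```

Run over a saved capture (one line per list element), this lists only the replies flagged with an error, together with the transaction id, so they can be matched against the request with the same id.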

Thanks very much,