nfsroot, Bus error, comm. error, 3c503+9

Post by joost witteve » Thu, 19 Sep 1996 04:00:00

I frequently use the nfsroot filesystem.
However, sometimes (this seems to happen more often when there
is something going on on the network) a random programme started
on a system using an nfsroot filesystem will abort with a
"bus error", or signal 7. When restarting the programme it usually
runs successfully.

As the nfsroot systems usually start 50-something processes, this
results in a boot failure rate of approx. 50%, and it thus becomes a
real problem.
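
As a rough back-of-the-envelope check, the two numbers above imply a
per-process failure probability of roughly 1.4%. This is only a sketch:
it assumes that the 50% is the share of boots with at least one bus
error, and that each process fails independently with the same
probability. The function name is made up for illustration.

```python
def per_process_failure(boot_failure_rate: float, n_processes: int) -> float:
    """Solve 1 - (1 - p)**n = boot_failure_rate for p.

    Assumption: every process fails independently with the same
    probability p, and a boot fails if any one process fails.
    """
    return 1.0 - (1.0 - boot_failure_rate) ** (1.0 / n_processes)


# 50 processes and a 50% boot failure rate, as described above.
p = per_process_failure(0.5, 50)
print(f"per-process failure probability: {p:.4f}")
```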

When looking at the boot process with tcpdump, the first thing
that differs between a successful boot and an unsuccessful one seems
to be: "ERROR: Communication error on send [|nfs]". It was my understanding
that such errors are in principle harmless, and should only result
in the client trying to re-read the file. This indeed happens, but
apparently with some error.

I would really appreciate any info: either how I can get better
diagnostics, or other things I may try.

    Linux-{2.0.14,2.0.20},       ethernetcard: 3c503
    Linux-{2.0.15,2.0.18,2.0.20} ethernetcard: 3c509, 3c503
   (I tried most permutations with server/client kernel, no diff)

$ /usr/sbin/rpc.nfsd -v
Universal NFS Server 2.2beta16

A slightly longer tcpdump (mail me for more or any other info):

rulvsc is the nfsroot client, rulcmc is the server.

tcpdump -s 200 host rulvsc
22:52:05.723068 > 116 lookup fh Unknown/1 "lib"
22:52:05.723068 > reply ok 128 lookup fh Unknown/1
22:52:05.723068 > 128 lookup fh Unknown/1 ""
22:52:05.753069 > reply ok 128 lookup fh Unknown/1
22:52:05.753069 > 108 getattr fh Unknown/1
22:52:05.773069 > reply ok 96 getattr REG 100755 ids 0/0 sz 99308

22:52:05.803070 > reply ok 1124 read

22:52:05.813070 > reply ok 1124 read

22:52:05.833070 > reply ok 1124 read

22:52:05.843071 > reply ok 1124 read

22:52:05.863071 > reply ok 1124 read

22:52:05.873071 > reply ok 28 read ERROR: Communication error on send [|nfs]

22:52:05.873071 > reply ok 1124 read

22:52:05.883071 > reply ok 1124 read

22:52:05.883071 > reply ok 1124 read

22:52:05.893072 > reply ok 720 read

(this is the end, here all communication ends)
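
To compare good and bad boots more easily, a saved text capture could
be scanned for the error line. The following is a minimal sketch, not a
tested tool: it assumes the "timestamp > text" line layout shown in the
dump above, and the names LINE_RE and find_errors are hypothetical.

```python
import re

# Assumption: packet lines look like "HH:MM:SS.micro > <rest>",
# as in the capture above. Other lines (blank ones, notes) are skipped.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+>\s+(.*)$")


def find_errors(lines):
    """Return (index, timestamp, text) for each packet line with 'ERROR:'."""
    hits = []
    for i, line in enumerate(lines):
        m = LINE_RE.match(line.strip())
        if m and "ERROR:" in m.group(2):
            hits.append((i, m.group(1), m.group(2)))
    return hits


# Three lines copied from the capture above.
sample = [
    "22:52:05.863071 > reply ok 1124 read",
    "22:52:05.873071 > reply ok 28 read ERROR: Communication error on send [|nfs]",
    "22:52:05.873071 > reply ok 1124 read",
]

for idx, ts, text in find_errors(sample):
    print(idx, ts, text)
```

Running this over the full capture of each boot would show how many
packets precede the first error, which may help correlate the error
with the programme that later dies with signal 7.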

Thanks very much,