NFS client locking hangs for period

NFS client locking hangs for period

Post by Christian Rei » Sat, 26 Apr 2003 07:10:06

Well, since I've more or less moved on from my original problems, I
should probably post a summary of what was going on, and what I did to
work around it.

Details can be read out from [1]: after a certain amount of time a
number diskless clients, which were mounting everything from the same
NFS server, started getting hung lock requests from the server. The
server ran 2.4.20, reiserfs over RAID-1 mounted with 2 SCSI disks on an
Adaptec 29160. The clients were debian woodys running 2.4.20.

Our diskless setup is a bit unusual: all the clients mount the same root
partition. I tried to be very careful to make sure no files were written
to on /, but I never got to the point where the clients could mount the
directory read-only. I used devfs to make sure that the /dev directories
were `localized' and syslog/console ownership and permissions kept sane.

The locking problem, however, was not related to the root filesystem --
it seems to have happened with files on the /var/log mount, which is
separate for each box (but still coming from a shared filesystem
/export/root on the server, which contains all the client directories).
If I mounted /var/log with the nolock option, they ran fine. This took
me a very long time to figure out, and I'd advise anyone with locking
problems to give it a go.

I should point out that this *does* seem to be a bug in the NFS server
code. I think it is associated with reiserfs, being that I haven't seen
it happen on other partition types. Rebooting the server cleared up the
problem. Erasing or changing files in /var/lib/nfs did not. While I was
initially using a volatile /var/lib/nfs directory on the *clients*, I
changed this on Trond's suggestion [2]. It did not fix the problem.

However, since I know little about the code itself, and it's not very
clear how one should debug, I was unable to pinpoint the exact source of
the problem, which very much saddens me.  The workaround, however, was
quite effective.


Take care,
Christian Reis, Senior Engineer, Async Open Source, Brazil. | [+55 16] 261 2331 | NMFL
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at
Please read the FAQ at


1. NFS client locking hangs for period

Hello Neil,

I've been trying to get at this problem for a while now, and had been
concentrating on the client-side of the problem (and consequently
bothering Trond about it) [1,2]. I am now pretty much convinced this is a
server-side problem, and as I've patched 2.4.20 with all the NFS patches
pending (that didn't have to do with the kernel lock breaking) and still
see the issue, I decided to report this bug.

The scenario is: a set of NFS clients with root mounted over nfs from a
single server. Clients run vanilla 2.4.20, server runs 2.4.20 patched
with your server-side patches I mentioned above. The clients run okay
for a period, and then one of them will start to hang for long periods
of time for certain operations (it happens on startup and shutdown, for
instance). Once the client hangs start the server needs to be rebooted
for it to clear up.

It seems to be reproducible by having the client hang or reboot without
shutting down properly. Another tip is that the server gets files left
over in /var/lib/nfs/sm/ for the hanging client(s).

I've been trying to track this down for a while, but since I'm not very
proficient with debugging at this level, I haven't had much luck. It's
really a problem because I need to reboot and make 20 people stop
working when the problem gets serious. Trond has had a hand trying
to help me, but we still haven't uncovered anything. I wonder if you
have any clue what could be happenning?

The other details are standard: the clients are debian woodys with
nfs-utils 1.0.1 installed, and the server has the same version. The
server runs reiserfs over RAID-1 partitions (using the kernel md
driver). Could it be triggered because of this perhaps unusual

Some of the messages I point out below have some info about the issue -
including tcpdumps and traces of nlm_debug on the server and client.

Mount options follow for the client filesystems:

anthem:/export/root/    /   nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=2 0 0
anthem:/home    /home   nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=3 0 0

I have checked and, yes, root is mounted using version 2 and the rest as
version 3. Perhaps I should try getting the kernel to mount root using
version 3?


Thanks for any help you can give.

Take care,
Christian Reis, Senior Engineer, Async Open Source, Brazil. | [+55 16] 261 2331 | NMFL
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at
Please read the FAQ at

2. Help with Chap setup!!!

3. NFS locks hang, FreeBSD 5.0 client, Linux server

4. Signal that caused EINTR ?

5. Non-blocking lock requests during the grace period ===> unlock during grace period?

6. /proc/self/map

7. Box hangs when workstation locked for extended periods

8. Are the Linux activists mailing-lists archived somewhere?

9. Linux NFS client fails after a period of time

10. NFS clients hang redhat linux 7.3 and limbo clients accessing auspex ns2000

11. Cisco VPN Client -> network hang or system lock

12. netscape mail hangs (nfs locking problem?)

13. bootup hangs at nfs file locking services