NFS locks hang, FreeBSD 5.0 client, Linux server

Post by Richard Jone » Fri, 28 Feb 2003 20:14:14



Hi:

I'm having a serious problem with the FreeBSD 5.0 NFS client running
against a Linux (Red Hat Linux 7.1) server. I have no control over
the server, so don't ask me to change it.

When a process tries to acquire a lock for a file on the server
it hangs. This is especially serious because it prevents me from
using 'mutt' or installing OpenOffice.

A simple Perl script which demonstrates the problem:

  perl -e 'open F, ">locked"; flock (F,6) or die "$!"'

This hangs at the 'flock' operation.
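
For reference, the 6 passed to flock in the one-liner is LOCK_EX|LOCK_NB (2|4), i.e. an exclusive, non-blocking request, so a hang is unexpected even if another client held the lock. Below is a minimal Python sketch of the same test with a timeout, so the probe reports back instead of waiting for ^C. The names describe_flock_op and try_lock are made up for illustration, and the sketch assumes the mount is interruptible, so that SIGALRM can break the stalled call.

```python
import fcntl
import signal

# Conventional BSD flock() operation bits (same values on FreeBSD and Linux).
LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN = 1, 2, 4, 8


def describe_flock_op(op):
    """Return the symbolic names for a numeric flock() operation."""
    names = []
    for bit, name in ((LOCK_SH, "LOCK_SH"), (LOCK_EX, "LOCK_EX"),
                      (LOCK_NB, "LOCK_NB"), (LOCK_UN, "LOCK_UN")):
        if op & bit:
            names.append(name)
    return "|".join(names)


class LockTimeout(Exception):
    """Raised by the SIGALRM handler when the lock call stalls."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def try_lock(path, timeout=5):
    """Try an exclusive non-blocking flock on path; give up after timeout seconds."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        # Mode "w" truncates the file, like ">locked" in the Perl one-liner.
        with open(path, "w") as f:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(f, fcntl.LOCK_UN)
        return "locked"
    except LockTimeout:
        return "timed out"
    except OSError as exc:
        return "failed: %s" % exc
    finally:
        # Always cancel the alarm and restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

Calling try_lock("locked") from a directory on the NFS mount should return "timed out" if the lock request stalls as described, and "locked" on a filesystem where locking works.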

All the Linux clients interoperate fine with this server, of course.

Rich.

--

http://www.annexia.org/

 
 
 

Post by David Malo » Sat, 01 Mar 2003 01:00:19



>When a process tries to acquire a lock for a file on the server
>it hangs. This is especially serious because it prevents me from
>using 'mutt' or installing OpenOffice.
>A simple Perl script which demonstrates the problem:
>  perl -e 'open F, ">locked"; flock (F,6) or die "$!"'
>This hangs at the 'flock' operation.

It is probably worth checking that rpc.lockd and rpc.statd are
running on your machine. If this doesn't help, you could try
using rpc.lockd's "-d" option to find out what's going on.
tcpdumping the relevant network traffic would probably be helpful
too.
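
One way to script the first check is to look at what the local portmapper has registered. The sketch below parses the text printed by `rpcinfo -p` and reports whether the lock manager (RPC program 100021, rpc.lockd) and the status monitor (RPC program 100024, rpc.statd) are listed. The function names are made up for illustration, and it assumes the usual "program vers proto port service" column layout.

```python
import subprocess

# Well-known RPC program numbers for the two daemons of interest.
LOCK_PROGRAMS = {100021: "rpc.lockd", 100024: "rpc.statd"}


def registered_programs(rpcinfo_output):
    """Return the set of RPC program numbers found in `rpcinfo -p` output."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; the header does not.
        if len(fields) >= 4 and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def missing_lock_daemons(rpcinfo_output):
    """Return the names of lock-related daemons that are not registered."""
    present = registered_programs(rpcinfo_output)
    return sorted(name for prog, name in LOCK_PROGRAMS.items()
                  if prog not in present)


def query_local_portmapper():
    """Run `rpcinfo -p localhost` and return its standard output as text."""
    result = subprocess.run(["rpcinfo", "-p", "localhost"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling missing_lock_daemons(query_local_portmapper()) on the client returns an empty list when both daemons are registered. A daemon that is registered can of course still be misbehaving, which is where the "-d" output and the packet trace come in.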

        David.

 
 
 

Post by Richard Jone » Sat, 01 Mar 2003 01:39:28



> It is probably worth checking that rpc.lockd and rpc.statd are
> running on your machine. If this doesn't help, you could try
> using rpc.lockd's "-d" option to find out what's going on.
> tcpdumping the relevant network traffic would probably be helpful
> too.

This is the output from rpc.lockd:

Feb 27 16:31:19 wandsworth rpc.lockd: process ID: 74157
Feb 27 16:31:19 wandsworth rpc.lockd: fh_len 24, fh \01\00\00\02\00\03\00\03\01\80\00\00\a5\00\54\00\95\31\1d\07\33\02\54\00
Feb 27 16:31:19 wandsworth rpc.lockd: start 0; len 0; pid 0; type 3; whence 0
Feb 27 16:31:19 wandsworth rpc.lockd: wait was not set
Feb 27 16:31:19 wandsworth rpc.lockd: lock request: V4: write to 172.16.68.3
Feb 27 16:31:19 wandsworth rpc.lockd: Found CLIENT* in cache

(keeps repeating the above until ^C is hit)

This is tcpdump output ('wandsworth' is the client and 'brent' is the
server):

16:33:40.661288 wandsworth.2041802486 > brent.nfs: 100 lookup fh Unknown/1 "locked"
16:33:40.661669 brent.nfs > wandsworth.2041802486: reply ok 232 lookup fh Unknown/1 (DF)
16:33:40.661771 wandsworth.2041802487 > brent.nfs: 96 access fh Unknown/1 003f
16:33:40.661945 brent.nfs > wandsworth.2041802487: reply ok 120 access c 000d (DF)
16:33:40.662007 wandsworth.2041802488 > brent.nfs: 128 setattr fh Unknown/1
16:33:40.662285 brent.nfs > wandsworth.2041802488: reply ok 144 setattr [|nfs] (DF)
16:33:40.662815 wandsworth.51375 > brent.1029: udp 220
16:33:40.663022 brent.1029 > wandsworth.51375: udp 28 (DF)

Here's another tcpdump snapshot:

16:35:27.550228 wandsworth.2041802728 > brent.nfs: 92 access fh Unknown/1 003f
16:35:27.550573 brent.nfs > wandsworth.2041802728: reply ok 120 access c 001f (DF)
16:35:27.550661 wandsworth.2041802729 > brent.nfs: 96 access fh Unknown/1 003f
16:35:27.550837 brent.nfs > wandsworth.2041802729: reply ok 120 access c 000d (DF)
16:35:27.550889 wandsworth.2041802730 > brent.nfs: 100 lookup fh Unknown/1 "locked"
16:35:27.551078 brent.nfs > wandsworth.2041802730: reply ok 232 lookup fh Unknown/1 (DF)
16:35:27.551159 wandsworth.2041802731 > brent.nfs: 128 setattr fh Unknown/1
16:35:27.551441 brent.nfs > wandsworth.2041802731: reply ok 144 setattr [|nfs] (DF)
16:35:27.552132 wandsworth.51388 > brent.1029: udp 220
16:35:27.552334 brent.1029 > wandsworth.51388: udp 28 (DF)

Here are the two packets sent across the wire when the rpc.lockd
repeats its attempt to get the lock:

16:37:37.753236 wandsworth.51407 > brent.1029: udp 220
16:37:37.753562 brent.1029 > wandsworth.51407: udp 28 (DF)

(Not sure how to get tcpdump to actually interpret the contents of those
packets properly ...)
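
If the raw UDP payloads can be saved (how to capture them is not shown here), the ONC RPC header at the front of each one is easy to decode by hand. This is a rough sketch, assuming the "udp 220" and "udp 28" packets are lock manager traffic (RPC program 100021). The function name decode_rpc_header is made up for illustration, and only the first few NLM procedure numbers are listed.

```python
import struct

NLM_PROGRAM = 100021  # RPC program number of the network lock manager

# First few NLM procedure numbers; later ones are omitted in this sketch.
NLM_PROCS = {0: "NULL", 1: "TEST", 2: "LOCK",
             3: "CANCEL", 4: "UNLOCK", 5: "GRANTED"}


def decode_rpc_header(payload):
    """Decode the fixed leading fields of an ONC RPC message (UDP payload)."""
    if len(payload) < 12:
        raise ValueError("payload too short for an RPC header")
    # Every RPC message starts with a transaction id and a message type.
    xid, msg_type = struct.unpack(">II", payload[:8])
    if msg_type == 0:  # CALL
        if len(payload) < 24:
            raise ValueError("payload too short for an RPC call header")
        rpcvers, prog, vers, proc = struct.unpack(">IIII", payload[8:24])
        info = {"xid": xid, "type": "call", "rpcvers": rpcvers,
                "prog": prog, "vers": vers, "proc": proc}
        if prog == NLM_PROGRAM:
            info["proc_name"] = NLM_PROCS.get(proc, "unknown")
        return info
    if msg_type == 1:  # REPLY
        (reply_stat,) = struct.unpack(">I", payload[8:12])
        status = "accepted" if reply_stat == 0 else "denied"
        return {"xid": xid, "type": "reply", "reply_stat": status}
    raise ValueError("not an RPC message: msg_type=%d" % msg_type)
```

Matching the xid of a call against the xid of the reply shows which request the server's short answer belongs to, and the vers and proc fields of the call show which version of the locking protocol and which operation the client is actually sending.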

Rich.

--

http://www.annexia.org/

 
 
 

1. NFS locking issues - Linux server - Solaris clients

Hi there,

We are running RedHat 7.2 (2.4.9-13smp) on a Dell 2550, with attached
SCSI storage.

This machine acts as an NFS server for users using Cadence (a
schematic and layout design environment) on Solaris clients. Anybody
doing IC work will be very familiar with Cadence.

Since we started using the server we have been experiencing issues
that appear to be related to NFS locking. For example, a user
dragging a component across a layout will have their session hang for
over a minute when they drop it.

Fixing this involves one or more of the following:

  - Restarting the NFS lock daemon on the Linux server.
  - Running a program on the client that requests a lock from the server.

Sometimes the lock request is enough, and sometimes the restart and the
request are both required. Sometimes I have to do this several times.
Sometimes it affects one user, sometimes multiple users. Sometimes
one client machine, sometimes multiple. I have not yet found any
pattern.

It's very frustrating because I cannot reproduce the problem, or even
reproduce the solution in the same sequence.

Has anyone seen any NFS lock-related problems?

Best regards,
John McDonnell,
Sysadmin,
Ireland.

2. hostname gets reset wrong by activating eth0 with DHCP

3. NFS client locking hangs for period

4. Print from Linux

5. install x-windows

6. NFS clients hang redhat linux 7.3 and limbo clients accessing auspex ns2000

7. SendTo/multicast : permission denied

8. NFS problems with Linux 2.2.x server, freebsd client

9. FreeBSD NFS server with Solaris and Linux clients

10. NFS efficiency of linux(client) and freebsd(server)?

11. rpc.lockd hangs process locking files on NeXT NFS server

12. NFS Locking issue with Solaris 8 client and Mandrake 8.2 server