Sol 2.7 NFS bug?


Post by Paul Egge » Sat, 26 Jun 1999 04:00:00




>The NFS RFCs (1813 for NFSv3) define atime/mtime/ctime as unsigned 32
>bit numbers.  Solaris 7's NFS enforces this more strictly than previous
>versions and won't play nicely with files that have negative timestamps.
>Whether this is a feature or a bug depends on who you ask.

It's clearly a problem, and I wish Sun would fix it.

Practically speaking, it means that Solaris 7 won't interoperate with
anyone else.  We've had several problems with our Solaris 7 clients and
Netapp file servers due to files with timestamps before 1970.  I can't
tell users, with a straight face, that this is because Netapp is broken;
after all, Netapp works with all our _other_ clients.

The worst consequence of this problem is that if you want to have
reliable file utilities in Solaris 7, you _must_ compile them in 64-bit
mode and run the 64-bit OS.  Large-file mode won't cut it, and the
32-bit OS won't cut it either.  E.g., suppose I do the following on a
Solaris 2.5.1 client with a Netapp server:

sol251$ export TZ=UTC0
sol251$ touch -t 191811111100 armistice
sol251$ ls -l armistice
-rw-rw-r--   1 eggert   eggert         0 Nov 11  1918 armistice

Then suppose I try to use `find', `ls', or any shell script based on
standard utilities on Solaris 7.  They'll fail outright:

sol7$ ls -l armistice
armistice: Value too large for defined data type

I have to build GNU ls in 64-bit mode, and run it in a 64-bit OS,
for `ls' to work at all:

sol7$ gnuls -l armistice
-rw-rw-r--    1 eggert   eggert          0 Dec 17  2054 armistice

The time stamp may not be interpreted compatibly with Solaris 2.5.1,
due to Solaris 2.5.1 converting it to nfstime3 incorrectly,
but that's OK: at least I can ls the file!
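
The wrap-around behind that `Dec 17  2054' date can be reproduced with a
small sketch.  It is only an illustration in Python, not anything taken from
Solaris; the helper names (to_wire_u32, as_signed32, utc_from_seconds) are
made up here, and it assumes the 1918 time simply has its low 32 bits sent
as the unsigned nfstime3 seconds field.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_wire_u32(seconds):
    """Keep the low 32 bits, as an unsigned on-the-wire seconds field would."""
    return seconds & 0xFFFFFFFF


def as_signed32(word):
    """Reinterpret a 32-bit word as a signed (two's complement) value."""
    return word - (1 << 32) if word & 0x80000000 else word


def utc_from_seconds(seconds):
    """Convert seconds since the epoch to a UTC datetime without using time_t."""
    return EPOCH + timedelta(seconds=seconds)


# The timestamp from the `touch -t 191811111100' example, with TZ=UTC0.
armistice = datetime(1918, 11, 11, 11, 0, tzinfo=timezone.utc)
signed_seconds = int((armistice - EPOCH).total_seconds())
wire = to_wire_u32(signed_seconds)

print(signed_seconds)                       # -1613826000
print(wire)                                 # 2681141296
print(utc_from_seconds(wire))               # 2054-12-17 17:28:16+00:00 (read as unsigned)
print(utc_from_seconds(as_signed32(wire)))  # 1918-11-11 11:00:00+00:00 (read as signed)
```

The same 32 bits give 1918 when read as signed and 2054 when read as
unsigned, which matches the two `ls' listings above.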

Here's what Sun should do, if it wants to do things right:

1. Ship 64-bit safe file utilities so that my shell scripts keep working.
   This is just as important as making the utilities large-file safe.
   I shouldn't have to worry that `dd' or `find' will bug out on me;
   they should work on all files.

2. The 32-bit OS should not reject negative NFS timestamps, since there's
   no way for the poor user to recover in that case -- there's no
   64-bit mode to escape to.  In other words, the 32-bit OS should
   continue to be compatible with Solaris 2.6 in this matter.

 
 
 


Post by Paul Ingra » Sat, 26 Jun 1999 04:00:00


> It's clearly a problem, and I wish Sun would fix it.

> Practically speaking, it means that Solaris 7 won't interoperate with
> anyone else.  We've had several problems with our Solaris 7 clients and
> Netapp file servers due to files with timestamps before 1970.  I can't
> tell users, with a straight face, that this is because Netapp is broken;
> after all, Netapp works with all our _other_ clients.

I appreciate you saying this, especially the part about Netapp. We are a
small company producing our own multi-protocol file servers (NFS, SMB/CIFS,
HTTP), and we have seen this problem at one site that just upgraded to
Solaris 2.7.
Because we are the 'little guy', we are the ones who must be at fault and
who get stomped on:
'Someone as big as Sun can't make a mistake like this.'

> Here's what Sun should do, if it wants to do things right:

> 1. Ship 64-bit safe file utilities so that my shell scripts keep working.
>    This is just as important as making the utilities large-file safe.
>    I shouldn't have to worry that `dd' or `find' will bug out on me;
>    they should work on all files.

> 2. The 32-bit OS should not reject negative NFS timestamps, since there's
>    no way for the poor user to recover in that case -- there's no
>    64-bit mode to escape to.  In other words, the 32-bit OS should
>    continue to be compatible with Solaris 2.6 in this matter.

Right On!

Regards

Pi

--
Paul Ingram
Technical Support

Workstations U.K. Ltd. Amersham, England
Tel.  01494 724 498
Fax. 01494 433 375

 
 
 


Post by Frank Cusac » Sat, 26 Jun 1999 04:00:00




> >The NFS RFCs (1813 for NFSv3) define atime/mtime/ctime as unsigned 32
> >bit numbers.  Solaris 7's NFS enforces this more strictly than previous
> >versions and won't play nicely with files that have negative timestamps.
> >Whether this is a feature or a bug depends on who you ask.

> It's clearly a problem, and I wish Sun would fix it.

I don't agree about the "enforces this more strictly" part. Whether the
times are defined as signed or unsigned shouldn't make a difference to
the *OS*, as long as they are the correct size (bitwise). The 32-bit NFS
driver is simply broken if it returns an error instead of returning large
unsigned values. [See Paul's example below.] IMHO it's RFC 1813 that's
broken. File times should reflect the (de facto?) time_t standard,
i.e. signed 32-bit (presumably 64-bit for v4).

[...]


> The worst consequence of this problem is that if you want to have
> reliable file utilities in Solaris 7, you _must_ compile them in 64-bit
> mode and run the 64-bit OS.  Large-file mode won't cut it, and the
> 32-bit OS won't cut it either.  E.g., suppose I do the following on a
> Solaris 2.5.1 client with a Netapp server:

> sol251$ export TZ=UTC0
> sol251$ touch -t 191811111100 armistice
> sol251$ ls -l armistice
> -rw-rw-r--   1 eggert   eggert         0 Nov 11  1918 armistice

> Then suppose I try to use `find', `ls', or any shell script based on
> standard utilities on Solaris 7.  They'll fail outright:

> sol7$ ls -l armistice
> armistice: Value too large for defined data type

> I have to build GNU ls in 64-bit mode, and run it in a 64-bit OS,
> for `ls' to work at all:

> sol7$ gnuls -l armistice
> -rw-rw-r--    1 eggert   eggert          0 Dec 17  2054 armistice

--
* I am Pentium of Borg. Division is futile. You will be approximated. *

 
 
 


Post by Casper H.S. Dik - Network Security Engine » Mon, 28 Jun 1999 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]


>It's clearly a problem, and I wish Sun would fix it.

A partial workaround is (in /etc/system):

        set nfs:nfs_32_time_ok = 0xffffffff

However, this workaround appears to be broken on 64 bit kernels,
except for 64 bit applications.

The NFS RFC is pretty clear about requiring times to be unsigned; whether
time_t is signed or not is irrelevant; NFS was designed to be more or
less independent from Unix.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 


Post by Paul Egge » Mon, 28 Jun 1999 04:00:00



>A partial workaround is (in /etc/system):
>    set nfs:nfs_32_time_ok = 0xffffffff

Thanks for the tip!  This will save me a lot of grief.

>However, this workaround appears to be broken on 64 bit kernels,
>except for 64 bit applications.

Also, the workaround only lets you _see_ negative time_t values; it
doesn't let you _generate_ them.  This fixes the problem for many
system utilities, but not all:

$ ls -l armistice
-rw-rw-r--   1 eggert   eggert         0 Nov 11  1918 armistice
$ cp -p armistice x
cp: cannot set times for x: Value too large for defined data type

Shall I file a bug report or have you done it already?  (I feel a bit
odd filing a bug report on an undocumented feature, even if it is
extremely useful....)

>The NFS RFC is pretty clear about requiring times to be unsigned; whether
>time_t is signed or not is irrelevant; NFS was designed to be more or
>less independent from Unix.

All your statements are correct, and yet still it is a botch.

NFSv3 should have made nfstime3 signed, because that's what the vast
majority of NFS-using systems interpret time_t as.  Since we can't fix
NFSv3, NFSv4 should make nfstime4 a 64-bit signed quantity.  In the
meantime, NFS clients and servers should work around the problem by
properly propagating times whose 32-bit timestamps have the top bit on,
regardless of how those timestamps are interpreted by the OS; this is
better than the pedantic approach taken in Solaris 7, which breaks
a lot of applications.

 
 
 


Post by Casper H.S. Dik - Network Security Engine » Tue, 29 Jun 1999 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]



>>A partial workaround is (in /etc/system):
>>        set nfs:nfs_32_time_ok = 0xffffffff
>Thanks for the tip!  This will save me a lot of grief.

You can't use it on NFS servers, though :-( (not when they run 32 bit
kernels, that is).

This is the default setting in 64 bit Solaris.

>>However, this workaround appears to be broken on 64 bit kernels,
>>except for 64 bit applications.
>Shall I file a bug report or have you done it already?  (I feel a bit
>odd filing a bug report on an undocumented feature, even if it is
>extremely useful....)

There's a bug filed on how the times are handled inconsistently; it lists
this variable as a workaround.  However, in <nfs/nfs.h>:

/* Test if time_t (signed long) can be sent over the wire */
#define NFS_TIME_T_OK(tt)                                               \
        (((tt) >= 0) && ((tt) <= (time_t)nfs_32_time_ok))

Guess what happens on a 32 bit system?  If you change nfs_32_time_ok to
0xffffffff, the cast to a 32-bit signed time_t turns it into -1, so the
test always evaluates to false and you can't transmit any time over the
wire; an NFS server will be completely broken, and a client just can't set
the times (they usually don't, but "cp -p" and such will fail).
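
A minimal model of that macro shows the effect.  It is a Python sketch, not
the kernel code: nfs_time_t_ok and its parameters are hypothetical names, it
assumes time_t is a 32-bit signed long on the 32-bit kernel and a 64-bit one
on the 64-bit kernel, and the 0x7fffffff value used for comparison is an
assumption, since the default on 32-bit kernels is not stated here.

```python
import ctypes


def nfs_time_t_ok(tt, nfs_32_time_ok, time_t_bits):
    """Model of: ((tt) >= 0) && ((tt) <= (time_t)nfs_32_time_ok)."""
    # ctypes integer types wrap silently, which mimics the C cast to time_t.
    ctype = ctypes.c_int32 if time_t_bits == 32 else ctypes.c_int64
    limit = ctype(nfs_32_time_ok).value
    return 0 <= tt <= limit


now_1999 = 930_000_000          # an ordinary mid-1999 timestamp
pre_epoch = -1_613_826_000      # 11 Nov 1918 11:00 UTC

# 32-bit time_t: (time_t)0xffffffff is -1, so nothing passes the test.
print(nfs_time_t_ok(now_1999, 0xFFFFFFFF, 32))   # False
# Assumed untuned limit on 32-bit: ordinary times pass.
print(nfs_time_t_ok(now_1999, 0x7FFFFFFF, 32))   # True
# 64-bit time_t: the limit stays 4294967295, so ordinary times pass ...
print(nfs_time_t_ok(now_1999, 0xFFFFFFFF, 64))   # True
# ... but negative times still fail the (tt) >= 0 half of the test.
print(nfs_time_t_ok(pre_epoch, 0xFFFFFFFF, 64))  # False
```

The last line is consistent with the "cp -p" failure reported earlier in the
thread: the tunable changes only the upper bound, never the lower one.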

>All your statements are correct, and yet still it is a botch.

Agreed.

>NFSv3 should have made nfstime3 signed, because that's what the vast
>majority of NFS-using systems interpret time_t as.  Since we can't fix
>NFSv3, NFSv4 should make nfstime4 a 64-bit signed quantity.  In the
>meantime, NFS clients and servers should work around the problem by
>properly propagating times whose 32-bit timestamps have the top bit on,
>regardless of how those timestamps are interpreted by the OS; this is
>better than the pedantic approach taken in Solaris 7, which breaks
>a lot of applications.

I agree, being pedantic is not the right way to approach this.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 


Post by Mike Eisl » Tue, 29 Jun 1999 04:00:00




>>The NFS RFC is pretty clear about requiring times to be unsigned; whether
>>time_t is signed or not is irrelevant; NFS was designed to be more or
>>less independent from Unix.

>All your statements are correct, and yet still it is a botch.

>NFSv3 should have made nfstime3 signed, because that's what the vast
>majority of NFS-using systems interpret time_t as.  Since we can't fix
>NFSv3, NFSv4 should make nfstime4 a 64-bit signed quantity.  In the

It will.
--
-Mike Eisler                    Solaris NFS group

remove the prefix 'NO_' and suffix '_SPAM' to reply.
 
 
 


Post by Mike Eisl » Wed, 30 Jun 1999 04:00:00





>>A partial workaround is (in /etc/system):

>>        set nfs:nfs_32_time_ok = 0xffffffff

>Thanks for the tip!  This will save me a lot of grief.

>>However, this workaround appears to be broken on 64 bit kernels,
>>except for 64 bit applications.

>Also, the workaround only lets you _see_ negative time_t values; it
>doesn't let you _generate_ them.  This fixes the problem for many
>system utilities, but not all:

>$ ls -l armistice
>-rw-rw-r--   1 eggert   eggert         0 Nov 11  1918 armistice
>$ cp -p armistice x
>cp: cannot set times for x: Value too large for defined data type

>Shall I file a bug report or have you done it already?  (I feel a bit
>odd filing a bug report on an undocumented feature, even if it is
>extremely useful....)

I've filed bug id 4250537 on the issue you raised. Since this feature should be
documented, I filed 4250546.
--
-Mike Eisler                    Solaris NFS group

remove the prefix 'NO_' and suffix '_SPAM' to reply.
 
 
 


Post by Mike Eisl » Wed, 30 Jun 1999 04:00:00




>NFSv3, NFSv4 should make nfstime4 a 64-bit signed quantity.  In the
>meantime, NFS clients and servers should work around the problem by
>properly propagating times whose 32-bit timestamps have the top bit on,
>regardless of how those timestamps are interpreted by the OS; this is
>better than the pedantic approach taken in Solaris 7, which breaks
>a lot of applications.

What about applications that deliberately set the timestamp to a value in
the future? I've heard of financial institutions using the timestamp in
such a way to mark records of bonds and loans that come due 30 or more
years from now. So if those applications reasonably expect NFS with its
Y2106 capability to handle it, and it doesn't, do I tell them that
despite what RFC 1813 says, we are going to ignore it?

Then there's the Y2038 crisis. Yeah sure, no one will be using Solaris
code in 2039, so it's safe to usurp that bit for pre-epoch timestamps,
just like we aren't using any circa 1970 Cobol programs that have two
digit years.

The NFS V[23] protocols cannot represent pre-epoch times. That's bad,
but what is worse is having implementations "lie" out of the box. I do
agree that there should be a workaround.

What really scares me about the Y2k debacle is not that the world won't
survive it. I now think the world will, albeit with some glitches here
and there. What scares me is that people will wait for 2001 to arrive
and say, yes, life is good, no more to worry about. Except that there are
going to be Y20XX dates to worry about for the rest of the next
century. The big ones are Y2038 and Y2036 (seconds since Y1900).

At least with NFS, we'll have until Y2106 to get NFS V4 out the door.
:-)
--
-Mike Eisler                    Solaris NFS group

remove the prefix 'NO_' and suffix '_SPAM' to reply.

 
 
 


Post by Paul Egge » Wed, 30 Jun 1999 04:00:00




>>[NFSv3] clients and servers should work around the problem by
>>properly propagating times whose 32-bit timestamps have the top bit on,
>>regardless of how those timestamps are interpreted by the OS; this is
>>better than the pedantic approach taken in Solaris 7, which breaks
>>a lot of applications.
>What about applications that deliberately set the timestamp to a value in
>the future?

32-bit apps should work the same across NFS as they do with UFS.
This is the simplest to explain, and will cause the fewest problems.
In other words, the timestamp range for 32-bit apps should be 1901-2038
for NFS, just as it is for UFS.

64-bit clients and servers should also read and write NFSv3 timestamps
in the range 1901-2038.  That way, they'll be compatible with their
32-bit counterparts.

It might be useful to have an option to move the NFSv3 timestamp range
to 1970-2106, to conform pedantically to the NFSv3 spec; but the
default ought to cater to existing practice.  Perhaps you don't agree
with my opinion about the default, but at the very least there must be
an option to behave compatibly with existing practice.
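
For reference, the two candidate ranges can be worked out directly from the
32-bit limits.  This is just arithmetic in Python, with no NFS code
involved; the name utc_from_seconds is made up for the sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_from_seconds(seconds):
    """Convert seconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# Signed 32-bit seconds: the "existing practice" range.
signed_min = utc_from_seconds(-(2 ** 31))
signed_max = utc_from_seconds(2 ** 31 - 1)

# Unsigned 32-bit seconds: the range the NFSv3 spec describes.
unsigned_min = utc_from_seconds(0)
unsigned_max = utc_from_seconds(2 ** 32 - 1)

print(signed_min, "to", signed_max)
# 1901-12-13 20:45:52+00:00 to 2038-01-19 03:14:07+00:00
print(unsigned_min, "to", unsigned_max)
# 1970-01-01 00:00:00+00:00 to 2106-02-07 06:28:15+00:00
```

The two ranges overlap only for 1970-2038, which is why timestamps with the
top bit on are the only ones in dispute.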

I realize that my suggestion is not in the spirit of the NFS spec for
timestamps past 2038, but in practice it's better than the alternative
of being pedantic, which causes interoperability headaches.
If technical compliance is a real issue, one might even argue
that my suggestion technically complies with the NFSv3 spec
(but please don't make me argue that! :-).

>I've heard of financial institutions using the timestamp in
>such a way to mark records of bonds and loans that come due 30 or more
>years from now.

Your example hasn't come up in any of the financial apps that our
company has written or maintained.  Our apps typically keep financial
timestamps in a database, where they belong.  They do use time_t for
logs and such, but not for arbitrary timestamps.

But obviously you must keep even your weird customers happy, and you
can do this by telling them to enable the option to be pedantic about
NFS timestamps.

I can't resist mentioning that it's not in Sun's interest to make the
default cater to the weird case that you mentioned, since Solaris 7 UFS
does not allow file timestamps past 2038.  As things stand, your weird
customers _must_ use NFS, and I'd guess that their NFS servers can't be
running vanilla Solaris 7 either.  Surely you'd rather have the default
encourage your customers to use Sun servers to solve their problems,
and surely you also don't want to force them to use NFS.

>At least with NFS, we'll have until Y2106 to get NFS V4 out the door.

Unfortunately, we don't have that long.  The problems we've been
discussing will get worse as 2038 gets closer, due to the mismatch
between spec and actual practice.  The best way to fix this is to leave
actual practice as it is (by fixing Solaris 7 to conform to it) and
getting NFSv4 out the door well before 2038.  Unless NFS dies, NFSv4
will happen soon enough anyway, so it's not a big deal to defer the
Y2038 fixes until NFSv4.

In the meantime, Solaris 7 shouldn't be pedantic about NFSv3 timestamps
with the top bit on, since in practice this pedantry causes far more
problems than it cures.

Sun's story about time_t should be simple: to fix the Y2038 problem,
Sun is upgrading its software to 64-bit time_t.  Please don't
complicate the story by talking about unsigned 32-bit integer
timestamps -- they're not worth the aggravation.

 
 
 


Post by Paul Ingra » Thu, 01 Jul 1999 04:00:00



> >Shall I file a bug report or have you done it already?  (I feel a bit
> >odd filing a bug report on an undocumented feature, even if it is
> >extremely useful....)

> I've filed bug id 4250537 on the issue you raised. Since this feature should be
> documented, I filed 4250546.

I have since heard that a bug id has already been raised on this or a
similar issue, like the one I raised at the top of this tree. A lot of
people have been talking about the 32/64 bit date issue, but the issue we
have is the NFSv3 fix for the v2 race condition between LOOKUP and CREATE.

v3 cured this by putting a 64 bit 'guard' in place of the mtime/atime
fields, which is checked to see whether the file exists during CREATE. This
is a very clever idea, but it can only work if EVERYONE makes sure they
follow it up with a SETATTR on mtime/atime.

Sol 7 doesn't SETATTR, it does a GETATTR first, which screws everything up.

The bug id in question is 4218508

--
Paul Ingram
Technical Support

Workstations U.K. Ltd. Amersham, England
Tel.  01494 724 498
Fax. 01494 433 375

 
 
 


Post by Mike Eisl » Fri, 02 Jul 1999 04:00:00




>The bug id in question is 4218508

Yes. It is fixed in the next full release after Solaris 7.

To get it patched in Solaris 7 will require that you escalate through CTE.
--
-Mike Eisler                    Solaris NFS group

remove the prefix 'NO_' and suffix '_SPAM' to reply.

 
 
 


Post by Paul Egge » Fri, 02 Jul 1999 04:00:00


Mike Eisler writes about Sun bug 4218508:

>It is fixed in the next full release after Solaris 7.

Is the problem fixed even in the light of the current discussion?

E.g. suppose a 32-bit client does a CREATE and then crashes, so that
the server stores a weird 64-bit timestamp into the metadata.
After the client reboots, a client app attempts to stat the file, but
this attempt fails since the 32-bit client can't access files whose
timestamps are outside the 32-bit range.

 
 
 


Post by Mike Eisl » Sat, 03 Jul 1999 04:00:00




>Mike Eisler writes about Sun bug 4218508:

>>It is fixed in the next full release after Solaris 7.

>Is the problem fixed even in the light of the current discussion?

>E.g. suppose a 32-bit client does a CREATE and then crashes, so that
>the server stores a weird 64-bit timestamp into the metadata.
>After the client reboots, a client app attempts to stat the file, but
>this attempt fails since the 32-bit client can't access files whose
>timestamps are outside the 32-bit range.

In an exclusive create, the Solaris 7 server always masks off the high
order bit of the verifier before shoehorning it into the mtime when
the file doesn't exist. When it does exist, the server masks off the
high order bit of the verifier the client gave it before comparing it
to the mtime.
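
Roughly, the masking described above could look like the following toy
model.  It is a Python sketch under stated assumptions, not the Solaris
implementation: the ToyServer class, the choice of the first four verifier
bytes for the mtime seconds, and the return strings are all hypothetical;
only the "clear the high order bit before storing and before comparing"
step comes from the description.

```python
import struct

HIGH_BIT_CLEARED = 0x7FFFFFFF  # keeps the stored value inside signed 32-bit range


def verifier_to_mtime(verifier):
    """Take 4 bytes of the 8-byte verifier (assumed layout) and clear the top bit."""
    (word,) = struct.unpack(">I", verifier[:4])
    return word & HIGH_BIT_CLEARED


class ToyServer:
    """Keeps only a name -> mtime-seconds map, enough to show the comparison."""

    def __init__(self):
        self.mtimes = {}

    def exclusive_create(self, name, verifier):
        stamp = verifier_to_mtime(verifier)
        if name not in self.mtimes:
            # File doesn't exist: store the masked verifier as the mtime.
            self.mtimes[name] = stamp
            return "created"
        if self.mtimes[name] == stamp:
            # Same masked verifier: treat as a retransmitted CREATE.
            return "ok (retransmission)"
        return "exists"


server = ToyServer()
verifier = bytes.fromhex("ffeeddcc00112233")   # top bit set on purpose
print(server.exclusive_create("f", verifier))  # created
print(hex(server.mtimes["f"]))                 # 0x7feeddcc
print(server.exclusive_create("f", verifier))  # ok (retransmission)
```

Because the stored value never has its top bit on, a 32-bit client that
stats the file before any SETATTR still sees a timestamp it can represent.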
--
-Mike Eisler                    Solaris NFS group

remove the prefix 'NO_' and suffix '_SPAM' to reply.
 
 
 


Post by David Com » Sun, 04 Jul 1999 04:00:00






>>The bug id in question is 4218508

>Yes. It is fixed in the next full release after Solaris 7.

>To get it patched in Solaris 7 will require that you escalate through CTE.
>--
>-Mike Eisler                        Solaris NFS group

>remove the prefix 'NO_' and suffix '_SPAM' to reply.

Quick question: roughly when will that "next full release"
show up?  Before y2k?  (Am using 5.5.1).

Also, what will it be called?  8?

Thanks!

 
 
 
