Move "used FPU status" into new non-atomic thread_info->status field.

Post by David S. Miller » Tue, 11 Mar 2003 21:20:26




   Date: Mon, 10 Mar 2003 18:20:35 +0000

Linus said, in a recent BK changelog:

        Also, fix x86 FP state after fork() by making sure the FP is unlazied
        _before_ we copy the state information. Otherwise, if a process did a
        fork() while holding the FP state lazily in the registers, the child
        would incorrectly unlazy bogus state.

At least on sparc{32,64}, we consider FPU state to be clobbered coming
into system calls, this eliminates a lot of hair wrt. FPU state
restoring in cases such as fork().

Are you preserving FPU state across fork() on x86?  If so, what do you
think might rely on this?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by Linus Torvalds » Tue, 11 Mar 2003 21:30:26



> At least on sparc{32,64}, we consider FPU state to be clobbered coming
> into system calls, this eliminates a lot of hair wrt. FPU state
> restoring in cases such as fork().

We could _probably_ do it on x86 too. The standard C calling convention on
x86 says FPU register state is clobbered, if I remember correctly.
However, some of the state is "long-term", like rounding modes, exception
masking etc, and even if we didn't save the register state we would have
to save that part.

And once you save that part, you're better off saving the registers too,
since it's all loaded and saved with the same fxsave/fxrstor instructions
(ie we'd actually have to do _more_ work to save only part of the FP
state).

> Are you preserving FPU state across fork() on x86?  If so, what do you
> think might rely on this?

Probably nothing per se. HOWEVER, we'd still need to save the state for
rounding etc, so we might as well save it all.

As it was, the x86 state was pretty much random after fork(), and that can
definitely lead to problems for real programs if they depend on things
like silent underflow etc.

(Now, in _practice_ all processes on the machine tend to use the same
rounding and exception control, so the "random" state wasn't actually very
random, and would not lead to problems. It's a security issue, though).
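
As an illustration of the "long-term" state discussed above, here is a minimal user-space sketch, assuming an x86 Linux system and GCC-style inline assembly. It sets the x87 rounding-control bits in the parent, forks, and has the child report whether it sees the same control word. The helper names (x87_get_cw, x87_set_cw, child_inherits_x87_rounding) are made up for this example and are not taken from the kernel or from this thread; on non-x86 builds the function simply returns -1.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define X87_RC_MASK 0x0C00u  /* rounding control, bits 10-11 of the control word */
#define X87_RC_UP   0x0800u  /* round toward +infinity */

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical helpers: read and write the x87 control word. */
static uint16_t x87_get_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}

static void x87_set_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m" (cw));
}
#endif

/* Returns 1 if the child saw the parent's control word, 0 if not,
 * -1 if the check could not be run (non-x86 build, fork/wait failure). */
int child_inherits_x87_rounding(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint16_t saved = x87_get_cw();
    uint16_t want = (uint16_t)((saved & ~X87_RC_MASK) | X87_RC_UP);
    int status;
    pid_t pid;

    x87_set_cw(want);
    assert(x87_get_cw() == want);   /* the new mode took effect in the parent */

    pid = fork();
    if (pid == 0)                   /* child: report through the exit code */
        _exit(x87_get_cw() == want ? 0 : 1);

    x87_set_cw(saved);              /* parent: restore the original mode */
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
#else
    return -1;
#endif
}
```

On a kernel that copies the FP state at fork(), the function should return 1; a return of 0 would correspond to the "random state after fork()" problem described here.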

                Linus


Post by David S. Miller » Tue, 11 Mar 2003 21:40:10



   Date: Mon, 10 Mar 2003 11:25:55 -0800 (PST)

   > Are you preserving FPU state across fork() on x86?  If so, what do you
   > think might rely on this?

   Probably nothing per se. HOWEVER, we'd still need to save the state for
   rounding etc, so we might as well save it all.

I see.

We preserve the rounding/etc. modes on sparc, we merely zap the actual
FPU registers around the system call.

And that's like 4 L2 cache lines of registers on sparc64, so there
really is a benefit from only saving the mode register across a system
call.
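
The "4 L2 cache lines" figure can be checked with a little arithmetic. The sketch below assumes 32 double-precision FP registers of 8 bytes each (the SPARC V9 register file) and a 64-byte L2 line; the line size is an assumption, since the message does not state it. For comparison it also uses the 512-byte memory area written by FXSAVE on x86. The function and constant names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SPARC64_FP_DOUBLE_REGS = 32,   /* f0..f62, double precision (SPARC V9) */
    FP_DOUBLE_BYTES        = 8,
    ASSUMED_L2_LINE_BYTES  = 64,   /* assumption, not stated in the thread */
    X86_FXSAVE_AREA_BYTES  = 512   /* size of the FXSAVE memory image */
};

/* Total size of the sparc64 FP register file in bytes. */
size_t sparc64_fp_regfile_bytes(void)
{
    return (size_t)SPARC64_FP_DOUBLE_REGS * FP_DOUBLE_BYTES;
}

/* Number of cache lines touched by a save area of the given size,
 * assuming the area starts on a line boundary. */
size_t cache_lines_needed(size_t bytes, size_t line_bytes)
{
    assert(line_bytes > 0);
    return (bytes + line_bytes - 1) / line_bytes;
}
```

Under these assumptions the sparc64 register file is 256 bytes, i.e. 4 lines, while a full FXSAVE image would span 8 lines of the same size.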

Post by Chris Friesen » Tue, 11 Mar 2003 22:10:06




>>At least on sparc{32,64}, we consider FPU state to be clobbered coming
>>into system calls, this eliminates a lot of hair wrt. FPU state
>>restoring in cases such as fork().

> We could _probably_ do it on x86 too. The standard C calling convention on
> x86 says FPU register state is clobbered, if I remember correctly.
> However, some of the state is "long-term", like rounding modes, exception
> masking etc, and even if we didn't save the register state we would have
> to save that part.

> And once you save that part, you're better off saving the registers too,
> since it's all loaded and saved with the same fxsave/fxrstor instructions
> (ie we'd actually have to do _more_ work to save only part of the FP
> state).

Does this open the door for using FP in the kernel?

Chris

--
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986


Post by Linus Torvalds » Tue, 11 Mar 2003 22:20:08



> > And once you save that part, you're better off saving the registers too,
> > since it's all loaded and saved with the same fxsave/fxrstor instructions
> > (ie we'd actually have to do _more_ work to save only part of the FP
> > state).

> Does this open the door for using FP in the kernel?

Not any wider than it already is.

For a while now, x86-specific optimizations (and all such stuff is by
nature very much architecture-specific) have been able to do

        kernel_fpu_begin();
        ...
        kernel_fpu_end();

and use the FP state in between. It generally sucks if the user-mode
process had touched FP state (we'll force it saved), but most of the time
that isn't true, and the only thing it does is to temporarily clear the
TS bit so that the FPU works again (and then sets it again in fpu_end,
although if this was a common thing we _could_ make that be a "work"
thing that is only done at return-to-user-mode).

Of course, clearing TS isn't exactly fast, so this really only works if
you have tons of stuff that you _really_ want to use the FPU for. And
since the FP cache is per-CPU, the whole region in question is
non-preemptible, so this can only be used for non-blocking stuff.
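
The fast and slow paths described above can be pictured with a toy user-space model. This is only a sketch of the logic as described in this message, not kernel code: the toy_* types and functions are hypothetical, the "save" is just a counter, and the TS bit is a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types do not exist in the kernel under these names. */
struct toy_task {
    bool used_fpu;     /* task's FP state is currently live in the registers */
    int  saved_count;  /* how often we were forced to save that state */
};

struct toy_cpu {
    bool ts;               /* models CR0.TS: FPU use traps while it is set */
    int  preempt_count;    /* > 0 means preemption is disabled */
    struct toy_task *current;
};

void toy_kernel_fpu_begin(struct toy_cpu *cpu)
{
    cpu->preempt_count++;          /* FP registers are per-CPU: no preemption */
    if (cpu->current->used_fpu) {
        /* Slow path: user state is live, so force it saved first. */
        assert(!cpu->ts);          /* the owner runs with TS already clear */
        cpu->current->saved_count++;
        cpu->current->used_fpu = false;
        return;
    }
    cpu->ts = false;               /* fast path: just clear TS */
}

void toy_kernel_fpu_end(struct toy_cpu *cpu)
{
    assert(cpu->preempt_count > 0);
    cpu->ts = true;                /* next user FP use traps and reloads lazily */
    cpu->preempt_count--;
}
```

In the model, a task that never touched the FPU costs only the TS toggle, while a task with live FP state pays for one forced save; nothing between begin and end may block, because preemption stays disabled for the whole region.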

In other words: it's still very much a special case, and if the question
was "can I just use FP in the kernel" then the answer is still a
resounding NO, since other architectures may not support it AT ALL.

                Linus


Post by Andi Kleen » Tue, 11 Mar 2003 23:10:12



> (Now, in _practice_ all processes on the machine tend to use the same
> rounding and exception control, so the "random" state wasn't actually very
> random, and would not lead to problems. It's a security issue, though).

Oh it does. Together with Marcus Meissner I just tracked down a 32bit
emulation problem on x86-64 with Wine today. The program running in
Wine would randomly crash on a flds with a floating point exception.

Turned out the 32bit ptrace unlazy FPU path shared two lines too many
with the 32bit signal FPU saving path and was resetting the
used_fpu flag. Result was that the FPU state of the child could be
reinitialized in some circumstances on ptrace accesses.  Wine actually
does use ptrace between the Wine server and the emulated process for
some complicated calls. It did one unlucky ptrace and then the FPCR was
at the Linux defaults again - program crashed.

-Andi

Post by David S. Miller » Tue, 11 Mar 2003 23:10:14



   Date: 10 Mar 2003 22:01:17 +0100

   Turned out the 32bit ptrace unlazy FPU path shared two lines too many
   with the 32bit signal FPU saving path and was resetting the
   used_fpu flag. Result was that the FPU state of the child could be
   reinitialized in some circumstances on ptrace accesses.

So what it depended upon was the FP control register state,
not the state of the individual FPU registers, across fork()
right?

Post by Andi Kleen » Tue, 11 Mar 2003 23:40:08



>    Turned out the 32bit ptrace unlazy FPU path shared two lines too many
>    with the 32bit signal FPU saving path and was resetting the
>    used_fpu flag. Result was that the FPU state of the child could be
>    reinitialized in some circumstances on ptrace accesses.

> So what it depended upon was the FP control register state,
> not the state of the individual FPU registers, across fork()
> right?

Yes. The IA32 ABI says the FPU registers are clobbered in a function
call. And fork is a function call. Same with the SSE registers.

Unfortunately it is much more expensive to save individual registers
(SSE and x87 stack) than to just save everything using FXSAVE.
FXSAVE uses lazy saving and saves only the x87 registers that are
actually used.

For SSE registers it may make sense, but then FXSAVE does that already
too and you always have to handle the x87 register stack too.

I doubt it would be a good idea to not use FXSAVE on i386. The microcode
can do a better job here because it has more information. In addition
it also promises to handle future new Intel registers transparently.

x86-64 ABIs have similar semantics.

-Andi
1. Move "used FPU status" into new non-atomic thread_info->status field.

Do you mean x87 control or the x87 stack here?

Sorry for being dense, but can you clarify: will current 2.{2,4,5}
kernels preserve or destroy the parent process' FPU control at fork()?

We're using unmasked FPU exceptions on x86 (and Solaris/SPARC) in the
runtime system for the Erlang telecom systems programming language.
This gives a noticeable performance improvement, but it relies on
the FPU control not changing beneath it: the FPU control is only
initialised at startup and when SIGFPE has occurred.
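
To make the "x87 control" versus "x87 stack" distinction concrete: the control word is a single 16-bit register holding the exception masks and the rounding mode, separate from the eight stack registers. The sketch below decodes it in plain C, using the bit layout from the Intel architecture manuals; the enum and function names are invented for this example. "Unmasking" an exception means clearing its mask bit, after which that exception is delivered as a trap instead of being handled silently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exception mask bits in the low byte of the x87 control word. */
enum {
    X87_MASK_INVALID   = 1u << 0,
    X87_MASK_DENORMAL  = 1u << 1,
    X87_MASK_ZERODIV   = 1u << 2,
    X87_MASK_OVERFLOW  = 1u << 3,
    X87_MASK_UNDERFLOW = 1u << 4,
    X87_MASK_PRECISION = 1u << 5,
    X87_MASK_ALL       = 0x3F,
    X87_CW_FNINIT      = 0x037F   /* control word loaded by FNINIT */
};

/* Return cw with the given exceptions unmasked (mask bits cleared). */
uint16_t x87_unmask(uint16_t cw, unsigned exceptions)
{
    assert((exceptions & ~(unsigned)X87_MASK_ALL) == 0);
    return (uint16_t)(cw & ~exceptions);
}

/* True if the given exception would trap with this control word. */
bool x87_is_unmasked(uint16_t cw, unsigned exception)
{
    return (cw & exception) == 0;
}

/* Rounding control, bits 10-11: 0 nearest, 1 down, 2 up, 3 toward zero. */
unsigned x87_rounding(uint16_t cw)
{
    return (cw >> 10) & 0x3u;
}
```

For example, unmasking divide-by-zero on the FNINIT default turns 0x037F into 0x037B. A runtime that relies on unmasked exceptions breaks if something resets the word back to the default behind its back.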

/Mikael