C++, volatile member functions, and threads

Post by Bryan O'Sullivan » Tue, 08 Jul 1997 04:00:00



e> No, I disagree with your assessment.

I'm afraid that your disagreement is incorrect.  You are certainly
welcome to maintain your position, but Dave will still be right.

e> Accessing a volatile is still far more efficient than the function
e> call overhead typically required for obtaining a mutex lock.

That is not necessarily true; even if it were, it would still not be
helpful.  You should not care about niggling points of efficiency, but
about correctness and the larger issues of efficiency.

e> volatile is necessary in this context because you need to force the
e> compiler to flush the update to memory, even if that flush doesn't
e> occur immediately due to the vagaries of particular SMP
e> environments.

Using the volatile keyword is still insufficient to give you reliable
(i.e. correct) semantics, and this is the point that Dave and I have
been pushing.  The sequence you should be following is like this:

1.  Develop your code.  Make sure it is correct.

2.  Benchmark it in realistic conditions.

3.  Is it fast enough?  If so, you're done.  If not, continue.

4.  Look at the algorithms and data structures you're using, and the
    overall structure of the synchronisation you are doing.  See if
    you can make any changes that would have a large impact.  Go to
    step 1.

5.  Once you have everything working sensibly in the large and you
    still aren't getting quite the performance you need, start
    worrying about those inner loops.

Only during the last step should you start worrying about ways to
improve the performance of your code with respect to individual
mutexes or condition variables.

At this point, you may be thinking about rewriting your inner loops in
assembly language and doing other platform-dependent things, depending
on how much you need to care about speed and portability, so more or
less anything goes, perhaps including use of your own synchronisation
code.

This is the sort of thing you will only need to pay attention to if
you have a lot of time to spare and performance is of utmost
importance, though; up until near the end of step 5, you should use
whatever portable vendor-provided synchronisation constructs are
appropriate to your task, and you will find that this suffices for
99.99% of all your programming needs.

I absolutely guarantee you that trying to write your own portable
synchronisation code in C or C++ is a quick route to insanity and
humility.  If you think you know enough to get it right without
having used your code in production work for a year or three, you just
haven't been bitten often enough by the subtle bugs in your code.

        <b

--
Let us pray:



 
 
 

C++, volatile member functions, and threads

Post by Dave Butenhof » Wed, 09 Jul 1997 04:00:00



>      No, I disagree with your assessment.

That's fine. Disagree. I won't even bother to argue, because I notice
I've already started trying to simply rephrase the same information in
hopes that someone who didn't understand originally will suddenly see
the light. I'm tired of this, and I don't intend to rephrase yet again
for this particular discussion. I'm going to do some work today,
instead.

Suffice it to say that I believe someone following your advice will get
themselves into trouble. The trouble will be subtle and difficult to
diagnose. They will regret it. Perhaps you'll feel guilty, though you'll
probably never even know.

> I presume the library forces a full write cache
> flush whenever a mutex is unlocked or a mutex would[n't] work for the
> purpose it was intended.

Ah, this is a tangent that's more interesting, and about which I have
something to say that's new -- at least in the context of this
discussion. Use of mutexes does NOT imply a cache flush. Rather, proper
use of a mutex implements a memory coherency PROTOCOL that ensures one
thread, via a mutex, can pass a consistent view of memory to another
thread. That second thread (the thread that next locks the mutex after
one thread unlocks the mutex) does indeed "inherit" a consistent view of
memory -- but that is a result not merely of the UNLOCK, but of the
UNLOCK in one thread combined with the LOCK in the next thread.

A mutex unlock is generally a memory barrier followed by clearing the
lock bit. The memory barrier ensures that the current processor cannot
reorder the writes that occurred within the locked region past the
unlock itself. It does not, however, ensure that those protected writes
occur immediately, or even soon. A lock is an atomic set (test-and-set,
swap, whatever) followed by a memory barrier. The barrier ensures that
any data written within the locked region cannot be reordered past
(before) the lock itself.

In combination, this means that a thread locking a mutex can be sure
that it will see all data written prior to the previous thread's unlock.
But that is NOT the same as a "cache flush". It's merely an orderly
limitation on memory operation reordering.

(Note that many older systems do not have this concept of a memory
barrier, and that confuses a lot of people. Most of us are used to
guaranteed hardware read/write ordering, where all memory transactions
occur IN ORDER -- and often atomically, as well. This is NOT true of
modern high-performance multiprocessor memory systems. You don't need to
worry about this, as long as you "follow the rules" -- but to break the
rules successfully you'd better understand every detail of your
hardware, and don't expect the behavior to be remotely portable!)

The implications of this distinction are subtle, but the important
consideration is that BOTH sides must use a mutex, or there is no
synchronization (or visibility guarantees). WRITING a variable under a
mutex and READING it in another thread without a mutex provides no
visibility guarantees.

This is one of the ways in which volatile falls down as a "substitute"
for synchronization. The volatile attribute forces the COMPILER to
generate write (and read) instructions in a few places where it might
not otherwise, but it has no effect on the hardware caching. In
particular, it does not "flush cache", or enforce any ordering on memory
operations. It prevents the compiler from keeping values in registers
that would benefit from being kept in registers. The volatile attribute
is useful for situations where you want to read or write a variable from
a signal handler, or after a longjmp, or where the variable's address is
bound to a hardware register (e.g., direct mapped I/O) such that each
change in value may be critical to the operation of the hardware. The
volatile attribute does not make operations on the variable atomic, nor
does it create a protocol that provides synchronization or visibility
across threads/processes operating in parallel.

/---------------------------[ Dave Butenhof ]--------------------------\

| 110 Spit Brook Rd ZKO2-3/Q18       http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698       http://www.awl.com/cp/butenhof/posix.html |
\-----------------[ Better Living Through Concurrency ]----------------/

 
 
 

C++, volatile member functions, and threads

Post by Todd Murray » Wed, 09 Jul 1997 04:00:00



> e> No, I disagree with your assessment.

> I'm afraid that your disagreement is incorrect.  You are certainly
> welcome to maintain your position, but Dave will still be right.

> e> Accessing a volatile is still far more efficient than the function
> e> call overhead typically required for obtaining a mutex lock.

> That is not necessarily true; even if it were, it is still not
> helpful.  You should not be caring about nigglingly small points of
> efficiency, but about correctness and larger issues of efficiency.

Don't forget MAINTAINABILITY and PORTABILITY.  I would hope that any
professional software engineer wouldn't write code that depends on
processor tricks or alternative synchronization methods.  Someone else
will invariably end up maintaining the code or trying to port it to
another platform.  Suddenly, code that worked on one system (relying on
a read operation to be atomic, for instance) won't work on another
system.  The code might be 3000 lines deep in a file, under a comment
saying, "// Note: This next line depends on a read of an unsigned int
being atomic across threads."

> e> volatile is necessary in this context because you need to force the
> e> compiler to flush the update to memory, even if that flush doesn't
> e> occur immediately due to the vagaries of particular SMP
> e> environments.

> Using the volatile keyword is still insufficient to give you reliable
> (i.e. correct) semantics, and this is the point that Dave and I have
> been pushing.  The sequence you should be following is like this:

> 1.  Develop your code.  Make sure it is correct.

> 2.  Benchmark it in realistic conditions.

> 3.  Is it fast enough?  If so, you're done.  If not, continue.

> 4.  Look at the algorithms and data structures you're using, and the
>     overall structure of the synchronisation you are doing.  See if
>     you can make any changes that would have a large impact.  Go to
>     step 1.

> 5.  Once you have everything working sensibly in the large and you
>     still aren't getting quite the performance you need, start
>     worrying about those inner loops.

> Only during the last step should you start worrying about ways to
> improve the performance of your code with respect to individual
> mutexes or condition variables.

Very true.  I couldn't agree more.  And I'd have to ask if the speed of
the code is more important than the correctness and maintainability of
the code.  The customer isn't going to see a 10% speed increase in an
inner loop in most cases.  But, if that customer has a bug caused by
incorrect thread synchronization (by someone relying on a trick that
looks like it works, but fails a very small percentage of the time), the
customer will get very upset and call frequently until the problem is
fixed.

> At this point, you may be thinking about rewriting your inner loops in
> assembly language and doing other platform-dependent things, depending
> on how much you need to care about speed and portability, so more or
> less anything goes, perhaps including use of your own synchronisation
> code.

> This is the sort of thing you will only need to pay attention to if
> you have a lot of time to spare and performance is of utmost
> importance, though; up until near the end of step 5, you should use
> whatever portable vendor-provided synchronisation constructs are
> appropriate to your task, and you will find that this suffices for
> 99.99% of all your programming needs.

Not only that, but even if YOU understand what you're doing by writing
your own synchronization, chances are that some poor shmuck trying to
maintain your code won't.  (And more often than not, I've been that poor
shmuck, trying to understand someone's code when they've implemented
something that looked like an optimization when it was really a poorly
thought-out hack.  If my career ever progresses to the point
where I actually spend some time doing design and new implementation, as
opposed to fixing old hacked-up fire hazard code, you can bet I'll
design things cleanly and make it obvious what I'm doing. *)

> I absolutely guarantee you that trying to write your own portable
> synchronisation code in C or C++ is a quick route to insanity and
> humility.  If you think you know enough to get it right without
> having used your code in production work for a year or three, you just
> haven't been bitten often enough by the subtle bugs in your code.

(* Two gripes about my workplace deleted.  E-mail me for the deleted
bits.)
--

Slightly deranged mountain biker, snow skater, and keeper of the
'97 Wrangler FAQ: http://www.visi.com/~tam/tjfaq.html
Don't remove "nospam" from my E-mail address.
 
 
 

C++, volatile member functions, and threads

Post by Bryan O'Sullivan » Thu, 10 Jul 1997 04:00:00


e> I think my idea can be used reasonably safely, especially in its
e> capacity as a tag.

Even using it as a tag is a bad idea:

- You are using the volatile keyword in a way that it was not meant to
  be used.  Since you don't have control over the compiler, the
  compiler is going to work under the assumption that you are not some
  kind of eccentric with funny ideas, and will inhibit optimisations
  that it could otherwise perform safely, resulting in unnecessary
  memory traffic that may degrade your code's performance
  substantially on cached multiprocessor systems.  You are forcing
  semantics onto a language construct that are not there and have
  never been implied to be there, which is dumb.

- You assume that some other human will see the volatile keyword and
  know that what *you* meant was not the usual semantics of volatile,
  but something else.

I've had enough of debating this stuff, since you seem intent on
perseverating over unimportant issues in ways that will make them
important for all the wrong reasons if you ever put your ideas into
practice.

        <b

--
Let us pray: