listen/notify argument (old topic revisited)

Post by Jeff Davis » Wed, 03 Jul 2002 18:46:56



A while ago, I started a small discussion about passing arguments to a NOTIFY
so that the listening backend could get more information about the event.

There wasn't exactly a consensus from what I understand, but the last thing I
remember is that someone intended to speed up the notification process by
storing the events in shared memory segments (IIRC this was Tom's idea). That
would create a remote possibility of a spurious notification, but the idea is
that the listening application can check the status and determine what
happened.

I looked at the TODO, but I couldn't find anything, nor could I find anything
in the docs.

Is someone still interested in implementing this feature? Are there still
people who disagree with the above implementation strategy?

Regards,
        Jeff

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

listen/notify argument (old topic revisited)

Post by Bruce Momjian » Thu, 04 Jul 2002 02:33:12



> A while ago, I started a small discussion about passing arguments to a NOTIFY
> so that the listening backend could get more information about the event.

> There wasn't exactly a consensus from what I understand, but the last thing I
> remember is that someone intended to speed up the notification process by
> storing the events in shared memory segments (IIRC this was Tom's idea). That
> would create a remote possibility of a spurious notification, but the idea is
> that the listening application can check the status and determine what
> happened.

I don't see a huge value to using shared memory.  Once we get
auto-vacuum, pg_listener will be fine, and shared memory like SI is just
too hard to get working reliably because of all the backends
reading/writing in there.  We have tables that have the proper sharing
semantics; I think we should use those and hope we get autovacuum soon.

As far as the message, perhaps passing the oid of the pg_listener row to
the backend would help, and then the backend can look up any message for
that oid in pg_listener.

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

listen/notify argument (old topic revisited)

Post by Bruce Momjian » Thu, 04 Jul 2002 14:51:44




> > Why can't we do efficient indexing, or clear out the table?  I don't
> > remember.

> I don't recall either, but I do recall that we tried to index it and
> backed out the changes.  In any case, a table on disk is just plain
> not the right medium for transitory-by-design notification messages.

OK, I can help here.  I added an index on pg_listener so lookups would
go faster in the backend, but inserts/updates into the table also
require index additions, and your feeling was that the table was small
and we would be better without the index and just sequentially scanning
the table.  I can easily add the index and make sure it is used properly
if you are now concerned about table access time.

I think your issue was that it is only looked up once, and only updated
once, so there wasn't much sense in having that index maintenance
overhead, i.e. you only used the index once per row.

(I remember the item being on TODO for quite a while when we discussed
this.)

Of course, a shared memory system probably is going to either do it
sequentially or have its own index issues, so I don't see a huge
advantage to going to shared memory, and I do see extra code and a queue
limit.

Quote:> >> A curious statement considering that PG depends critically on SI
> >> working.  This is a solved problem.

> > My point is that SI was buggy for years until we found all the bugs, so
> > yea, it is a solved problem, but solved with difficulty.

> The SI message mechanism itself was not the source of bugs, as I recall
> it (although certainly the code was incomprehensible in the extreme;
> the original programmer had absolutely no grasp of readable coding style
> IMHO).  The problem was failure to properly design the interactions with
> relcache and catcache, which are pretty complex in their own right.
> An SI-like NOTIFY mechanism wouldn't have those issues.

Oh, OK, interesting.  So _that_ was the issue there.

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

listen/notify argument (old topic revisited)

Post by Bruce Momjian » Thu, 04 Jul 2002 16:12:58


Let me tell you what would be really interesting.  If we didn't report
the pid of the notifying process and we didn't allow arbitrary strings
for notify (just pg_class relation names), we could just add a counter
to pg_class that is updated for every notify.  If a backend is
listening, it remembers the counter at listen time, and on every commit
checks the pg_class counter to see if it has incremented.  That way,
there is no queue, no shared memory, and there is no scanning. You just
pull up the cache entry for pg_class and look at the counter.

One problem is that pg_class would be updated more frequently.  Anyway,
just an idea.

---------------------------------------------------------------------------



> > Is disk i/o a real performance
> > penalty for notify, and is performance a huge issue for notify anyway,

> Yes, and yes.  I have used NOTIFY in production applications, and I know
> that performance is an issue.

> >> The queue limit problem is a valid argument, but it's the only valid
> >> complaint IMHO; and it seems a reasonable tradeoff to make for the
> >> other advantages.

> BTW, it occurs to me that as long as we make this an independent message
> buffer used only for NOTIFY (and *not* try to merge it with SI), we
> don't have to put up with overrun-reset behavior.  The overrun reset
> approach is useful for SI because there are only limited times when
> we are prepared to handle SI notification in the backend work cycle.
> However, I think a self-contained NOTIFY mechanism could be much more
> flexible about when it will remove messages from the shared buffer.
> Consider this:

> 1. To send NOTIFY: grab write lock on shared-memory circular buffer.
> If enough space, insert message, release lock, send signal, done.
> If not enough space, release lock, send signal, sleep some small
> amount of time, and then try again.  (Hard failure would occur only
> if the proposed message size exceeds the buffer size; as long as we
> make the buffer size a parameter, this is the DBA's fault not ours.)

> 2. On receipt of signal: grab read lock on shared-memory circular
> buffer, copy all data up to write pointer into private memory,
> advance my (per-process) read pointer, release lock.  This would be
> safe to do pretty much anywhere we're allowed to malloc more space,
> so it could be done say at the same points where we check for cancel
> interrupts.  Therefore, the expected time before the shared buffer
> is emptied after a signal is pretty small.

> In this design, if someone sits in a transaction for a long time,
> there is no risk of shared memory overflow; that backend's private
> memory for not-yet-reported NOTIFYs could grow large, but that's
> his problem.  (We could avoid unnecessary growth by not storing
> messages that don't correspond to active LISTENs for that backend.
> Indeed, a backend with no active LISTENs could be left out of the
> circular buffer participation list altogether.)

> We'd need to separate this processing from the processing that's used to
> force SI queue reading (dz's old patch), so we'd need one more signal
> code than we use now.  But we do have SIGUSR1 available.

>                    regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

listen/notify argument (old topic revisited)

Post by Bruce Momjian » Thu, 04 Jul 2002 16:20:59




> > Of course, a shared memory system probably is going to either do it
> > sequentially or have its own index issues, so I don't see a huge
> > advantage to going to shared memory, and I do see extra code and a queue
> > limit.

> Disk I/O vs. no disk I/O isn't a huge advantage?  Come now.

My assumption is that it throws to disk as backing store, which seems
better to me than dropping the notifies.  Is disk i/o a real performance
penalty for notify, and is performance a huge issue for notify anyway,
assuming autovacuum?

Quote:> A shared memory system would use sequential (well, actually
> circular-buffer) access, which is *exactly* what you want given
> the inherently sequential nature of the messages.  The reason that
> table storage hurts is that we are forced to do searches, which we
> could eliminate if we had control of the storage ordering.  Again,
> it comes down to the fact that tables don't provide the right
> abstraction for this purpose.

To me, it just seems like going to shared memory is taking our existing
table structure and moving it to memory.  Yea, there is no tuple header,
and yea we can make a circular list, but we can't index the thing, so is
spinning around a circular list any better than a sequential scan of a
table?  Yea, we can delete stuff better, but autovacuum would help with
that.  It just seems like we are reinventing the wheel.

Are there other uses for this? Can we make use of RAM-only tables?

Quote:> The "extra code" argument doesn't impress me either; async.c is
> currently 900 lines, about 2.5 times the size of sinvaladt.c which is
> the guts of SI message passing.  I think it's a good bet that a SI-like
> notify module would be much smaller than async.c is now; it's certainly
> unlikely to be significantly larger.

> The queue limit problem is a valid argument, but it's the only valid
> complaint IMHO; and it seems a reasonable tradeoff to make for the
> other advantages.

I am just not excited about it.  What do others think?

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

listen/notify argument (old topic revisited)

Post by Tom Lane » Thu, 04 Jul 2002 16:41:23



> Why can't we do efficient indexing, or clear out the table?  I don't
> remember.

I don't recall either, but I do recall that we tried to index it and
backed out the changes.  In any case, a table on disk is just plain
not the right medium for transitory-by-design notification messages.

Quote:>> A curious statement considering that PG depends critically on SI
>> working.  This is a solved problem.
> My point is that SI was buggy for years until we found all the bugs, so
> yea, it is a solved problem, but solved with difficulty.

The SI message mechanism itself was not the source of bugs, as I recall
it (although certainly the code was incomprehensible in the extreme;
the original programmer had absolutely no grasp of readable coding style
IMHO).  The problem was failure to properly design the interactions with
relcache and catcache, which are pretty complex in their own right.
An SI-like NOTIFY mechanism wouldn't have those issues.

                        regards, tom lane

listen/notify argument (old topic revisited)

Post by Bruce Momjian » Thu, 04 Jul 2002 16:46:52




> > I don't see a huge value to using shared memory.   Once we get
> > auto-vacuum, pg_listener will be fine,

> No it won't.  The performance of notify is *always* going to suck
> as long as it depends on going through a table.  This is particularly
> true given the lack of any effective way to index pg_listener; the
> more notifications you feed through, the more dead rows there are
> with the same key...

Why can't we do efficient indexing, or clear out the table?  I don't
remember.

Quote:> > and shared memory like SI is just
> > too hard to get working reliabily because of all the backends
> > reading/writing in there.

> A curious statement considering that PG depends critically on SI
> working.  This is a solved problem.

My point is that SI was buggy for years until we found all the bugs, so
yea, it is a solved problem, but solved with difficulty.

Do we want to add another SI-type capability that could be as difficult
to get working properly, or will the notify piggyback on the existing SI
code?  If the latter, that would be fine with me, but we still have the
overflow queue problem.

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

listen/notify argument (old topic revisited)

Post by Jeff Davis » Thu, 04 Jul 2002 17:15:45



Quote:> Let me tell you what would be really interesting.  If we didn't report
> the pid of the notifying process and we didn't allow arbitrary strings
> for notify (just pg_class relation names), we could just add a counter
> to pg_class that is updated for every notify.  If a backend is
> listening, it remembers the counter at listen time, and on every commit
> checks the pg_class counter to see if it has incremented.  That way,
> there is no queue, no shared memory, and there is no scanning. You just
> pull up the cache entry for pg_class and look at the counter.

> One problem is that pg_class would be updated more frequently.  Anyway,
> just an idea.

I think that currently a lot of people use select() (after all, it's mentioned
in the docs) in the frontend to determine when a notify comes into a
listening backend. If the backend only checks on commit, and the backend is
largely idle except for notify processing, might it be a while before the
frontend realizes that a notify was sent?

Regards,
        Jeff

> ---------------------------------------------------------------------------



> > > Is disk i/o a real performance
> > > penalty for notify, and is performance a huge issue for notify anyway,

> > Yes, and yes.  I have used NOTIFY in production applications, and I know
> > that performance is an issue.

> > >> The queue limit problem is a valid argument, but it's the only valid
> > >> complaint IMHO; and it seems a reasonable tradeoff to make for the
> > >> other advantages.

> > BTW, it occurs to me that as long as we make this an independent message
> > buffer used only for NOTIFY (and *not* try to merge it with SI), we
> > don't have to put up with overrun-reset behavior.  The overrun reset
> > approach is useful for SI because there are only limited times when
> > we are prepared to handle SI notification in the backend work cycle.
> > However, I think a self-contained NOTIFY mechanism could be much more
> > flexible about when it will remove messages from the shared buffer.
> > Consider this:

> > 1. To send NOTIFY: grab write lock on shared-memory circular buffer.
> > If enough space, insert message, release lock, send signal, done.
> > If not enough space, release lock, send signal, sleep some small
> > amount of time, and then try again.  (Hard failure would occur only
> > if the proposed message size exceeds the buffer size; as long as we
> > make the buffer size a parameter, this is the DBA's fault not ours.)

> > 2. On receipt of signal: grab read lock on shared-memory circular
> > buffer, copy all data up to write pointer into private memory,
> > advance my (per-process) read pointer, release lock.  This would be
> > safe to do pretty much anywhere we're allowed to malloc more space,
> > so it could be done say at the same points where we check for cancel
> > interrupts.  Therefore, the expected time before the shared buffer
> > is emptied after a signal is pretty small.

> > In this design, if someone sits in a transaction for a long time,
> > there is no risk of shared memory overflow; that backend's private
> > memory for not-yet-reported NOTIFYs could grow large, but that's
> > his problem.  (We could avoid unnecessary growth by not storing
> > messages that don't correspond to active LISTENs for that backend.
> > Indeed, a backend with no active LISTENs could be left out of the
> > circular buffer participation list altogether.)

> > We'd need to separate this processing from the processing that's used to
> > force SI queue reading (dz's old patch), so we'd need one more signal
> > code than we use now.  But we do have SIGUSR1 available.

> >                       regards, tom lane

listen/notify argument (old topic revisited)

Post by Hannu Krosing » Thu, 04 Jul 2002 19:07:19



> > Of course, a shared memory system probably is going to either do it
> > sequentially or have its own index issues, so I don't see a huge
> > advantage to going to shared memory, and I do see extra code and a queue
> > limit.

> Is a shared memory implementation going to play silly buggers with the Win32
> port?

Perhaps this is a good place to introduce anonymous mmap?

Is there a way to grow anonymous mmap on demand?

----------------
Hannu

listen/notify argument (old topic revisited)

Post by Hannu Krosing » Thu, 04 Jul 2002 19:14:59




> > Is disk i/o a real performance
> > penalty for notify, and is performance a huge issue for notify anyway,

> Yes, and yes.  I have used NOTIFY in production applications, and I know
> that performance is an issue.

> >> The queue limit problem is a valid argument, but it's the only valid
> >> complaint IMHO; and it seems a reasonable tradeoff to make for the
> >> other advantages.

> BTW, it occurs to me that as long as we make this an independent message
> buffer used only for NOTIFY (and *not* try to merge it with SI), we
> don't have to put up with overrun-reset behavior.  The overrun reset
> approach is useful for SI because there are only limited times when
> we are prepared to handle SI notification in the backend work cycle.
> However, I think a self-contained NOTIFY mechanism could be much more
> flexible about when it will remove messages from the shared buffer.
> Consider this:

> 1. To send NOTIFY: grab write lock on shared-memory circular buffer.

Are you planning to have one circular buffer per listening backend?

Would that not be a waste of space for a large number of backends with
long notify arguments?

--------------
Hannu

listen/notify argument (old topic revisited)

Post by Rod Taylor » Thu, 04 Jul 2002 20:20:40





> > > Of course, a shared memory system probably is going to either do it
> > > sequentially or have its own index issues, so I don't see a huge
> > > advantage to going to shared memory, and I do see extra code and a queue
> > > limit.

> > Disk I/O vs. no disk I/O isn't a huge advantage?  Come now.

> My assumption is that it throws to disk as backing store, which seems
> better to me than dropping the notifies.  Is disk i/o a real performance
> penalty for notify, and is performance a huge issue for notify anyway,
> assuming autovacuum?

For me, performance would be one of the only concerns. Currently I use
two methods of finding changes, one is NOTIFY which directs frontends to
reload various sections of data, the second is a table which holds a
QUEUE of actions to be completed (which must be tracked, logged and
completed).

If performance wasn't a concern, I'd simply use more RULES which insert
requests into my queue table.

listen/notify argument (old topic revisited)

Post by Tom Lane » Thu, 04 Jul 2002 22:38:42



> Is a shared memory implementation going to play silly buggers with the Win32
> port?

No.  Certainly no more so than shared disk buffers or the SI message
facility, both of which are *not* optional.

                        regards, tom lane

listen/notify argument (old topic revisited)

Post by Tom Lane » Thu, 04 Jul 2002 23:03:25



> Perhaps this is a good place to introduce anonymous mmap ?

I don't think so; it just adds a portability variable without buying
us anything.

Quote:> Is there a way to grow anonymous mmap on demand?

Nope.  Not portably, anyway.  For instance, the HPUX man page for mmap
sayeth:

     If the size of the mapped file changes after the call to mmap(), the
     effect of references to portions of the mapped region that correspond
     to added or removed portions of the file is unspecified.

Dynamically re-mmapping after enlarging the file might work, but there
are all sorts of interesting constraints on that too; it looks like
you'd have to somehow synchronize things so that all the backends do it
at the exact same time.

On the whole I see no advantage to be gained here, compared to the
implementation I sketched earlier with a fixed-size shared buffer and
enlargeable internal buffers in backends.

                        regards, tom lane

listen/notify argument (old topic revisited)

Post by Tom Lane » Thu, 04 Jul 2002 23:11:47



> Are you planning to have one circular buffer per listening backend?

No; one circular buffer, period.

Each backend would also internally buffer notifies that it hadn't yet
delivered to its client --- but since the time until delivery could vary
drastically across clients, I think that's reasonable.  I'd expect
clients that are using LISTEN to avoid doing long-running transactions,
so under normal circumstances the internal buffer should not grow very
large.

                        regards, tom lane

listen/notify argument (old topic revisited)

Post by Hannu Krosing » Thu, 04 Jul 2002 23:53:06




> > Are you planning to have one circular buffer per listening backend?

> No; one circular buffer, period.

> Each backend would also internally buffer notifies that it hadn't yet
> delivered to its client --- but since the time until delivery could vary
> drastically across clients, I think that's reasonable.  I'd expect
> clients that are using LISTEN to avoid doing long-running transactions,
> so under normal circumstances the internal buffer should not grow very
> large.

>                    regards, tom lane
> 2. On receipt of signal: grab read lock on shared-memory circular
> buffer, copy all data up to write pointer into private memory,
> advance my (per-process) read pointer, release lock.  This would be
> safe to do pretty much anywhere we're allowed to malloc more space,
> so it could be done say at the same points where we check for cancel
> interrupts.  Therefore, the expected time before the shared buffer
> is emptied after a signal is pretty small.

> In this design, if someone sits in a transaction for a long time,
> there is no risk of shared memory overflow; that backend's private
> memory for not-yet-reported NOTIFYs could grow large, but that's
> his problem.  (We could avoid unnecessary growth by not storing
> messages that don't correspond to active LISTENs for that backend.
> Indeed, a backend with no active LISTENs could be left out of the
> circular buffer participation list altogether.)

There could be a little more smartness here to avoid unnecessary copying
(not just storing) of not-listened-to data.  Perhaps each notify message
could be stored as

(ptr_to_next_blk, name, data)

so that the receiving backend could skip uninteresting (not-listened-to)
messages.

I guess that depending on the circumstances this can be either faster or
slower than copying them all in one memmove.

This will be slower if all messages are interesting, but an overall win
if there is one backend listening to messages with a big dataload and
lots of other backends listening to relatively small messages.

There are scenarios where some more complex structure will be faster (a
sparse communication structure, say 1000 backends each listening to 1
name and notifying ten others - each backend has to (manually ;) check
1000 messages to find the one that is for it), but your proposed
structure seems good enough for most common uses (and definitely better
than the current one).

---------------------
Hannu

listen/notify argument (old topic revisited)

I have not seen any indication that the corresponding scan in the SI
code is a bottleneck --- and that has to scan over *all* backends,
without even the opportunity to skip those that aren't LISTENing.

It's no different from before --- notify messages don't get into the
buffer at all, until they're committed.  See my earlier response to Neil.

                        regards, tom lane
