"tee", but with fast writer, 1 slow reader and 1 fast reader

"tee", but with fast writer, 1 slow reader and 1 fast reader

Post by Ben Chase » Thu, 12 Jun 2003 08:04:15



I'm extending a data acquisition app, written in C.  It will acquire a
large amount of data (e.g. from a PCI card on a Sparc Ultra5 running
Solaris 9, I think), perhaps 10 or more Gbytes.  So, we're into
largefiles.  The acquiring process as it stands now is a 32-bit app,
although I _suppose_ that could change if that's required for the
solution.  The acquired data need to go two places, sort of like
"tee":

1. The data need to be written to a local disk.

2. The data also need to be sent to another process, running on
another machine.  The data could be sent to the other process as its
standard input (popen("rsh othermachine otherprocess"), or somesuch, I
guess).  Or they could be written to a named pipe, I suppose.  (Do named
pipes work if either the reader or the writer is looking at an NFS
filesystem?)  Or it could be some other way of feeding a remote process
that you've thought of and I haven't.
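
Just to make the popen() flavor of (2) concrete, here is a minimal
sketch; the host and command names are the placeholders from above:

#include <stdio.h>

int main(void)
{
    /* Feed the remote process over rsh via its standard input. */
    FILE *p = popen("rsh othermachine otherprocess", "w");

    if (p == NULL) { perror("popen"); return 1; }
    /* In the real app, each acquired buffer would be fwritten here
     * as it becomes available. */
    fwrite("example bytes\n", 1, 14, p);
    pclose(p);   /* waits for the remote command to exit */
    return 0;
}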

The problem is that the data must be collected in a timely fashion,
but the remote consumer might be slow.  (On the other hand, the remote
consumer might be fast enough to keep up and indeed be mostly
waiting.)  And the amount of data is large, in excess of the RAM in
the machine, but not more than the local disk.  So we don't want the
acquiring thread/process to block because it has filled up all
buffers.  And the buffers perhaps can't simply be made larger (or can
they?)

Is a reasonable solution to have multiple pthreads within the "tee"
process, one writing the data to disk, and the other reading the data
back from the disk and sending it to the remote consumer?  These two
threads would communicate (synchronize) so that the one reading from
disk never tried to get ahead of the one writing to disk.  I don't
want the slow one looping waiting for more data (maybe it's not that
slow); I'd rather it somehow block and be alerted when more data are
available.
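
A minimal sketch of that two-thread arrangement, with a mutex and a
condition variable so the sending thread blocks (rather than loops)
until the disk-writing thread publishes more bytes.  The acquisition
and network ends are stand-ins here (stdin and stdout), and the file
name and compile line are made up; building 32-bit with
-D_FILE_OFFSET_BITS=64, e.g. "cc -D_FILE_OFFSET_BITS=64 tee2.c
-lpthread", should take care of largefiles:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1 << 20)            /* 1 MiB transfer unit */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;
static off_t written = 0;          /* bytes safely on disk */
static int   done    = 0;          /* writer has finished  */

/* Stand-ins for the real acquisition and network code: the
 * "acquired" data come from stdin, and the "remote consumer" is
 * stdout (imagine a pipe to rsh there). */
static ssize_t acquire_block(char *buf, size_t len)
{
    return read(STDIN_FILENO, buf, len);
}

static void send_to_consumer(const char *buf, size_t len)
{
    write(STDOUT_FILENO, buf, len);
}

static void *writer(void *arg)
{
    int fd = *(int *)arg;
    static char buf[CHUNK];
    ssize_t n;

    while ((n = acquire_block(buf, CHUNK)) > 0) {
        if (write(fd, buf, (size_t)n) != n) { perror("write"); exit(1); }
        pthread_mutex_lock(&lock);
        written += n;                 /* publish new high-water mark */
        pthread_cond_signal(&more);   /* wake the sender if blocked  */
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&more);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *sender(void *arg)
{
    int fd = *(int *)arg;             /* second fd, opened O_RDONLY */
    static char buf[CHUNK];
    off_t sent = 0;
    off_t avail;
    size_t want;
    ssize_t n;
    int finished;

    for (;;) {
        pthread_mutex_lock(&lock);
        while (sent == written && !done)   /* block, don't spin */
            pthread_cond_wait(&more, &lock);
        avail = written - sent;
        finished = done;
        pthread_mutex_unlock(&lock);

        if (avail == 0 && finished)
            break;                    /* all data sent, writer done */
        want = avail < CHUNK ? (size_t)avail : CHUNK;
        n = pread(fd, buf, want, sent);
        if (n <= 0) { perror("pread"); exit(1); }
        send_to_consumer(buf, (size_t)n);
        sent += n;
    }
    return NULL;
}

int main(void)
{
    pthread_t wt, st;
    int wfd = open("datafile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int rfd = open("datafile", O_RDONLY);

    if (wfd < 0 || rfd < 0) { perror("open"); return 1; }
    pthread_create(&wt, NULL, writer, &wfd);
    pthread_create(&st, NULL, sender, &rfd);
    pthread_join(wt, NULL);
    pthread_join(st, NULL);
    return 0;
}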

If I instead went with the sort of obvious solution of NFS mounting
the local file system onto the remote machine, and having the slow
reader running on the remote machine, open the remote file itself,
etc., is there any way for that remote process to know how to wait for
more data to be written to the file?  (It will get EOF even when more
data might still be coming, right?)  Is there any way for it to know when
the writer of that file has called close()?
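
For what it's worth, a rough sketch of what that remote reader could
do: over NFS, read() returning 0 just means "no more data yet" (modulo
attribute-cache lag), so the reader sleeps and retries.  And since NFS
gives it no way to see the writer's close(), one common workaround is a
sentinel file the writer creates after closing; the "datafile.done"
name below is a made-up example of that:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[65536];
    ssize_t n;
    int fd = open("datafile", O_RDONLY);

    if (fd < 0) { perror("open"); return 1; }
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0) {
            write(STDOUT_FILENO, buf, n);   /* consume the new bytes */
        } else if (n == 0) {
            /* EOF only means "no more data *yet*".  The sentinel,
             * created by the writer after close(), is the real
             * end-of-stream marker. */
            if (access("datafile.done", F_OK) == 0)
                break;
            sleep(1);                       /* back off, then retry */
        } else {
            perror("read");
            return 1;
        }
    }
    close(fd);
    return 0;
}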

- Ben Chase
(Any email address above is junk, for spam-avoidance.  Reply without
the ".foo".  Ignore the header address.)

 
 
 

"tee", but with fast writer, 1 slow reader and 1 fast reader

Post by Greg Andre » Thu, 12 Jun 2003 08:37:40



>The problem is that the data must be collected in a timely fashion,
>but the remote consumer might be slow.  (On the other hand, the remote
>consumer might be fast enough to keep up and indeed be mostly
>waiting.)  And the amount of data is large, in excess of the RAM in
>the machine, but not more than the local disk.  So we don't want the
>acquiring thread/process to block because it has filled up all
>buffers.  And the buffers perhaps can't simply be made larger (or can
>they?)

Sounds just like the problems faced (and solved) by a Usenet news
server.  Need to receive and store/crunch the data locally as fast
as your hardware will allow, yet also feed the data to a remote
consumer at speeds that may differ.

Most Usenet servers solve this by variations on the same basic theme,
a disk spool that can act as a buffer between the rate data flows
into your machine and the rate it flows out to the remote server.
Some "transit only" servers use a memory-based spool rather than
a disk-based one.

Just a thought...

  -Greg
--
Do NOT reply via e-mail.
Reply in the newsgroup.

 
 
 

"tee", but with fast writer, 1 slow reader and 1 fast reader

Post by Ben Chase » Thu, 12 Jun 2003 14:21:09



> > [...] amount of data is large, in excess of the RAM in
> >the machine, but not more than the local disk.  [...]

> Sounds just like the problems faced (and solved) by a Usenet news
> server.  Need to receive and store/crunch the data locally as fast
> as your hardware will allow, yet also feed the data to a remote
> consumer at speeds that may differ.

Yeah, thanks for the thought.  One monkey wrench is that the data is
monolithic - a blob of perhaps 10 Gbyte, rather unlike these easily
swallowed Usenet postings.


- Ben Chase
(Any email address above is junk, for spam-avoidance.  Reply without
the ".foo".  Ignore the header address.)

 
 
 

"tee", but with fast writer, 1 slow reader and 1 fast reader

Post by Greg Andre » Fri, 13 Jun 2003 00:10:39



>Yeah, thanks for the thought.  One monkey wrench is that the data is
>monolithic - a blob of perhaps 10 Gbyte, rather unlike these easily
>swallowed Usenet postings.

That just means you receive it in an unbroken stream and must dispense
it in an unbroken stream.  It doesn't mean you must spool it that way.
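
A toy sketch of spooling it in pieces, under assumptions that aren't
in the thread: the receiving side writes fixed-size chunks named
spool.000001, spool.000002, ... (renaming each into place only once it
is complete) and creates spool.done after the last one.  The feeder
below then ships each chunk in order, unlinks it to reclaim the disk
behind it, and sleeps whenever it catches up; the network end is again
a stand-in (stdout):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Stand-in for the real network hook: copy one chunk to stdout
 * (imagine a pipe to rsh here). */
static void send_file(const char *path)
{
    char buf[65536];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0) { perror(path); exit(1); }
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, n);
    close(fd);
}

int main(void)
{
    char name[64];
    unsigned long seq = 1;

    for (;;) {
        snprintf(name, sizeof name, "spool.%06lu", seq);
        if (access(name, F_OK) == 0) {
            send_file(name);    /* ship the next chunk, in order   */
            unlink(name);       /* reclaim disk space behind us    */
            seq++;
        } else if (access("spool.done", F_OK) == 0
                   && access(name, F_OK) != 0) {
            break;              /* producer finished, spool empty  */
        } else {
            sleep(1);           /* caught up: wait for the producer */
        }
    }
    return 0;
}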

  -Greg
--
Do NOT reply via e-mail.
Reply in the newsgroup.

 
 
 

"tee", but with fast writer, 1 slow reader and 1 fast reader

Post by Lyle Merda » Fri, 13 Jun 2003 00:26:48



:> > [...] amount of data is large, in excess of the RAM in
:> >the machine, but not more than the local disk.  [...]


:> Sounds just like the problems faced (and solved) by a Usenet news
:> server.  Need to receive and store/crunch the data locally as fast
:> as your hardware will allow, yet also feed the data to a remote
:> consumer at speeds that may differ.

: Yeah, thanks for the thought.  One monkey wrench is that the data is
: monolithic - a blob of perhaps 10 Gbyte, rather unlike these easily
: swallowed Usenet postings.



Well, I'm not sure how the data is written to disk in your application,
but if it's anything like, say, the messages file, I would let the data
get appended to the source data file.  Then, once you have started
writing the data to the file, fire off something like this to pump the
data across the network as fast as possible:

tail -f datafilename | gzip -1 -c - | rsh remotehost '( cd /dir_to_extract_to; gzip -d - > datafilename )'

I'm not sure what type of data you are moving, but if it's at all
compressible the gzip -1 will help keep the pipe full...  (One wrinkle:
tail -f never exits on its own, so you'll need to kill it once the
writer has finished and the last bytes have gone across.)

Have Fun!

Lyle