File I/O vs. Piping

File I/O vs. Piping

Post by Haizhou Chen » Sun, 22 Jun 1997 04:00:00



Hi, all:

I have a question about the performance and implementation of file I/O vs.
piping.

Right now I have a program A which produces an output file A.output. I have
a program B which reads A.output to do some postprocessing. So the flow looks
like this:

             A --> A.output --> B

Since A.output is very large (~100MB), it takes a long time to write and
read the file. I am looking into the possibility of setting up a pipe between
A and B to transfer the data. I guess I can use either a socket or a pipe to
do it. Does anyone have experience with how much performance I would gain,
and which is easier to implement (socket or pipe)?

Now suppose A is a lot faster than B. I think this will result in a lot of
data accumulating in an internal buffer. Will that cause a memory problem?

Thank you very much for your answer.

Haizhou Chen

 
 
 

File I/O vs. Piping

Post by Barry Margolin » Sun, 22 Jun 1997 04:00:00




>Since A.output is very large (~100MB), it takes a long time to write and
>read the file. I am looking into the possibility of setting up a pipe
>between A and B to transfer the data. I guess I can use either a socket or
>a pipe to do it. Does anyone have experience with how much performance I
>would gain, and which is easier to implement (socket or pipe)?

On many systems, there's no performance difference between pipes and Unix
domain sockets, as they actually use the same internal mechanism.  Pipes
are generally easier to program; for instance, you can simply use popen(3)
to run another command with a one-directional pipe between the parent and
child.
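For concreteness, a minimal sketch of that popen(3) approach; the command
name "B" here is just a stand-in for the real postprocessor:

        #include <stdio.h>
        #include <stdlib.h>

        /* A writes its records straight into B's stdin via popen(3).
         * "B" is a placeholder for the actual postprocessing command. */
        int main(void)
        {
            FILE *out = popen("B", "w");    /* one-directional pipe to B */
            if (out == NULL) {
                perror("popen");
                return EXIT_FAILURE;
            }

            /* ... generate the data and write it as it is produced ... */
            fprintf(out, "one record of output\n");

            if (pclose(out) == -1) {        /* also waits for B to exit */
                perror("pclose");
                return EXIT_FAILURE;
            }
            return EXIT_SUCCESS;
        }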

>Now suppose A is a lot faster than B. I think this will result in a lot of
>data accumulating in an internal buffer. Will that cause a memory problem?

If the pipe buffer gets full, A will block the next time it tries to
write.  If A needs to run continuously, you'll have to use non-blocking
I/O, and the data will have to be buffered in A's own memory when the
pipe fills.
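A sketch of that non-blocking arrangement, assuming the descriptor is the
write end obtained from pipe(2); make_nonblocking and try_write are
illustrative helper names, not standard functions:

        #include <errno.h>
        #include <fcntl.h>
        #include <unistd.h>

        /* Put the write end of the pipe into non-blocking mode. */
        int make_nonblocking(int fd)
        {
            int flags = fcntl(fd, F_GETFL, 0);
            if (flags == -1)
                return -1;
            return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        }

        /* With O_NONBLOCK set, a write on a full pipe fails with EAGAIN
         * instead of suspending A, so A must stash the data itself. */
        ssize_t try_write(int fd, const void *buf, size_t len)
        {
            ssize_t n = write(fd, buf, len);
            if (n == -1 && errno == EAGAIN)
                return 0;   /* pipe full: keep buf in A's memory, retry */
            return n;
        }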

--

BBN Corporation, Cambridge, MA
Support the anti-spam movement; see <http://www.cauce.org/>

 
 
 


File I/O vs. Piping

Post by James Youngman » Tue, 24 Jun 1997 04:00:00



> Since A.output is very large (~100MB), it takes a long time to write and
> read the file. I am looking into the possibility of setting up a pipe
> between A and B to transfer the data. I guess I can use either a socket or
> a pipe to do it. Does anyone have experience with how much performance I
> would gain, and which is easier to implement (socket or pipe)?

> Now suppose A is a lot faster than B. I think this will result in a lot of
> data accumulating in an internal buffer. Will that cause a memory problem?

Whether it's faster to use pipes or intermediate files depends on many
factors; sometimes one is faster, sometimes the other.  Compilers are an
example: the C compiler converts C into assembly and the assembler converts
that into object files, and the GNU C compiler will operate in either mode
because sometimes one is faster than the other.

It sounds like your data processing is, or can be, a multi-stage
process.  In that case, it is very common for Unix programs to take their
input from stdin and send their output to stdout, for several reasons:

1) You can always get output to file by doing
        programA > output-file-name

2) You can build up longer pipes
        programA | programB | programC | ...

3) You can have alternative processing
        programA | programB | programD | ...

Additionally, it is very useful to have both the input and the output be
human-readable text.  The advantages are:

1) You're more likely to be able to tell if it's wrong

2) You can generate test-cases by hand

3) You can process output with the usual Unix tools (awk, sed, grep,
more, perl, and so on).
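A filter in this style is only a few lines of C; a skeleton, with a
pass-through loop standing in for the real postprocessing:

        #include <stdio.h>

        /* Skeleton of a conventional Unix filter: read records from
         * stdin, transform them, write the result to stdout.  Here
         * the "transformation" is a pass-through. */
        int main(void)
        {
            char line[4096];

            while (fgets(line, sizeof line, stdin) != NULL) {
                /* ... postprocess the record here ... */
                fputs(line, stdout);
            }
            return 0;
        }

Such a program works equally well as "B < A.output", as "A | B", or as the
middle stage of a longer pipeline.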

 
 
 

File I/O vs. Piping

Post by Icarus Sparry » Wed, 25 Jun 1997 04:00:00




>I have a question about the performance and implementation of file I/O vs.
>piping.

>Right now I have a program A which produces an output file A.output. I have
>a program B which reads A.output to do some postprocessing. So the flow
>looks like this:

>             A --> A.output --> B

>Since A.output is very large (~100MB), it takes a long time to write and
>read the file. I am looking into the possibility of setting up a pipe
>between A and B to transfer the data. I guess I can use either a socket or
>a pipe to do it. Does anyone have experience with how much performance I
>would gain, and which is easier to implement (socket or pipe)?

The simplest way to do this is to use the 'sh' programming language, which
can start up programs A and B connected by a pipe. All you need to type is

        A | B

You can set up named pipes, or sockets, but that seems like overkill for
what you are describing. A pipe is likely to be quite a lot faster than
writing to a disk file, as the pipe contents are usually held entirely in
memory, but anything which produces 100MB of data is not going to be
very fast.
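If you did want a named pipe, a minimal sketch using mkfifo(3) follows; the
path /tmp/a2b is just an example:

        #include <stdio.h>
        #include <sys/stat.h>
        #include <sys/types.h>

        /* Create a FIFO that A can write and B can read as if it were
         * an ordinary file; /tmp/a2b is an arbitrary example path. */
        int main(void)
        {
            if (mkfifo("/tmp/a2b", 0600) == -1) {
                perror("mkfifo");
                return 1;
            }
            /* Then, e.g. from the shell:
             *     A > /tmp/a2b &
             *     B < /tmp/a2b
             */
            return 0;
        }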

>Now suppose A is a lot faster than B. I think this will result in a lot of
>data accumulating in an internal buffer. Will that cause a memory problem?

No. Unix will split the time between the two programs, so that if B takes
twice as long to process the data as A takes to produce it, then B will
get about twice as much CPU time as A. This is because when the pipe
between A and B becomes full (typical pipes hold about 10K), process A is
suspended; when there is space in the pipe again, it is resumed
automatically.
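You can ask the system for a related figure yourself; strictly, fpathconf(3)
reports PIPE_BUF, the largest atomic write, rather than the pipe's total
capacity, but it gives a feel for the sizes involved:

        #include <stdio.h>
        #include <unistd.h>

        /* Query PIPE_BUF for a freshly created pipe. */
        int main(void)
        {
            int fds[2];

            if (pipe(fds) == -1) {
                perror("pipe");
                return 1;
            }
            printf("PIPE_BUF: %ld bytes\n",
                   fpathconf(fds[0], _PC_PIPE_BUF));
            return 0;
        }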

Icarus

 
 
 

File I/O vs. Piping

Post by Howard Chu » Wed, 25 Jun 1997 04:00:00



%Hi, all:

%I have a question about the performance and implementation of file I/O vs.
%piping.

%Right now I have a program A which produces an output file A.output. I have
%a program B which reads A.output to do some postprocessing. So the flow
%looks like this:

%             A --> A.output --> B

%Since A.output is very large (~100MB), it takes a long time to write and
%read the file. I am looking into the possibility of setting up a pipe
%between A and B to transfer the data. I guess I can use either a socket or
%a pipe to do it. Does anyone have experience with how much performance I
%would gain, and which is easier to implement (socket or pipe)?

All else being equal, a pipe ought to be faster than the file, since you
avoid doing actual disk I/O. On most BSD-derived systems, a pipe is simply
a Unix domain socket, so there will be no difference between pipe & socket.

%Now suppose A is a lot faster than B. I think this will result in a lot of
%data accumulating in an internal buffer. Will that cause a memory problem?

No, but it will cause A to slow down to match B's speed. The pipe will only
buffer so much data, after which it blocks the writing process and prevents
it from continuing. When the pipe buffer empties enough, the writer is
allowed to resume. (This assumes you have not issued an ioctl to put the
pipe into non-blocking mode.)
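The blocking behaviour is easy to see in a toy program; a sketch in which
the parent plays a fast A and the child a deliberately slow B:

        #include <stdio.h>
        #include <string.h>
        #include <sys/wait.h>
        #include <unistd.h>

        /* The parent (A) writes into the pipe faster than the child
         * (B) reads; once the pipe buffer fills, each write() in A
         * simply sleeps until B drains some data. */
        int main(void)
        {
            int fds[2];
            char chunk[1024];

            memset(chunk, 'x', sizeof chunk);
            if (pipe(fds) == -1) {
                perror("pipe");
                return 1;
            }
            if (fork() == 0) {              /* child: a slow B */
                char buf[1024];
                close(fds[1]);
                while (read(fds[0], buf, sizeof buf) > 0)
                    sleep(1);               /* pretend work is slow */
                return 0;
            }
            close(fds[0]);                  /* parent: a fast A */
            for (int i = 0; i < 64; i++)
                write(fds[1], chunk, sizeof chunk); /* blocks when full */
            close(fds[1]);
            wait(NULL);
            return 0;
        }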
--
Howard Chu                              Principal Member of Technical Staff

Advertisements proof-read for US$100 per word. Submission of your ad to my
email address constitutes your acceptance of these terms.

 
 
 

File I/O vs. Piping

Post by Lucio Chiappetti » Fri, 27 Jun 1997 04:00:00



> It sounds like your data processing is, or can be, a multi-stage
> process.  In that case, it is very common for Unix programs to take their
> input from stdin and send their output to stdout, for several reasons:

  That is usual in Unix, but Unix is not representative of typical DATA
  processing; it is more representative of TEXT (or ASCII file) processing.

  Typical scientific DATA processing involves programs operating on
  (somewhat large) binary data files, either altering them in place or
  producing a separate output file. Sometimes extra files are used as
  "control files". Almost always, stdin is used for control and stdout
  for diagnostics and status info.

----------------------------------------------------------------------------
Lucio Chiappetti - IFCTR/CNR - via Bassini 15 - I-20133 Milano (Italy)      
----------------------------------------------------------------------------
Fuscim donca de Miragn        E tornem a sta scio' in Bregn                
Che i fachign e i cortesagn   Magl' insema no stagn begn                    
Drizza la', compa' Tapogn                            (Rabisch, II 41, 96-99)
----------------------------------------------------------------------------
For more info : http://www.ifctr.mi.cnr.it/~lucio/personal.html            
----------------------------------------------------------------------------

 
 
 

File I/O vs. Piping

Post by James Youngman » Sat, 28 Jun 1997 04:00:00




> > It sounds like your data processing is, or can be, a multi-stage
> > process.  In that case, it is very common for Unix programs to take
> > their input from stdin and send their output to stdout, for several
> > reasons:

>   That is usual in Unix, but Unix is not representative of typical DATA
>   processing; it is more representative of TEXT (or ASCII file) processing.

>   Typical scientific DATA processing involves programs operating on
>   (somewhat large) binary data files, either altering them in place or
>   producing a separate output file. Sometimes extra files are used as
>   "control files". Almost always, stdin is used for control and stdout
>   for diagnostics and status info.

I am aware that many scientific data-processing programs work like
this; indeed, I have a degree in physics.  However, that does not mean
that this is the best way to do it; it just means that that is the way
in which it is usually done.

I have in mind a case study of a program which worked on external
files and took control information from stdin.

Unfortunately, it was worked on by a succession of people whose primary
interest was not the correctness of the program (but rather the
physics of the end result), and it reached the stage where the program
required thirty megabytes of _control_ information as input for each
run of the simulation.  This input was itself crucial to the correct
operation of the program, but impossible to check.  Of course this is
an extreme (but genuine) example.

For an (IMHO) better way of doing things, see Jon Bentley's column in
the June 1987 issue of "Communications of the ACM", also collected in
his book "More Programming Pearls" (ISBN 0-201-11889-0).

 
 
 
