Design question

Post by Penguin Nouveau » Sun, 15 Jun 2003 06:51:47



This is not a school assignment.  :)

I'm designing an application that has three main processing stages:

Stage 1: data capture;
Stage 2: data filter; and
Stage 3: data analysis.  

The filter (2) and analysis (3) stages each reduce the volume of data
they receive.  The output of the analysis stage (3) is a series of
summarizations of related input records.

The data are network packets, each handed to me as an unsigned char *
pointing to the packet.  The average length is under 512 bytes, but I
don't yet have a good distribution of the lengths.

I can't lose any input.

The data coming into the capture stage is naturally bursty and may
reach peaks of 50 MB/s.  The average input rate is much lower.

I'm torn between designing these using threads or as three distinct
processes that use shared memory/semaphores.  

Threads would work against a single large shared memory segment,
probably about 50 MB.
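(Back-of-the-envelope, for what it's worth: at the 50 MB/s peak and
~512-byte packets that's roughly 100,000 packets per second, so a 50 MB
pool would absorb about one second of peak input.)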

With three distinct processes, I'm additionally torn between using two
pools of shared memory (one between stages 1 and 2, another between
stages 2 and 3) and using a single large pool of shared memory with some
additional housekeeping structures to keep the processing state of a
given record straight across all three processes.
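To make the single-pool variant concrete, here is a rough sketch of the
per-record housekeeping header I have in mind (all names are
placeholders; nothing is settled):

    /* Sketch only: a per-record header for the single-pool layout,
     * where each stage inspects and advances a record's state. */
    #include <stddef.h>

    enum rec_state {
        REC_FREE = 0,    /* slot available to the capture stage    */
        REC_CAPTURED,    /* written by stage 1, awaiting filtering */
        REC_FILTERED,    /* passed the filter, awaiting analysis   */
        REC_DONE         /* analysis finished; slot can be reused  */
    };

    struct rec_hdr {
        enum rec_state state;   /* guarded by a semaphore/mutex */
        size_t         len;     /* packet length in bytes       */
        unsigned char  pkt[];   /* packet bytes follow inline   */
    };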

The analysis stage is most probably going to be augmented over time.  
The first two stages are probably static once developed and debugged.

It's sounding like three processes and two shm segments is the way to
go, but I don't work with anyone else technical enough to bounce these
design ideas off of.  Thanks for any discussion.

--
Penguin Nouveau

 
 
 

Design question

Post by Marc Rochkind » Sun, 15 Jun 2003 07:12:42




Quote:> I'm torn between designing these using threads or as three distinct
> processes that use shared memory/semaphores.

[snip]

Based solely on what's in your post, I'd use three processes along with
shared memory and semaphores. The chief reason is that you have a
natural fit for that arrangement, and you can avoid all the problems of
threads, most of which derive from the fact that all data is shared by
default, not just the data you want shared.

In addition, with processes you can plan things so each can be developed
and debugged separately, possibly even by different people. This is much
harder to do with threads.

If you use POSIX shared memory, you could develop with memory-mapped
files, allowing you to keep running, say, stage 2 without running stage
1 first. The APIs for shared-memory segments and memory-mapped files are
very close (e.g., mmap is used for both, and only the open calls differ).
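For instance, a minimal sketch (error handling mostly omitted, names
arbitrary) showing that only the call producing the descriptor differs:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define POOL_SIZE (50u * 1024 * 1024)

    /* Map a 50 MB pool backed either by a POSIX shared-memory object
     * (production) or by an ordinary file (development/replay).  The
     * mmap call that follows is identical in both cases. */
    void *open_pool(int use_shm)
    {
        int fd = use_shm
            ? shm_open("/stage-pool", O_RDWR | O_CREAT, 0600)
            : open("/tmp/stage-pool", O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return MAP_FAILED;
        if (ftruncate(fd, POOL_SIZE) < 0)   /* size the region */
            return MAP_FAILED;
        return mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);     /* caller checks MAP_FAILED */
    }

(On some systems you'll need -lrt to get shm_open at link time.)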

Check on the availability of POSIX shared memory on your target systems,
however. While many systems have mmap, not all have shm_open. You'll find
System V shared memory and semaphores much more widely supported.

Also, I'd use separate shared-memory segments for 1-2 and 2-3, as that
simplifies the programming and makes debugging easier.
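Sketched out (names invented, error checks omitted), the two-segment
arrangement is just two independently named objects, each with its own
pair of counting semaphores:

    #include <fcntl.h>
    #include <semaphore.h>
    #include <sys/mman.h>

    /* One queue between each pair of adjacent stages: its own
     * shared-memory object plus two counting semaphores (free slots
     * for the producer, ready records for the consumer). */
    struct queue {
        int    shm_fd;
        sem_t *slots;   /* producer waits here before writing */
        sem_t *items;   /* consumer waits here before reading */
    };

    static void queue_open(struct queue *q, const char *shm,
                           const char *slots, const char *items,
                           unsigned nslots)
    {
        q->shm_fd = shm_open(shm, O_RDWR | O_CREAT, 0600);
        q->slots  = sem_open(slots, O_CREAT, 0600, nslots);
        q->items  = sem_open(items, O_CREAT, 0600, 0);
    }

    /* e.g. queue_open(&q12, "/cap2filt",  "/q12slots", "/q12items", 1024);
     *      queue_open(&q23, "/filt2anal", "/q23slots", "/q23items", 1024); */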

--Marc

 
 
 

Design question

Post by Alex Colvin » Sun, 15 Jun 2003 09:58:04


Quote:>> I'm torn between designing these using threads or as three distinct
>> processes that use shared memory/semaphores.

Think of it this way:
Threads share everything by default. Processes have everything private by
default.

How about three state machines and a dispatcher? Easier to debug.
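A rough sketch of that shape (all names invented): each stage is a step
function that does a bounded unit of work, and one loop dispatches them
in turn, which keeps the interleaving deterministic under a debugger.

    #include <stdbool.h>

    /* Each stage does one bounded unit of work, if any is pending,
     * and reports whether it made progress.  Bodies are
     * application-specific and omitted here. */
    bool capture_step(void);
    bool filter_step(void);
    bool analyze_step(void);

    void dispatch(void)
    {
        for (;;) {
            bool busy = false;
            busy |= capture_step();
            busy |= filter_step();
            busy |= analyze_step();
            if (!busy) {
                /* idle: block on input (select/poll) rather than spin */
            }
        }
    }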
--
        mac the naïf

 
 
 

Design question

Post by Penguin Nouveau » Fri, 20 Jun 2003 11:06:22





[snip]

Quote:> Based solely on what's in your post, I'd use three processes along with
> shared memory and semaphores. The chief reason is that you have a
> natural fit for that arrangement, and you can avoid all the problems of
> threads, most of which derive from the fact that all data is shared by
> default, not just the data you want shared.

I was thinking along the same lines, but wanted to run it up the
flagpole of public review.

Quote:> In addition, with processes you can plan things so each can be developed
> and debugged separately, possibly even by different people. This is much
> harder to do with threads.

Unfortunately, I'm the "team" for this particular part of the project.

Thanks very much--your comments were helpful.

 
 
 

Design question

Post by Marc Rochkind » Fri, 20 Jun 2003 12:15:29




[snip]

Quote:

> Thanks very much--your comments were helpful.

You're welcome!

--Marc

 
 
 
