I am looking for help in solving a performance problem with UNIX V.4.
I am developing a product that must control a check Reader/Sorter that
processes 1000 documents per minute. The problem is a very narrow window,
37 milliseconds, for processing each check. Communication is through
a non-standard RS422 parallel interface. The communications board and
UNIX driver were developed outside my group especially for this project.
The processor is an 80386 running at 20 MHz.
The Reader/Sorter sends a data packet for a check and requires that a
sort decision (and other control data) for that check be returned within
37-40 milliseconds. If the sort decision is not received within that
window, the operator must intervene and perform a recovery procedure.
Consequently, we run our application as a real-time process. At the
moment, we are running at the highest real-time priority with an infinite
time slice. We have also dropped down to init level 1 and killed cron
and sac for many of our tests. Most of the time, we are making the sort
window with time to spare, but we have not been able to run for longer than
20 minutes (20,000 documents) without getting a late pocket decision.
20 minutes is the longest we have ever run; most runs crash after about
5 minutes (5,000 documents).
We have used the profiler, the system activity reporter and the clock()
function to try to isolate the problem, but none of these tools is quite
what we need. The profiler won't show the times for each individual
document or the maximum time in each function. (The averages that it
does show are very low.) The system activity report tells us that very
little time is spent executing user code: typically 2% to 3% user time and
20% to 30% system time, with the processor idle the rest of the time. The
clock() function would be more helpful if it supported a resolution finer
than 10 milliseconds. It has shown, however, that we can get a late pocket
even when we spend less than 10 milliseconds in the application.
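
To make the measurement gap concrete, here is a minimal sketch of the kind
of per-document timing that the profiler and clock() do not provide. It
assumes gettimeofday() is available and that the kernel reports time more
finely than the 10 millisecond clock tick, which may not hold on every
system. The names lat_stats, lat_init, lat_record and lat_report are
hypothetical, and the 37000 microsecond deadline is simply the window
quoted above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define NBUCKETS 64   /* buckets 0..62 are 1 ms wide; bucket 63 = 63 ms and up */

struct lat_stats {
    long count;              /* documents recorded */
    long max_us;             /* worst elapsed time seen, in microseconds */
    long late;               /* documents that exceeded the deadline */
    long deadline_us;        /* deadline in microseconds (assumed: 37000) */
    long bucket[NBUCKETS];   /* histogram of elapsed times */
};

/* Wall-clock difference t1 - t0 in microseconds. */
static long elapsed_us(const struct timeval *t0, const struct timeval *t1)
{
    return (long)(t1->tv_sec - t0->tv_sec) * 1000000L
         + (long)(t1->tv_usec - t0->tv_usec);
}

static void lat_init(struct lat_stats *s, long deadline_us)
{
    memset(s, 0, sizeof *s);
    s->deadline_us = deadline_us;
}

/* Record one document. Touches memory only, so it is cheap enough
   to call inside the critical window. */
static void lat_record(struct lat_stats *s, long us)
{
    long ms = us / 1000;

    if (ms < 0)
        ms = 0;
    if (ms >= NBUCKETS)
        ms = NBUCKETS - 1;
    s->bucket[ms]++;
    s->count++;
    if (us > s->max_us)
        s->max_us = us;
    if (us > s->deadline_us)
        s->late++;
}

/* Print the summary. Call this only after the run, never per document. */
static void lat_report(const struct lat_stats *s, FILE *fp)
{
    int i;

    fprintf(fp, "documents=%ld max=%ld us late=%ld\n",
            s->count, s->max_us, s->late);
    for (i = 0; i < NBUCKETS; i++)
        if (s->bucket[i] != 0)
            fprintf(fp, "%2d ms: %ld\n", i, s->bucket[i]);
}
```

In use, one gettimeofday() call would go immediately after the data packet
is read and a second immediately after the sort decision is written, with
elapsed_us() of the two passed to lat_record(). Because this measures
wall-clock time rather than CPU time, it also includes any time spent
blocked in the driver or preempted by the kernel.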
The UNIX documentation says that, for an application such as ours, the
developers should know "typical time to preemption," "maximum time to
preemption," and "software switch latency," but the vendor cannot supply
this information.
While the critical code is running, we are seeing some disk activity that
we cannot explain.
We are running two time-sharing processes concurrently with the real-time
process, but one of them is waiting for serial I/O and the other is the
parent process waiting for the death of a child.
The real-time process has been locked into memory, so no swapping should
be taking place. This has been confirmed by the system activity report.
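
For reference, one common way to lock a process into memory is sketched
below. It assumes the system provides mlockall(); older System V systems
offer plock(PROCLOCK) instead. This is not necessarily the call used in
our application. The helper names lock_process_memory and prefault_stack
and the 64 KB stack estimate are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PREFAULT_BYTES (64 * 1024)   /* assumed worst-case stack depth */

/* Lock all current and future pages of this process into memory.
   Returns 0 on success, -1 on failure (usually missing privilege
   or a resource limit). */
static int lock_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Touch the stack pages the critical path is expected to use, so that
   they are faulted in before the first document arrives rather than
   during it. */
static void prefault_stack(void)
{
    volatile char buf[PREFAULT_BYTES];
    size_t i;

    /* A 1024-byte stride is smaller than any page size in practice,
       so every page of buf is written at least once. */
    for (i = 0; i < sizeof buf; i += 1024)
        buf[i] = 0;
}
```

Both functions would be called once at startup, lock_process_memory()
first and prefault_stack() second, before the first packet is read.
Locking keeps the process's own pages resident; it does not stop the
kernel from doing disk I/O on behalf of other activity.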
All help will be greatly appreciated.
--
--Mark