TechTips: A matter of sorts ...

Post by Sundial Service » Fri, 30 Mar 2001 13:00:18

Consider the following scenario:  In a pharmaceutical application having
a database of approximately 8,600 production batches and 23,900 purchase
orders made against a roster of about 2,900 raw materials, we wish to
update raw-material records with current inventory totals (being the
total amount purchased of each raw material, less the amount consumed
by production batches).

This is a problem of a size that becomes "interesting," that is to say,
large enough to bring out inherent performance-problems in the
particular algorithm or approach that you use.  There is definitely a
"right way" and a "wrong way" to tackle a problem like this.

Some of the gotchas:

        - The sheer number of combinations:  random, one-record-at-a-time
          searches are a definite "NO!"

        - A potential for very large result-sets; that is, answer-tables
          or query-results comprising tens of thousands or hundreds of
          thousands of rows.

        - A potential for a very large, very intrusive query/process
          that locks most, if not all, of the rows in the database.

        - A potential for very "unfriendly" abuse of the database
          manager's (or the operating system's) attempts at caching.
          A whole lot of -physical- input/output (I/O) could occur as
          the disk-drives thrash themselves into oblivion.

        - A potential that the database server can "choke" upon the
          amount of resources being requested or committed at one time,
          unless we consciously avoid it.  (Database servers will give
          almost any request a good college try, even when the request
          as-designed is unreasonable and doomed to fail.)

And now for some possible solutions:

(1)  "Random seeks of any kind are Bad.  But sorting is surprisingly
Good."  The computer can arrange a list of 100,000 records into
ascending or descending order much more rapidly than it can search
randomly for 100,000 records (even using an index).
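To make the idea concrete, here is a minimal sketch (the material codes
and quantities are invented for illustration) of a classic sort-merge
pass:  because both files are sorted by the same key, one forward-only
sweep matches every detail record to its master record, with no random
seeks at all.

```python
# Invented sample data, both lists sorted by key -- the precondition
# that makes a single forward pass possible.
masters = [("MAT-0001", 0.0), ("MAT-0002", 0.0),
           ("MAT-0003", 0.0), ("MAT-0004", 0.0)]
details = [("MAT-0001", 10.0), ("MAT-0001", 25.0),
           ("MAT-0003", 40.0), ("MAT-0004", 7.0)]

def merge_totals(masters, details):
    """Sort-merge:  one forward sweep over two key-sorted lists."""
    totals, d = [], 0
    for key, start in masters:
        running = start
        while d < len(details) and details[d][0] == key:
            running += details[d][1]      # adjacent records, same key
            d += 1
        totals.append((key, running))
    return totals

totals = merge_totals(masters, details)
```

Notice that the merge never backs up:  each list is read exactly once,
front to back, which is precisely the access pattern disks (and caches)
like best.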

(2)  When you know a list of records is sorted, you know that all
records having the same key must occur together.  When a gap in values
occurs, you know that nothing can possibly be within that gap.  When
several records occur together which have the same key, the net-effect
of all of those records can be lumped together into just one change to
be applied to a master file somewhere.  Many databases group records by
key-value so that the odds are good that the records we'll need to
update next are already in memory.
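The "lumping together" step can be sketched in a few lines; the record
names and quantities below are invented for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical purchase records (material, quantity), already sorted
# by key.
records = [("MAT-0001", 10), ("MAT-0001", 25),
           ("MAT-0003", 40), ("MAT-0003", -5), ("MAT-0004", 7)]

# Sorted input guarantees that each key's records are adjacent, so an
# entire group collapses into one net change per master record.
net_changes = {key: sum(qty for _, qty in grp)
               for key, grp in groupby(records, key=itemgetter(0))}
```

Five detail records become three master-file updates; with tens of
thousands of details per key, the savings multiply accordingly.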

(3)  A problem like this one can reasonably be divided into smaller
stages that are completed independently.  For example, with 2,900 raw
materials, we could issue 290 separate groups of queries that only
concern themselves with 10 possible raw-materials, or a range of values.
Each of these queries is smaller, more "lightweight," than a massive
query involving everything.  Empirical tests should be performed to
determine what produces the best overall performance.
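One way to carve up the work, sketched below, is to walk the key space
in fixed-size ranges.  (The table and column names here are invented
for illustration, not taken from the actual application.)

```python
# 2,900 raw-material keys, to be covered by ~290 range-bounded queries.
material_ids = list(range(1, 2_901))

def key_ranges(ids, span=10):
    """Yield (low, high) bounds covering `ids` in chunks of `span`."""
    for i in range(0, len(ids), span):
        chunk = ids[i:i + span]
        yield chunk[0], chunk[-1]

queries = [
    f"SELECT material_id, SUM(qty) FROM purchases "
    f"WHERE material_id BETWEEN {lo} AND {hi} GROUP BY material_id"
    for lo, hi in key_ranges(material_ids)
]
```

The right `span` is exactly the sort of thing those empirical tests
should settle:  too small and per-query overhead dominates; too large
and you are back to the massive query you were trying to avoid.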

(4)  A problem like this one can be -reasonably- subdivided.  The counts
for raw-material X are independent of those for raw-material Y.  So the
problem can be divided, in order to be conquered, without affecting
accuracy.  A set of database changes can be performed, committed, and
thereby cleared-out of the database server's way, so to speak, before
the next batch of changes is begun.
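The commit-per-batch rhythm looks like this in a sketch using SQLite
(the schema, row counts, and batch size are invented for illustration;
the principle applies to any transactional database):

```python
import sqlite3

# Toy master table:  100 materials, inventory initially zero.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE materials (id INTEGER PRIMARY KEY, on_hand REAL)")
conn.executemany("INSERT INTO materials VALUES (?, 0)",
                 [(i,) for i in range(1, 101)])
conn.commit()

# Precomputed (new_total, id) pairs -- one independent change per key.
updates = [(float(i) * 2, i) for i in range(1, 101)]

BATCH = 25
for start in range(0, len(updates), BATCH):
    batch = updates[start:start + BATCH]
    conn.executemany("UPDATE materials SET on_hand = ? WHERE id = ?", batch)
    conn.commit()   # this batch is done and out of the server's way

done = conn.execute(
    "SELECT COUNT(*) FROM materials WHERE on_hand = id * 2").fetchone()[0]
```

Each commit releases locks and log space before the next batch begins,
so no single transaction ever holds the whole database hostage.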

Problems that involve large numbers of records, or large combinatorics,
or (double-bad!) both at the same time, will definitely bring the most
modern computer to its knees, kicking and screaming on the way down, if
they are designed insensitively.  A little awareness of the basic
considerations of how the problem is -presented- to the computer (thus
to determine how the computer will try to go about -solving- it) can
make an astonishing difference in completion speed.

-- Sundial Services :: Scottsdale, AZ (USA) :: (480) 946-8259
