Q: Why OLAP as compared to PL-SQL? Speed or convenience or both?

Post by Johannes Reite » Sat, 06 Sep 1997 04:00:00



Hello,

Short version of the question: How do the speed and versatility (not
simplicity) of generating rollups from a star view in Oracle compare
to dedicated OLAP servers like Essbase, Holos and Oracle Express?

Longer version of the question:

We would like to analyze management information data. A typical cube
might have 5 to 10 dimensions with 3 to 30 entries for most
dimensions, say two levels in the hierarchy, and typically several
million data entries (= occupied data cells). Occasionally I will
have to add several (maybe even 100) such cubes and analyze the
resulting sum cube.
The data will really be stored in an Oracle database (or maybe a DB2
database) in the form of a star view. With PL-SQL it seems possible
to generate any desired rollup from the star view.
Since I have not yet loaded the database with a representative data
set (hopefully I will have a first simplified version in a week or
so), I cannot yet comment on how long it will take to generate these
rollups.
We will definitely also evaluate dedicated OLAP tools like Essbase
(or the newer DB2 OLAP Server, which uses an RDBMS instead of an
MDBMS), Holos or Oracle Express. It seems difficult, however, to get
a copy for evaluation purposes if one does not yet have a fairly firm
interest in buying the product.

In the book OLAP Solutions, Erik Thomsen says that SQL is (a) too
complicated, (b) too slow and (c) not powerful enough in comparison
to OLAP tools. With respect to the last point, Thomsen mentions that
it is difficult to subtract values from different rows and to
transpose a table (exchange rows and columns).
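To make Thomsen's point concrete, here is roughly the kind of thing he means, sketched in Python with sqlite3 standing in for the real database (the fact table and all names are invented for illustration):

```python
import sqlite3

# Invented two-scenario fact table: Actual and Budget amounts per month.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact (month TEXT, scenario TEXT, amount REAL)")
con.executemany("INSERT INTO fact VALUES (?, ?, ?)", [
    ("Jan", "Actual", 120.0), ("Jan", "Budget", 100.0),
    ("Feb", "Actual",  90.0), ("Feb", "Budget", 110.0),
])

# Subtracting values from different rows (Actual minus Budget) forces a
# conditional-aggregation trick to fold the two rows onto one line first.
variance = con.execute("""
    SELECT month,
           SUM(CASE WHEN scenario = 'Actual' THEN amount ELSE 0 END)
         - SUM(CASE WHEN scenario = 'Budget' THEN amount ELSE 0 END)
    FROM fact GROUP BY month ORDER BY month
""").fetchall()
print(variance)  # [('Feb', -20.0), ('Jan', 20.0)]

# "Transposing" (months as columns instead of rows) is the same trick,
# one CASE expression per column, rewritten by hand for every layout.
transposed = con.execute("""
    SELECT scenario,
           SUM(CASE WHEN month = 'Jan' THEN amount ELSE 0 END) AS jan,
           SUM(CASE WHEN month = 'Feb' THEN amount ELSE 0 END) AS feb
    FROM fact GROUP BY scenario
""").fetchall()
print(sorted(transposed))  # [('Actual', 120.0, 90.0), ('Budget', 100.0, 110.0)]
```

Workable, but every new report layout means new hand-written CASE expressions, which is presumably what Thomsen means by "not powerful enough".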

However, (compiled) PL-SQL is faster and more powerful than SQL, and
assuming that one already knows and uses PL-SQL, is it a viable
alternative to OLAP, or will OLAP still be (much) faster and more
versatile? I do not care if the PL-SQL procedures I have to write are
more complicated than the OLAP commands. Basically I just need to
generate different (any possible) rollups.
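For what it's worth, the kind of rollup I have in mind could be sketched like this (Python with sqlite3 standing in for Oracle; without a ROLLUP operator, each hierarchy level is its own GROUP BY glued together with UNION ALL; all names are invented):

```python
import sqlite3

# Toy star: a fact table plus one dimension with a two-level hierarchy
# (city -> region); all table and column names here are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE geo (geo_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE sales (geo_id INTEGER, amount REAL);
    INSERT INTO geo VALUES (1, 'Oslo', 'North'), (2, 'Bergen', 'North'),
                           (3, 'Rome', 'South');
    INSERT INTO sales VALUES (1, 10.0), (2, 20.0), (3, 5.0), (3, 5.0);
""")

# One GROUP BY per hierarchy level, stitched together with UNION ALL --
# a rollup built by hand, one query block per level.
rollup = con.execute("""
    SELECT g.region, SUM(s.amount)
    FROM sales s JOIN geo g ON g.geo_id = s.geo_id
    GROUP BY g.region
    UNION ALL
    SELECT 'ALL', SUM(amount) FROM sales
""").fetchall()
print(sorted(rollup))  # [('ALL', 40.0), ('North', 30.0), ('South', 10.0)]
```

Doable in SQL or PL-SQL, but with 5 to 10 dimensions and multiple hierarchy levels the number of such query blocks multiplies quickly, which is where the speed question comes in.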

Does anybody know, or already have a guess, how the speed of the
(MOLAP) Essbase server compares (or will compare) to the DB2 OLAP
Server?

Also, do there exist rules of thumb for how fast rollup generation
is if one has, for instance, 5 dimensions with 100 members each,
three hierarchies, and 60 per cent of all cells sparse?

I understand that it is difficult to give speed estimates if one
does not know the data precisely; however, I am still grateful for
comments, and especially for comparisons between the speed and power
of PL-SQL and OLAP tools for generating rollups.

Cheers

Johannes

 
 
 

Q: Why OLAP as compared to PL-SQL? Speed or convenience or both?

Post by m bowe » Sun, 07 Sep 1997 04:00:00


  [..]

  However, (compiled) PL-SQL is faster and more powerful than SQL,
  and assuming that one already knows and uses PL-SQL, is it a viable
  alternative to OLAP, or will OLAP still be (much) faster and more
  versatile? I do not care if the PL-SQL procedures I have to write
  are more complicated than the OLAP commands. Basically I just need
  to generate different (any possible) rollups.

An Essbase implementation will always give you versatility at a low
development cost as compared to SQL. That's partially a function of
the API and partially a result of the ease with which you can handle
operations which are very difficult or impossible in SQL. Allocations
are generally the kind of examples given which don't lend themselves
easily (or at all) to SQL processing.
Also, I never figured out how to do certain crosstabs in SQL given a
star, despite the fact that I had all of the aggregates. For example,
let's say I had a star with 10 measures across 12 months (120
columns) and 4 dimension tables for product, geo, organization and
channel. I never figured out how to write SQL that gives me just as
efficient a query producing a crosstab with (DOWN: Products, Geo,
Time - ACROSS: Measures, Organization, with a combo box on Channel)
as with (DOWN: Products, Measures - ACROSS: Quarters, with combo
boxes on the other dimensions). I'd certainly be able to optimize my
rollups, but that would always come at the expense of query
flexibility. With Essbase, I just optimize my rollups and
calculations without any concern for the queries. The API and the
nature of the cube make query construction and performance a
no-brainer.
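A toy illustration of that trade-off (Python with sqlite3 standing in for the warehouse; all names invented): each pre-built relational summary serves exactly one layout, while the cube answers both orientations from the same stored aggregates.

```python
import sqlite3

# Invented mini-star flattened into one fact table: two report layouts
# over the same data want two different pre-aggregations.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fact (product TEXT, region TEXT, quarter TEXT, amount REAL)"
)
con.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", [
    ("Widget", "East", "Q1", 10.0), ("Widget", "West", "Q1", 20.0),
    ("Gadget", "East", "Q2", 30.0), ("Gadget", "West", "Q2", 40.0),
])

# Layout 1 (DOWN products, ACROSS regions) wants this summary...
by_prod_region = con.execute(
    "SELECT product, region, SUM(amount) FROM fact GROUP BY product, region"
).fetchall()

# ...while layout 2 (DOWN regions, ACROSS quarters) wants a different
# one. Pre-computing either speeds up only its own layout.
by_region_quarter = con.execute(
    "SELECT region, quarter, SUM(amount) FROM fact GROUP BY region, quarter"
).fetchall()
```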

  Does anybody know, or already have a guess, how the speed of the
  (MOLAP) Essbase server compares (or will compare) to the DB2 OLAP
  Server?

Apples to apples, I figure the DB2 OLAP Server will inevitably be
slower. The indexing scheme of Essbase works on a fundamentally
faster structure than that of relational storage. Unless somebody at
IBM has figured out a way to do block operations in relational tables
(behind SQL) with the speed of B*-tree index traversals, Essbase will
always be faster. Of course there will inevitably be some DB2 genius
somewhere with the big parallel edition on a multinode SP2 using EMC
RAM disks. We want their attention too - and that's the point of
going in that direction, in my view: the scalability and systems
variety (and bringing more data warehouse and repository folks into
the fold). Considering the reserve of a number of relational folks
with big iron and/or R/3 installations, I just think we won't see
many apples-to-apples comparisons. What I'm saying is that I haven't
seen anyone put Essbase on their Oracle 7 server and go head to head.
(Wouldn't that be fun?) And I don't expect to see that anytime soon.

  Also, do there exist rules of thumb for how fast rollup generation
  is if one has, for instance, 5 dimensions with 100 members each,
  three hierarchies, and 60 per cent of all cells sparse?

The OLAP Council benchmark was demonstrated on Essbase 4.1 at about
150,000 cells per second on a quad Pentium box. I think that's a good
ballpark figure, and I've seen figures like that in the field. I'd
say that a 5D cube at that size is relatively easy to handle, but I'd
be very surprised at a 60% overall density on 65MMM possible blocks
(500^4), presuming 500 cells per block (4 dimensions creating blocks
containing the fifth dimension). You'd much more likely see a cube
density around 0.01% of the possible blocks. But let's just say, for
example, your data generated a cube with a half million blocks of
about 500 cells per block. An average block density would be around
70%; thus we'd be looking at about 1.3 gig compressed to the drive,
and about 175MM cells.
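As a back-of-envelope check of those figures (the 70% within-block density and 8 bytes per stored cell are my assumptions; they are what make the 175MM-cell and 1.3-gig numbers hang together):

```python
# Sizing sketch; density and bytes-per-cell are assumed, not measured.
possible_blocks = 500 ** 4             # 4 sparse dimensions of 500 members
blocks          = 500_000              # the "half million blocks" example
cells_per_block = 500                  # the dense fifth dimension
block_density   = 0.70                 # assumed within-block density

stored_cells = blocks * cells_per_block * block_density
size_gb      = stored_cells * 8 / 1e9  # assumed 8 bytes per stored cell

print(possible_blocks)        # 62500000000 -> roughly the "65MMM" figure
print(round(stored_cells))    # 175000000   -> "about 175MM cells"
print(round(size_gb, 1))      # 1.4         -> "about 1.3 gig" once compressed
```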

There are two ways to look at estimated calc time. And please know
that this is just a sketchy example. On an average NT 4.0 server,
Essbase writes about a gig an hour on simple one-pass calcs, which is
all you'd be doing with simple rollups. But I've seen three times
that rate on well-built ProLiant 5000s - and I wouldn't consider that
exceptional or extreme performance. So based on your ultimate cube
size you'd probably calc 1.3 gig in anywhere from 30 to 85 minutes
depending on your server horsepower. That would be the equivalent of
building a fact table for every hierarchical combo among your
dimensions. (I'm presuming that the 500 members per dimension include
the hierarchical subtotals - within that presumption, with a fixed
number of 500 members, the more hierarchies, the more expensive it
will be in fact tables, but Essbase doesn't care.) If you built to
one big fact table, your queries would not be optimized for
drilldown, of course.

The second way would be to look at cells calculated per second.
Essbase averaged out 150,000 in the benchmark, which included complex
calculations on a fast box. Being gracious in both directions
(smaller box, simpler calcs), leave it at that figure. That way it
comes out to .32 hours, or about 20 minutes. That sounds very quick
to my gut - but not completely out of the question.
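Spelling out the arithmetic of both estimates (cube size and rates as given above):

```python
# Method 1: gig-per-hour write rates -- about 1 GB/h on a plain NT 4.0
# box, roughly 3x that on a well-built ProLiant 5000.
cube_gb      = 1.3
slow_minutes = cube_gb / 1.0 * 60   # slow end of the range
fast_minutes = cube_gb / 3.0 * 60   # fast end of the range

# Method 2: cells per second, at the OLAP Council benchmark figure.
cells = 175_000_000
hours = cells / 150_000 / 3600

print(round(slow_minutes), round(fast_minutes))  # 78 26 -> the "30 to 85" range
print(round(hours, 2), round(hours * 60))        # 0.32 19 -> ".32 hours, about 20 min"
```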

--

From this base there are a load of optimizations possible. You can,
in the next version, choose not to pre-calculate all of your
aggregate cells and have those rolled up at query time; you can do
tricks with time without requiring additional input or storage; you
can partition the single cube over multiple servers and/or drive
arrays and increase parallelism...