I have the following dimensional hierarchy
(1) Depot 'Brussels'
(2) Stream 'Major Accounts'
(3) Sales person x
(4) Invoicing depot 'Brussels'
(while the salesperson resides in Brussels he can still have
customers in other depots)
(5) Budget Portfolio xxx
(a collection of portfolios)
(6) Portfolio yyy
(a portfolio is a set of customers)
(7) Division zzz
(a customer can belong to another portfolio depending on the
8) Customer company x
Each month I receive an updated file that describes to which portfolio
a customer for a particular division belongs. Sales not only wants to
generate reports based on the current situation but also on portfolio
structures of the past. So I designed my starschema as follows.
Revenue FactTable --- CustDivPrtfolRelation ---- Customer (Type 2 SCD)
Rev_CPDId (FK) CPD_Id (PK) Cust_Id (PK)
... CPD_CustId (FK) Cust_Name (8)
Measures CPD_DivId (FK) ---- Division (Type 1 SCD)
CPD_PrtfolId (FK) Div_Id(FK)
CPD_StartDate Div_Code (7)
CPD_EndDate ---- Portfolio(Type 1 SCD)
CPD_CurrentPrtfolId (FK) Prtfol_Id (PK)
- Not all customers are linked to a portfolio, so if there is revenue
for a customer for a particular division I dont want to lose that
information so I had to provide several records in CustPrtfolDivision
for each customer that does not belong to a portfolio since they can
generate revenue in each of the division.
- Customer (76.000 records)
Division (20 records)
--> CustDivPrtfolRelation (1.100.000 records, 38.4 Mb data, 42,1 Mb
The facttable contains 8.500.000 records and each week 120.000
records are loaded.
- Architecture: HP Netserver LH4, 4 Gb RAM, 4 processors, MS SQL Server
7.0 and OLAP Services SP2 running on the same machine. 100 Gb in RAID 5
(93 Gb usable).
- We use ROLAP since the aggregates produced by OLAP Services will
also be used by another reporting tool.
- It takes 1 hour to process the dimension
- I would like to have your opinions on this design.
When a customer changes to another portfolio for a particular division
I don't need to update the fact table. I only need to update the
CPD_CurrentPrtfolId field. I could also add another CPD_Id field to the
fact table to describe the current situation but then I would have to
update the facttable each time the portfolio structure changes.
- Since the users want to quickly find a customer in the dimension,
should I add an extra custid to the fact table? (new dimension, one
level = Customer Name, less members = 63200)
Suppose I make a seperate dimension with as only member customer
number. Then I bump into the limit of maximum 64.000 members per level.
Can you change this limit? Or should I add extra levels so that I have
less members per level?
- I was also considering to put only the active customers in the
dimension. You may think, of course, but every month I receive a new
customer file with not only the active but also the non-active
customers. But I suppose I could extract the active customers from the
facttable, compare it against the current customer table and put the
inactive customers in a seperate table. Then the next time I have to
load a customer file I could only load data about active customers.
- What are the implications of the size of the dimension. A relational
query tool will handle this fine, I think, but what about OLAP. What if
a client searches for a customer name. I read 2 documents from
Microsoft which I recommend to everybody working with MS OLAP (OLAP
Services: working with Large Levels, OLAP Services: Performance
Implications of the Architecture ) but I'm very interested in your
experiences, your problems and how you solved them.
- What about the number of levels in the dimension.
- Microsoft recommends to keep the Member key value small, preferably a
number, since the server needs to perform comparison operations on
these strings during the processing ot the dimension. How do you handle
e.g customer names which can be 50 characters or more? Do I have split
of the customer names into a new table and link them to the original
customer table with an id?
- Microsoft asks to avoid ROLAP aggregations. According to them some
relational servers grind to a halt while creating records for
aggregation tables. Does anyone experience such behaviour.
- Who uses partitioning with MSOLAP and could you give some examples of
how you partitioned in your case?
- Any suggestions for the 'max server memory' and 'max degree of
parallelism' (nr of processors) of SQL Server and
for 'ReadAheadBufferSize' and 'Process Buffer Size' parameters (MSOLAP)
if you know that SQL Server and MSOLAP run on the same machine (4 Gb
RAM, 4 processors)
Johan De Graeve
Sent via Deja.com http://www.deja.com/
Before you buy.