## group by vs rollup vs cube

### group by vs rollup vs cube

Hi,

I'm looking vor a good explanation on the differences between normal group
by, rollup and cube.

thanks,

Robert

### group by vs rollup vs cube

GROUP BY calculates base aggregates.
ROLLUP calculates both base aggregates and super-aggregates, but only in one
direction.
CUBE calculates both base aggregates and super-aggregates, in both
directions.

Books online provide you with enough information to get the general idea of
what each of the options provides, but you should, however, be aware of the
performance implications of each of the three options. Especially the
difference between ROLLUP and CUBE to realize that you should use CUBE only
when its functionality is necessary.

A simple GROUP BY will use either a sort or a hash algorithm to calculate
the aggregates. Suppose we discuss the sort for now. Consider the following
query (run against the Northwind database) and its execution plan:

SELECT country, region, city, COUNT(*) as num_custs
FROM Customers
GROUP BY country, region, city

|--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1006])))
|--Stream Aggregate(GROUP BY:([Customers].[Country],
[Customers].[Region], [Customers].[City]) DEFINE:([Expr1006]=Count(*)))
|--Sort(ORDER BY:([Customers].[Country] ASC,
[Customers].[Region] ASC, [Customers].[City] ASC))
|--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Customers].[PK_Customers]))

Notice that a sort is being performed only once to accommodate the
aggregates that need to be calculated.

Now, consider the same query with the ROLLUP option:

SELECT country, region, city, COUNT(*) as num_custs
FROM Customers
GROUP BY country, region, city
WITH ROLLUP

|--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003])))
|--Stream Aggregate(GROUP BY:([Customers].[Country],
[Customers].[Region], [Customers].[City])
DEFINE:([Expr1003]=SUM([Expr1004])))
|--Stream Aggregate(GROUP BY:([Customers].[Country],
[Customers].[Region], [Customers].[City]) DEFINE:([Expr1004]=Count(*)))
|--Sort(ORDER BY:([Customers].[Country] ASC,
[Customers].[Region] ASC, [Customers].[City] ASC))
|--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Customers].[PK_Customers]))

Notice that the plan has only one additional step that calculates the
super-aggregates provided by the ROLLUP option based on the previous result
of the base aggregates. There's no need to re-sort the data as the sort that
was performed for the base aggregates is sufficient for ROLLUP's extra data.

Now, consider the same query with the CUBE option:

SELECT country, region, city, COUNT(*) as num_custs
FROM Customers
GROUP BY country, region, city
WITH CUBE

|--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003])))
|--Concatenation
|--Stream Aggregate(GROUP BY:([Customers].[Country],
[Customers].[Region], [Customers].[City])
DEFINE:([Expr1003]=SUM([Expr1004])))
|    |--Table Spool
|         |--Stream Aggregate(GROUP BY:([Customers].[Country],
[Customers].[Region], [Customers].[City]) DEFINE:([Expr1004]=Count(*)))
|              |--Sort(ORDER BY:([Customers].[Country] ASC,
[Customers].[Region] ASC, [Customers].[City] ASC))
|                   |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Customers].[PK_Customers]))
|--Compute Scalar(DEFINE:([Expr1006]=NULL))
|    |--Stream Aggregate(GROUP BY:([Customers].[Region],
[Customers].[City]) DEFINE:([Expr1003]=SUM([Expr1004])))
|         |--Sort(ORDER BY:([Customers].[Region] ASC,
[Customers].[City] ASC))
|              |--Table Spool
|--Compute Scalar(DEFINE:([Expr1007]=NULL))
|--Stream Aggregate(GROUP BY:([Customers].[City],
[Customers].[Country]) DEFINE:([Expr1003]=SUM([Expr1004])))
|--Sort(ORDER BY:([Customers].[City] ASC,
[Customers].[Country] ASC))
|--Table Spool

And you get a much more complex plan which performs three separate sorts to
accommodate the calculations that CUBE needs to provide.
Since CUBE needs to calculate super aggregations in all directions, the base
aggregations need to be sorted three times:

1. Sort(ORDER BY:([Customers].[City] ASC, [Customers].[Country] ASC))
2. Sort(ORDER BY:([Customers].[Region] ASC, [Customers].[City] ASC))
3. Sort(ORDER BY:([Customers].[Country] ASC, [Customers].[Region] ASC,
[Customers].[City] ASC))

If you write the three variations: simple GROUP BY, with ROLLUP, with CUBE
in the same batch and display the graphical execution plan for the whole
batch, you will get the following respective cost estimation for the three
queries (my test was run on SQL Server 7.0 + SP2):

3.35%, 3.40%, 93.25% respectively.

Again, what I'm trying to say is that you should use ROLLUP whenever
possible, and only use CUBE when ROLLUP is not sufficient.

--
BG

Hi-Tech College, Israel
http://sql.hi-tech.co.il

What I hear I forget.
What I see I remember.
What I do I understand.
--Confucius

Quote:> Hi,

> I'm looking vor a good explanation on the differences between normal group
> by, rollup and cube.

> thanks,

> Robert

Hello out there,

does someone know a database management system, which can serve queries
with group by cube() / group by rollup(), recursive queries and is free
for educational use?

I don't want to download and install the trial of IBM DB2 (it's big (545
MB) and does a lot with the system while being installed, I've been
told).

Thanks,
Marcel

6. ODBC