Datamation's Proposed Extention to SQL Language (#1)

Datamation's Proposed Extention to SQL Language (#1)

Post by Michael Sallwass » Wed, 20 Jul 1994 04:51:52



In the June 1, 1994 issue of Datamation, the cover story proposes 3 extensions
to the SQL language. In an article, "SQL is our language. Fix it now.", the
editors suggest three extensions that appear syntactically compatible with SQL
as I know it. Here is the first of the three. I have made my comments. I
would be interested in reading yours.

[If you have not read the article, what follows may not be very clear]

#1 ALTERNATE clause:

This feature would allow two group functions to be used in the same select
statement with different selection criteria for each group function. The
ALTERNATE constraint replaces all constraints on the same table in the
surrounding query.

The example they give is for comparing this year's sales with last year's
sales by productname. The syntax might be:

        select item.name,
               sum(sales.amt)/sum(alternate sales.amt where sales.year = 1993)
        from sales,
             item
        where sales.item_id = item.id
          and sales.year = 1994
        group by item.name;

The intention is that the WHERE clause within the alternate clause would
replace that all parts of the main WHERE clause other than join conditions.

I believe a clearer syntax might be as follows:

        select item.name,
               sum(sales.amt where sales.year=1994)/
                     sum(sales.amt where sales.year=1993)
        from sales,
             item
        where sales.item_id = item.id
        group by item.name;

In this case, there is no replacement, the WHERE clause within the group
function provides constraints in addition to those contained with the
main WHERE clause.

*******

Either way, this feature would be of benefit to the customers that I support.
If the SQL could be generated by the various third party GUI tools we use
that would be even better.
--
============================================================================
Michael Sallwasser  | Down one path is utter dispair and hopelessness. Down
Northrop Grumman    | the other is total destruction. Let us choose wisely.
--
============================================================================
Michael Sallwasser  | Down one path is utter dispair and hopelessness. Down
Northrop Grumman    | the other is total destruction. Let us choose wisely.

 
 
 

Datamation's Proposed Extention to SQL Language (#1)

Post by Albert Godfri » Thu, 21 Jul 1994 23:34:29



Quote:>...
>#1 ALTERNATE clause:

>This feature would allow two group functions to be used in the same select
>statement with different selection criteria for each group function. The
>ALTERNATE constraint replaces all constraints on the same table in the
>surrounding query.

>The example they give is for comparing this year's sales with last year's
>sales by productname. The syntax might be:

>    select item.name,
>           sum(sales.amt)/sum(alternate sales.amt where sales.year = 1993)
>    from sales,
>         item
>    where sales.item_id = item.id
>          and sales.year = 1994
>    group by item.name;

I don't think new syntax is necessary. SQL 92 can do that today. You only need
to use the derived tables capability, like this:

        select s1993.name, sales_1993, sales_1994 from
         (select item.name, sum (sales.amt)
            from sales, item
           where sales.item_id = item.id
             and sales.year = 1993
           group by item.name) as s1993 (name, sales_1993)
        ,
         (select item.name, sum (sales.amt)
            from sales, item
           where sales.item_id = item.id
             and sales.year = 1994
           group by item.name) as s1994 (name, sales_1994)
        where s1993.name = s1994.name;

In other words, do an aggregation on 1993 sales, another one on 1994 sales, and
join the results.

A better way is to use an outer join so as to cover the cases where no sales
figure exist for certain items in certain years.

        select item.name, s1993.amt, s1994.amt
          from  item    
            left outer join
                (select sales.item_id, sum (sales.amt)
                   from sales
                  where sales.year = 1993
                  group by sales.item_id) as s1993 (item_id, amt)
            on item.id = s1993.item_id
            left outer join
                (select sales.item_id, sum (sales.amt)
                   from sales
                  where sales.year = 1994
                  group by sales.item_id) as s1994 (item_id, amt)
            on item.id = s1994.item_id;

Results look like this:

 ITEM.NAME                 S1993.AMT                S1994.AMT
 A                           1000.00                  3000.00
 B                              NULL                  1000.00
 C                           3000.00                     NULL
 D                              NULL                     NULL

using the following data for SALES and ITEM:

SALES.ITEM_ID    SALES.YEAR        SALES.AMT
            1          1993          1000.00
            3          1993          3000.00
            1          1994          3000.00
            2          1994          1000.00

ITEM.ID   ITEM.NAME
      1   A
      2   B
      3   C
      4   D

--
Albert Godfrind                    Database Systems Sustaining Engineering


BP 027                             Voice: [+33] 92.95.51.63
F-06901 Sophia-Antipolis, France   FAX:   [+33] 92.95.63.63
Don't tell my employer I have opinions

 
 
 

Datamation's Proposed Extention to SQL Language (#1)

Post by Paul Ke » Fri, 22 Jul 1994 00:05:51




>>...
>>#1 ALTERNATE clause:

>>This feature would allow two group functions to be used in the same select
>>statement with different selection criteria for each group function. The
>>ALTERNATE constraint replaces all constraints on the same table in the
>>surrounding query.

>>The example they give is for comparing this year's sales with last year's
>>sales by productname. The syntax might be:

>>        select item.name,
>>               sum(sales.amt)/sum(alternate sales.amt where sales.year = 1993)
>>        from sales,
>>             item
>>        where sales.item_id = item.id
>>          and sales.year = 1994
>>        group by item.name;

>I don't think new syntax is necessary. SQL 92 can do that today. You only need
>to use the derived tables capability, like this:

>    select s1993.name, sales_1993, sales_1994 from
>     (select item.name, sum (sales.amt)
>        from sales, item
>       where sales.item_id = item.id
>         and sales.year = 1993
>       group by item.name) as s1993 (name, sales_1993)
>    ,
>     (select item.name, sum (sales.amt)
>        from sales, item
>       where sales.item_id = item.id
>         and sales.year = 1994
>       group by item.name) as s1994 (name, sales_1994)
>    where s1993.name = s1994.name;

>In other words, do an aggregation on 1993 sales, another one on 1994 sales, and
>join the results.

>A better way is to use an outer join so as to cover the cases where no sales
>figure exist for certain items in certain years.

 [ outer join formulation snipped ]

your proposal is a bummer if the sales and item datasets are large
and the optimiser dinna realise that you are making several passes
over the same tables (whose optimser does?)

but SQL2 also has conditional logic. use this instead to dismiss
this alternate scheme (whose rules for propogating the alternate where clause
need some work in the face of on clauses of outer joins and the like).

This scheme makes a single pass over the sales,item join.
Sorry about the SASisms to create the tables (old habits die hard)

data item;
 input name $ id;
 cards;
item1 1
item2 2
item3 3
item4 3
 run;

data sale;
 input id year amt;
 cards;
1 93 100
1 93 100
1 94 120
1 94 120
1 94 120
2 93 200
2 93 200
3 94 300
4 92 50
4 93 75
4 94 100
4 95 125
 run;

proc sql;

select name, sales93, sales94,
       sales94 / sales93 as ratio
  from item,
     (
       select id,
         sum( case sale.year when 93 then sale.amt else 0 end ) as sales93,
         sum( case sale.year when 94 then sale.amt else 0 end ) as sales94
         from sale
        where year in(93, 94)
        group by id
     ) as sale
   where sale.id = item.id;

NAME       SALES93   SALES94     RATIO
--------------------------------------
item1          200       360       1.8
item2          400         0         0
item3            0       300         .
item4            0       300         .

how about the other more interesting proposals that return ordinal row
positions within result sets that allow easy formulation of "the top
10 salesman" kinds of queries?  are other SQL implementors considering
these?

Paul

--

Paul Kent (Base SAS R&D)              " nothing ventured, nothing disclaimed "

 
 
 

Datamation's Proposed Extention to SQL Language (#1)

Post by Joe Hellerste » Fri, 22 Jul 1994 15:10:10


Quote:>   how about the other more interesting proposals that return ordinal row
>   positions within result sets that allow easy formulation of "the top
>   10 salesman" kinds of queries?  are other SQL implementors considering
>   these?

You can do this in Illustra (formerly Montage) in a couple of ways:

1) Illustra supports user-defined aggregates.  The example in the
manual is 'third-highest', which can be made into 'tenth-highest' with
no trouble.  Given tenth-highest, it's a easy to get 'top ten'.

2) Adding row numbers to tuples, as you suggest, moves away from the
relational model more than makes me comfortable.  But if you want, it
can be done in Illustra with a user-defined (scalar) function in the
target list.  It's easy to write an Illustra function that returns 1
the first time it's called and n the n'th time it's called.  To do
what you want requires this function and an order-by.

As I said, I find (2) to be a little too "unrelational" for my taste,
and I doubt it's a good thing to add to vanilla SQL.  For example,
what happens if you want the top ten salespeople subject to some
correlation variable from an outer query block?  With approach (1) the
system may be able to decorrelate the query (thus making it much more
efficient), but with approach (2) it can't.  Just one example of why
"unrelational-ness" is dangerous from a performance perspective (not
just from a "religious" perspective).  Approach (1) is more in the
spirit of SQL and the relational model, since it just uses relational
queries and aggregation.

The two datamation suggestions posted here were pretty bone-headed,
actually, though the suggestion you mentioned here isn't as blatantly
limited and unnecessary as the ALTERNATE business.  (Do they allow 2
alternates?  N alternates?  Why clutter up the language with more
confusing and redundant keywords?  Your "table expression" solution
seems like the natural one to me.)

Joe Hellerstein