Steve Kass <sk...@drew.edu> wrote in message <news:3DACB4B1.682036D1@drew.edu>...
> Andrew,
> I don't know about you, but I know that when I disagree with
> something Steve Kass has posted, I'm definitely wrong. In any case,
> the documented limit is 256 tables in a SELECT, and I think the
> issue is that it's not entirely clear whether a UNION ALL is one
> SELECT query - evidence shows that more tables are allowed, but
> whether that's undocumented behavior or not, I don't know.
> In any case, it does appear that a view with more than 256 tables
> is not allowed.
> While I can't think of any reason one would want more tables in
> a (partitioned) view, I'd be curious to know why, and might try to
> suggest an alternative.
Disagreeing with Steve Kass is probably wrong in this case ;-)
He showed us how to do a UNION ALL with practically unlimited
tables, which appears to work (well, we haven't tried millions);
but the same statement elicits an error message if you try to
make a VIEW of it.
The actual application (in which we are indeed going for
alternatives) is school attendance records. Data is recorded
in Student, Admission, Calendar, and Register tables; Student
for each student that we need to keep track of, Admission
recording their start and end date attending the school
(possibly more than one Admission for the same student; life
is complicated), Calendar for all the dates when a student
should attend school, and Register which contains rows only
to represent a student's absence on a date when they should
attend. You'll have to take it on trust (as I do) that these
tables are appropriate; most of them have other functions in the
student records system besides tracking truancy.
It was then decreed that there shall be a table Attendance
containing rows for absent /and/ present students. This doesn't
precisely duplicate Register, because Register counts attendance
by timetable periods during the day (up to 7) and Attendance
only tracks Morning and Afternoon. Nevertheless, and with a
25 bytes student ID column, and a couple of useful indexes,
Attendance takes about 1 megabyte per 50 students per year.
It was thought that it would be easier and less error-prone
to have reports run on that rather than on the underlying tables.
I suspect that it was thought wrongly, but it wasn't my decision.
There is one such database for each school, there are going to be
up to 500 schools on a server (we run this for several education
authorities, one per server instance at most), and there may be
a desire for year-on-year data, or at least a desire to be seen
to be offering year-on-year data. I am trying to get the boss
to stop calling this a Data Warehouse before he embarrasses us
in public.
And so I was considering a UNION of 500 copies each year of all the
schools' Attendance table. It is redundant use of disk space already
(since it contains only calculated data), I was reluctant to make
it redundant /twice/ ;-)
Of course another possible alternative architecture - at first sight -
is to create one monster Attendance table for all schools /instead/
of one per school database.
I recognise that a combined table which is probably still under
a gigabyte a year is peanuts by the standards of some guys here.
What's one more gigabyte? How much developer time is one gigabyte
worth today? Less than an hour, I'll guess.
However, the individual school Attendance table is also updated
live; the consolidated table probably isn't. (Depressing
performance further at the time when every school on the server
is calculating their register is one "not do".)
Other responses:
> > > All the same, does anyone also happen to know if a partitioned view
> > > is actually smart enough to query only one or a few of the underlying
> > > tables, when the query implies that that is appropriate? The explicit
> > > benefit of a partitioned view is being able to /insert/ rows and
> > > have them go into one appropriate underlying table.
> > No, the main benefit is your first reason - it is smart enough. Your
> > queries need to be explicitly on the primary key, and I presume above
> > a certain complexity you may confuse the optimiser enough for it
> > not to only go for the correct tables.
> > As you want to insert, you are actually talking about an updateable
> > partitioned view - there are several constraints that make it fiddly to
> > create. I printed the BOL section and used a highlighter.
I actually don't want to update, but what I'm getting from BOL
is that it isn't a partitioned view unless it's (in principle)
an updateable partitioned view, and I anticipated not getting
the benefit of querying only the right tables in the UNION
unless I jumped through the hoops of setting the right constraints,
and stuff, for a partitioned view.
> > > If we can't have a new server /and/ we can't UNION more than
> > > 256 tables together - as apparently we implicitly need to do -
> > I would suggest not going much over 100 tables in the view. That way
> > you can actually run queries on it that join to other tables, and can
> > still do a self join if necessary, without getting that pesky error.
Oops. Do we still hit the wall at 256 as soon as we JOIN
a mega-view with a few other tables? We probably do need to
do such JOINs. Darn.
> > > or if a partitioned view is no darned good for me anyway
> > > performancewise,
> > No, it will give you faster performance, not orders of magnitude though
> > - unless your boss's, boss's, boss? springs for a federation of servers, but
> > gains nethertheless. ( Always assuming you do sensible things
> > with indexes and query optimisation ).
I'm figuring that a mega-view is not going to perform
/better/ than a properly indexed mega-table - the view
logic won't out-perform the logic of an index - but
it may not perform significantly /worse./ And in that
case, a mega-view is an effective substitute - except
that it does appear that I can't have the mega-view
that I wanted anyway.
> > > then my next thought is a user-defined
> > > function either containing a massive UNION statement with
> > > WHERE clauses (hopefully suppressing, more or less, queries
> > > on tables which don't actually contain the data we're
> > > interested in) or containing a stack of IFs peforming individual
> > > selections from the tables of interest, probably into a table
> > > variable. But I'd rather have the partitioned view.
> > Well the massive union will have the same problem. With a lot of
> > IFs your performance will also severely degrade, as you are making
> > it hard for the optimiser.
I don't need the optimizer. I know what I'm doing (honest, I do...)
I don't think the optimizer even looks at IFs. It looks at the
queries which are /between/ the IFs...hmm, but wouldn't it have
to calculate plans for those queries? Or does that only happen
when one of the IFs comes true? I don't know if I know that ;-)
The mega-union probably would go like
SELECT * FROM table1 WHERE @table='one' AND (other conditions)
UNION ALL
SELECT * FROM table2 WHERE @table='two' AND (other conditions)
UNION ALL
SELECT * FROM table3 WHERE ...
and the behaviour I'd be hoping for is to pretty much stop dead
if the first WHERE condition is recognised to be false without
looking at the table, and go straight on to consider the next
SELECT.
(A complication is that the mega-table, mega-view, or mega-union
probably should be able to track students who leave one school
within our system and are enrolled in another, either by changing
address or just graduating from primary school into secondary
school, so we probably want to look at two or more - still only
a few - of the tables at one time. So I might write
"WHERE CHARINDEX('one',@tables)>0" over and over again,
substituting different table names for 'one', and still hoping
to avoid actually touching 500 other tables that I don't need
for more than a moment.
We did establish the other day that in a UNION of tables where
corresponding columns' data types aren't exactly the same, the
server tries to convert implicitly each column to the "highest"
data type - so it would seem it's going to have to take /that/
much information from each table, at least, even if it doesn't
take any data rows.)
But, yeah, I guess chances are that I won't be allowed a JOIN
on that, either. Anyway, the boss /has/ decreed a mega-table.
I just would like to have an alternative to hand when we run out
of disk space.
(As we did, indeed, run out of space when we deployed creation
of the Attendance table in school databases yesterday. Tragically,
this was the same deployment in which we first imposed regular
backup and shrinking of the transaction logs, and so it all was
rolled back again, i.e. /neither/ has happened.
I mentioned we were promised more disks - or anything else - if we
needed. Unfortunately we only know we need it when we run out...
and the guy who would actually bolt a bigger disk into our server
if we were getting one is spending the rest of this week bolting
bigger disks into someone else's server. I /do/ have his big
screwdriver on loan at the moment...)