FW: Migrating from Dynamic Server 7.2 to IDS 9 (2000) - fragmentation

Post by Dirk Moolman » Wed, 19 Sep 2001 15:16:01



Tried to take this offline, but could not reach your e-mail address ...

-----Original Message-----

Sent: Monday, September 17, 2001 4:21 PM
To: Jack Parker; Jack Parker
Subject: FW: Migrating from Dynamic Server 7.2 to IDS 9 (2000) - fragmentation

>Might consider taking this offline - nobody else seems to be jumping all
>over it.

>Some minor deletia of concluded discussion.

Agreed, taking it offline.

> >***Jack
> >If you have a table fragmented over 4 dbspaces and 4 CPUs, you will get 4
> >scan threads.  Odds are that they will run together and complete in about
> >the same amount of time.  If you have 5 dbspaces, then you will get 5 scan
> >threads - one of those will have to bounce around from CPU to CPU.  8
> >threads and you would get two per cpu - no bouncing.  Mind you this is
> >best when your database has to do scans of all fragments.

> >Fragmented indices tend to also be shallower - hence not as many reads
> >when looking for data.  If the index is big........

> >Round Robin is a lovely thing when it comes time to load, but it means no
> >fragment elimination when it comes time to read.  Read on.
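
(For illustration only - a round-robin table spread over four dbspaces, so
that a full scan gets four scan threads.  Table, column and dbspace names
below are invented, not from this thread:

    CREATE TABLE audit_log
    (
        log_id    SERIAL,
        log_date  DATE,
        detail    CHAR(80)
    )
    FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2, dbspace3, dbspace4;

Rows are sprayed across the four dbspaces in turn as they are inserted.)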

> ***Dirk
> This makes perfect sense, thank you. The only thing that I still wonder
> about is round robin fragmentation. I would guess from what you are saying
> that round robin fragmentation doesn't help for reading, because the
> records are not contiguous - i.e. if I have to do a sequential scan on a
> range of records, the fact that they are scattered across fragments will
> actually slow me down?

>***Jack
>Round robin means that as each row is inserted it goes into the next
>dbspace.  If you have four dbspaces, then row one goes to dbspace 1, 2->2,
>3->3, 4->4, 5->1, 6->2, .........  Easiest fragmentation scheme for the
>engine to figure out while loading.  However, there is no real intelligence
>to such a fragmentation scheme.  If you have fragmented by customer name
>(for example) A-F->dbspace1, G-M->dbspace2, N-Q->dbspace3....  Then when the
>user does a query by last name, the optimizer is smart enough not to bother
>with fragments where the data can't be - it goes straight to the proper
>fragment.  Kind of a moot point, EXCEPT that if you have attached indices,
>they use the same fragmentation scheme as the data.  In fact XPS will
>complain if you try to build an index on a round robin table without using
>a fragmentation scheme.  Does it really matter if you are building a
>detached index?  Probably not, unless you are going to query by something
>that is not in an index and might have been in your fragmentation scheme.
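
(For illustration - an expression-fragmented customer table along the lines
Jack describes.  Names are invented and this is only a sketch:

    CREATE TABLE customer
    (
        customer_num  SERIAL,
        lname         CHAR(32),
        fname         CHAR(32)
    )
    FRAGMENT BY EXPRESSION
        (lname >= 'A' AND lname < 'G') IN dbspace1,
        (lname >= 'G' AND lname < 'N') IN dbspace2,
        REMAINDER IN dbspace3;

A query such as

    SELECT * FROM customer WHERE lname = 'Parker';

only has to touch the fragments whose expression can match 'Parker' - the
others are eliminated.)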

>With XPS there is another side to this.  You have multiple nodes
>(coservers) across which the database is laid.  If you fragment two tables
>across these coservers using a hash on the same column (fragment by
>hash(column)) AND you join these two tables on that column (and you've laid
>out your disks properly) you get a co-located join - each coserver does its
>own join and then sends the result set to the requesting coserver.  Also,
>the gating factor when loading data is not generally your fragmentation
>scheme (ok - half-truth there, you can slow things down with a * scheme)
>but disk i/o.  Hence I am not a big fan of Round Robin.
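
(A rough sketch of the co-located join setup - XPS-style hash fragmentation
into a dbslice that spans the coservers.  The dbslice, table and column
names are invented and the syntax is from memory, so check the XPS manuals:

    CREATE TABLE customer
    (
        customer_num  INTEGER,
        lname         CHAR(32)
    )
    FRAGMENT BY HASH (customer_num) IN sales_slice;

    CREATE TABLE orders
    (
        order_num     INTEGER,
        customer_num  INTEGER,
        order_date    DATE
    )
    FRAGMENT BY HASH (customer_num) IN sales_slice;

    SELECT c.lname, o.order_num
        FROM customer c, orders o
        WHERE c.customer_num = o.customer_num;

Because both tables are hashed on customer_num, matching rows land on the
same coserver, so each coserver can do its share of the join locally.)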

***Dirk
I understand the basics of fragmentation. It is just the technical stuff
that happens in the background (inside the engine) that I don't have a
firm grip on.

It sounds like the main difference between round robin and the rest is that
round robin doesn't give fragment elimination while the other methods do,
which is perfectly logical. With round robin I should still get multiple
scan threads, but on ALL the fragments instead of only certain ones.

This is basically what I tried to do. We had a disk bottleneck, so I tried
to split the load evenly across 4 physical disks. It has made a big
difference on my disks, but maybe not on the amount of work done by the
engine.

> >***Jack
> > >4.5 million compresses?  Someone is doing a lot of deletes to cause
> > >this.  Find them, blame them, make them pay.  Look into why they are
> > >deleting - perhaps truncates, or table rewrite would be more effective.

> > In our environment we do a lot of deletes, yes. Our data gets old
> > quickly and then gets moved to "history" tables. The data is still
> > online and reports / queries still take place on the older data,
> > especially the very recent records.
> > By the way, what exactly are compresses - is this index rebuilds ?

> >When you 'delete' a row that space is not automatically re-available.
> >There is a 'next space' pointer in the page header which always points to
> >the end of the data - new data is always placed at the 'end' of the page.
> >Over time the engine will recognize that a page is getting 'holey' and
> >will re-write the page, freeing up space at the end - this is a compress.
> >Note that there is stuff at the end of the physical page - but let's not
> >go there now - hence the tic marks around the word 'end'.
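
(Side note, not from the original thread: the 4.5 million figure quoted
above is presumably the 'compress' counter from the engine profile, i.e.
what onstat -p reports, so that is the number to watch after any change.)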

> >In a delete situation like this consider fragmenting by the date of the
> >data (provided it is not updated - that would be a disaster).  Then when
> >you need to delete you can detach the old data fragment and re-attach it
> >as a new fragment for the new data.  This may imply a lot of fragments -
> >depending on how low a granularity you need and how long you keep the
> >data.  It may also mean unbalancing your fragments against the number of
> >CPUs.  But it's way fast.  I've done 33 fragments on a 16CPU box for this
> >sort of thing (40GB of data).  The table screamed.
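
(A rough sketch of that detach / re-attach cycle, with invented table,
column and dbspace names - check the exact ALTER FRAGMENT syntax against
your engine's documentation:

    -- Detach the oldest fragment into its own table (no row movement):
    ALTER FRAGMENT ON TABLE orders
        DETACH dbspace01 old_orders;

    -- Archive or drop old_orders, then attach a table that already lives
    -- in the freed dbspace as the fragment for the newest date range:
    ALTER FRAGMENT ON TABLE orders
        ATTACH new_orders AS (order_date >= MDY(1,1,2001)
                          AND order_date <  MDY(2,1,2001));

This assumes orders is fragmented by an expression on order_date, as Jack
suggests.)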

> ***Dirk
> I will have to take another look at fragmentation. I just don't know if it
> will help with our compression rate. It is highly likely that the programs
> do deletes on individual records as statuses change, and that I will not
> have much control over this.

>***Jack
>Do the rows have to be actually deleted?  Can you build a table of rows to
>be deleted and then remove them in batch every evening?  At that point you
>could do a table rewrite.  (What the h*ll is he suggesting???  Not do
>deletes realtime?  (yes))
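
(One possible shape for that, with invented table and column names - a
"to be deleted" table filled during the day and a nightly batch job:

    -- During the day the application records keys instead of deleting:
    INSERT INTO orders_to_delete (order_num) VALUES (12345);

    -- The nightly batch removes them in one pass:
    DELETE FROM orders
        WHERE order_num IN (SELECT order_num FROM orders_to_delete);

    DELETE FROM orders_to_delete;

The real-time delete becomes a cheap insert, and the expensive page
compresses are pushed into the overnight window.)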

***Dirk
I think this will be one of our last resorts. The company is very set in its
ways and won't change the code easily. I think it would really depend on
how slow the system really gets.

> > >Are you really supporting 2600 connections?
>***Jack
>(because I want to remember to look that up)
