> >You don't indicate how long this box has been up. How long did it take
> >to get to 12 billion isam instructions? That would be kind of interesting
> >to know. 1200 checkpoint waits - so 1000 hours? Or are you getting
> >checkpoints more frequently than CKPTINTL - my guess is yes, so maybe
> Yes, sorry, the system was up for approximately 4 days by the time I sent
> the mail.
> >You don't include an onstat -d - but from the -D I can see some layout
> >issues. Your dbspaces are deep - probably designed around tables? They
> >are not balanced to the number of CPUs. A table fragmented over eight or
> >twelve dbspaces might perform better than one built on a single
> >dbspace. I'm not really sure what your schema is or your application -
> >YMMV. However you can see that the load on the disks is not even. Your
> >disks are getting hammered. Your number of temp spaces should be a
> >multiple of the number of CPUs. I would pull the whole thing apart and
> >rebuild it just to get the disks to group on the onstat -d output.
> I am starting to see disk bottlenecks, yes. The previous weekend I
> unloaded and re-fragmented one of our bigger tables that gets accessed
> a lot.
> Fragmented by round robin, and the disks seem much better this week.
> (This specific table gets a lot of reads and not so many writes.)
> Could you perhaps elaborate a bit more on the issue of balancing dbspaces
> to the number of CPUs? Are you just referring to fragmentation of tables
> so that multithreading can take place?
If you have a table fragmented over 4 dbspaces and 4 CPUs, you will get 4
scan threads. Odds are that they will run together and complete in about
the same amount of time. If you have 5 dbspaces, then you will get 5 scan
threads - one of those will have to bounce around from CPU to CPU. With 8
dbspaces you get 8 threads - two per CPU, no bouncing. Mind you, this
matters most when your database has to do scans of all fragments.
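As a minimal sketch - the table, column, and dbspace names here are made
up, not taken from your system - fragmenting to match a 4-CPU box might
look like:

    -- 8 fragments on a 4-CPU box: 2 scan threads per CPU, no bouncing
    create table scans (
        barcode    char(20),
        scan_date  date,
        depot      smallint
    ) fragment by round robin
      in dbs1, dbs2, dbs3, dbs4, dbs5, dbs6, dbs7, dbs8;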
Fragmented indices also tend to be shallower - hence not as many reads when
looking for data. If the index is big, losing even one level saves a read
on every lookup.
Round Robin is a lovely thing when it comes time to load, but it means no
fragment elimination when it comes time to read. Read on.
> >Nice write cache hit rate. Read cache hit - ok.
> >4.5 million compresses? Someone is doing a lot of deletes to cause this.
> >Find them, blame them, make them pay. Look into why they are deleting -
> >perhaps truncates, or table rewrite would be more effective.
> In our environment we do a lot of deletes, yes. Our data gets old quickly
> and then gets moved to "history" tables. The data is still online and
> reports / queries still take place on the older data, especially the
> very recent records.
> By the way, what exactly are compresses - are these index rebuilds?
When you 'delete' a row, that space is not automatically made re-usable.
There is a 'next space' pointer in the page header which always points to
the end of the data - new data is always placed at the 'end' of the page.
Over time the engine will recognize that a page is getting 'holey' and will
re-write the page, freeing up space at the end - this is a compress. Note that there
is stuff at the end of the physical page - but let's not go there now -
hence the tic marks around the word 'end'.
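If you want to see how 'holey' a table has become, oncheck will print the
page-usage detail (the database and table names here are placeholders, and
note that oncheck takes locks while it runs):

    oncheck -pT depot_db:scans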
In a delete situation like this consider fragmenting by the date of the data
(provided it is not updated - that would be a disaster). Then when you need
to delete you can detach the old data fragment and re-attach it as a new
fragment for the new data. This may imply a lot of fragments - depending on
how low a granularity you need and how long you keep the data. It may also
mean unbalancing your fragments against the number of CPUs. But it's way
fast. I've done 33 fragments on a 16CPU box for this sort of thing (40GB of
data). The table screamed.
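A rough sketch of that rolling-window scheme - again, all names and dates
are hypothetical, and check the exact ALTER FRAGMENT syntax against your
version's docs:

    -- table fragmented by the date of the data, one month per dbspace
    create table scans (
        barcode    char(20),
        scan_date  date
    ) fragment by expression
        scan_date >= date('01/01/2001')
            and scan_date < date('02/01/2001') in dbs_jan,
        scan_date >= date('02/01/2001')
            and scan_date < date('03/01/2001') in dbs_feb;

    -- roll the window: peel off the oldest month as its own table
    alter fragment on table scans detach dbs_jan scans_jan_hist;

    -- attach a freshly built table as the fragment for the new month
    alter fragment on table scans attach scans_mar
        as scan_date >= date('03/01/2001')
            and scan_date < date('04/01/2001');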
> >2.7 million seq scans? Do you have a lot of small tables without indexes?
> >Or without stats updated? If sequential scans are desirable then you
> >want to look into light scans and bumping up your DS parameters to
> >support them.
> Yes, we have some small tables and do a lot of sequential scans.
Check into those and check the size of the tables involved - if they are 8
pages (or whatever your disk read size is), then that means one disk access
to get to your data. If they are 32 pages, say, that means on average two
disk accesses, since you scan half the table on average before you hit the
row you want. An indexed read is always one index read + one data read, and
each additional level the index descends can add another read. Hence
sequential scans CAN be more effective in some situations. Generally in
OLTP they are undesirable.
A quick check might be:

    select tabname from systables where npused > 8 and nindexes = 0;
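If you want to see which tables those scans are actually hitting, the
sysmaster interface keeps per-table counters - assuming your version has
sysptprof, and the threshold here is only an arbitrary example:

    select dbsname, tabname, seqscans
      from sysmaster:sysptprof
     where seqscans > 1000
     order by seqscans desc;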
> Update stats runs every night and is up to date. I double checked this
> recently by running the query -
> select tabname, max(constructed) from systables, sysdistrib
> where systables.tabid = sysdistrib.tabid
> group by 1
> order by 2,1
> I have a problem with the DS (PDQ) parameters. When I give a value to
> MAX_PDQPRIORITY, all the users start using PDQ, even without PDQPRIORITY
> being set in their environments. I had to disable MAX_PDQPRIORITY to get
> around this problem.
> I did the same for DS_TOTAL_MEMORY, gave it the lowest value I could.
Yes, it is an issue. XPS has some stored procedures which will be run
before a user session starts, where you can change that, but I see your
problem. You can also set the environment variable PDQPRIORITY=1 to achieve
the same effect.
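For example - one line in the user's environment, or one statement at the
top of a session:

    export PDQPRIORITY=1     # per-user environment, before connecting

    set pdqpriority 1;       -- or per session, from SQL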
> >While we're there, your RA_PAGES and RA_THRESHOLD are way out of whack.
> >They should be closer to one another. With a setting of 128, you are
> >doing a lot of reading ahead - which goes into your buffers and implies
> >management of same. This may be desirable, in which case your threshold
> >should rise some to meet it - 120 for example. If you want light scans
> >then 128/120 is an ideal setting. However if you are doing OLTP work -
> >which the rest of this engine looks like, then you want to decrease
> >these dependent on the size of your tables. Again, I don't know your
> >application or how the data is being read/used. You may have a buffer
> >management issue here: 13 buffer waits, 42 million latch waits -
> >depending on how long the engine has been up. You may have a problem
> >here where you are flooding your buffers with read ahead for a random
> >access set of applications.
> I will play around with these values thank you.
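For reference, both knobs live in the ONCONFIG file. Something like the
following is a common starting point for an OLTP mix - the numbers are
illustrative only, not a recommendation for your box:

    # ONCONFIG - read-ahead (illustrative values)
    RA_PAGES      32    # pages fetched per read-ahead request
    RA_THRESHOLD  30    # unread pages left before the next read-ahead fires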
> >Impressively low lockwaits - especially considering 14 billion lock
> >requests. Why so many? Is there really a need for all of those locks or
> >has some developer gone amok and decided that he will always lock
> >everything (s)he reads - even when running a report. Find them....
> I do not know where the lock requests come from - is there any way I can
> trace this? We do a lot of inserts on this system. To give you a better
> picture, this is a transport company scanning thousands of barcodes in
> and out. Isn't this perhaps where the lock requests come from?
You can look at onstat -u and see who has a lot of lock requests and then
see what they're doing. For something like that I generally do up a script
which scans every 'n' seconds; when it finds something big in the locks
column it calls a function to run onstat -g sql and onstat -g ses against
that session - see the sketch below.
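A bare-bones version of that script - the threshold, interval, and column
positions are assumptions (onstat -u layout shifts between versions, so
check yours before trusting the awk):

    #!/bin/sh
    # poll onstat -u; dump SQL and session info for heavy lock users
    THRESHOLD=100000    # 'something big' - pick a number for your site
    INTERVAL=60         # seconds between scans

    while true
    do
        # in this layout sessid is column 3 and locks is column 8
        onstat -u | awk -v t=$THRESHOLD \
            '$8 ~ /^[0-9]+$/ && $8+0 > t { print $3 }' |
        while read sess
        do
            onstat -g sql $sess
            onstat -g ses $sess
        done
        sleep $INTERVAL
    done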
> >Are you really supporting 2600 connections?
> My NETTYPE was originally set to 4,300, which I thought meant 1200
> connections. Tech Support (and also in a recent discussion on this list)
> told me that it is in fact only 300 connections, and they requested me
> to change it to 4,600, which according to them is now only 600
> connections. Apparently the number of poll threads does not affect the
> number of connections like I originally thought.
Surprises me too. Of course it's one of those 'set and forget' things.
I'll have to go read Manuella for that one.
> The change to 4,600 was due to the error
> -25580 System error occurred in network function.
> >First thing I would do is decrease the read ahead.
> >Next thing I would do is plan a weekend of pulling apart the dbspaces and
> >rebuilding them.
> >Then I would re-visit and see what effect that had.
> >Of course that's my .02 worth. I'm sure others will gently correct me or
> >provide other insight.
> Thank you for all the input. We are moving from the current L2000 to
> an N4000 server next weekend (at long last!) and from there on I think
> most of my effort will go into disk reorgs.
"Know thy data and know thy queries". Once you can figure out what the
users are doing against what data you can plan more effectively on how to
deal with it.
Let me know if the RA helps things.