Non-linear performance

Non-linear performance

Post by Jim Sneeringe » Sun, 24 Dec 2000 06:53:47



I am using the following query to update a 5-millon-row table (Client5M) with
information from a table indexed by zip. In both cases there is a clustered index
on the 9-digit zip code.

What I find is that as I manipulate the WHERE clause to vary the number of rows
updated, performance degrades more than linearly.  For example, updating

  74K rows takes  4 minutes,
 385K rows takes  8 minutes, and
 896K rows takes 54 minutes, and
5000K rows takes over 16 hours.

My question is this: Since both tables are indexed the same way an in the same
order, which does the time required increase more than linearly with the number
of rows affected, and what can I do about it? (So far, what I do is run several
smaller updates, which is fine, but I would like to understand the reason.)

Here is the query:

UPDATE  zhen.dbo.Client5M
SET     msa = data.dbo.zip.msa,
        blockgroup = data.dbo.zip.blockgroup,
        longitude = data.dbo.zip.longitude,
        latitude = data.dbo.zip.latitude
FROM    data.dbo.zip9
WHERE   (zhen.dbo.Client5M.zip9 < '2') AND
        (zhen.dbo.Client5M.Zip9 = data.dbo.zip.zip9)

Thanks in advance.

Jim

 
 
 

Non-linear performance

Post by BP Margoli » Sun, 24 Dec 2000 09:27:21


Jim,

I'm going to point you to a seemingly unrelated article authored by SQL
Server MVP Ron Talmage, but please read it through to the end ... I think
you'll find the answer lurking in the piece.

http://msdn.microsoft.com/library/periodic/period99/SQL99I9.HTM

----------------------------------------------------------------
BP Margolin
Please reply only to the newsgroups.
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
can be cut and pasted into Query Analyzer is appreciated.


I am using the following query to update a 5-millon-row table (Client5M)
with
information from a table indexed by zip. In both cases there is a clustered
index
on the 9-digit zip code.

What I find is that as I manipulate the WHERE clause to vary the number of
rows
updated, performance degrades more than linearly.  For example, updating

  74K rows takes  4 minutes,
 385K rows takes  8 minutes, and
 896K rows takes 54 minutes, and
5000K rows takes over 16 hours.

My question is this: Since both tables are indexed the same way an in the
same
order, which does the time required increase more than linearly with the
number
of rows affected, and what can I do about it? (So far, what I do is run
several
smaller updates, which is fine, but I would like to understand the reason.)

Here is the query:

UPDATE zhen.dbo.Client5M
SET msa = data.dbo.zip.msa,
blockgroup = data.dbo.zip.blockgroup,
longitude = data.dbo.zip.longitude,
latitude = data.dbo.zip.latitude
FROM data.dbo.zip9
WHERE (zhen.dbo.Client5M.zip9 < '2') AND
(zhen.dbo.Client5M.Zip9 = data.dbo.zip.zip9)

Thanks in advance.

Jim

 
 
 

Non-linear performance

Post by Dan Guzma » Sun, 24 Dec 2000 09:45:36


You query references a table named zip.  Should this be zip9?

You may want to display the execution plan in Query Analyzer to see if the
plan provides some insight into the issue.  Also, it could be that the
update is being blocked which is more likely when more rows are updated.

An unrelated issue is that the rows will be updated even if they already
contain the desired value.  If the Client5M table may already contain the
values, you may want to add the following to the WHERE clause so that rows
are only updated if needed:

AND
(zhen.dbo.Client5M.msa <> data.dbo.zip.msa OR
zhen.dbo.Client5M.blockgroup <> data.dbo.zip.blockgroup OR
zhen.dbo.Client5M.longitude <> data.dbo.zip.longitude OR
zhen.dbo.Client5M.latitude <> data.dbo.zip.latitude)

If you still need help, please post the table and index create DDL along
with some sample data.

Hope this helps.

-----------------------
SQL FAQ links (courtesy  Neil Pike):

 http://forumsb.compuserve.com/gvforums/UK/default.asp?SRV=MSDevApps
 (faqxxx.zip in lib 7)
 or www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
 or www.sqlserverfaq.com
 or www.mssqlserver.com/faq
-----------------------


I am using the following query to update a 5-millon-row table (Client5M)
with
information from a table indexed by zip. In both cases there is a clustered
index
on the 9-digit zip code.

What I find is that as I manipulate the WHERE clause to vary the number of
rows
updated, performance degrades more than linearly.  For example, updating

  74K rows takes  4 minutes,
 385K rows takes  8 minutes, and
 896K rows takes 54 minutes, and
5000K rows takes over 16 hours.

My question is this: Since both tables are indexed the same way an in the
same
order, which does the time required increase more than linearly with the
number
of rows affected, and what can I do about it? (So far, what I do is run
several
smaller updates, which is fine, but I would like to understand the reason.)

Here is the query:

UPDATE zhen.dbo.Client5M
SET msa = data.dbo.zip.msa,
blockgroup = data.dbo.zip.blockgroup,
longitude = data.dbo.zip.longitude,
latitude = data.dbo.zip.latitude
FROM data.dbo.zip9
WHERE (zhen.dbo.Client5M.zip9 < '2') AND
(zhen.dbo.Client5M.Zip9 = data.dbo.zip.zip9)

Thanks in advance.

Jim

 
 
 

Non-linear performance

Post by Jim Sneeringe » Fri, 29 Dec 2000 02:06:26


BP,

Thanks for the lead. It does seem that his experience was the same as mine, but I
am not sure I undersand the reason very well. He says "I talked about this query
with some major SQL Server brains after our last SQL Server user group meeting,
and their consensus was that the final version performed better because the WHILE
loop segmented the data into chunks small enough for SQL Server to do all of its
work in RAM."

Can you explain to me what is being kept in RAM? Or, to look at it from a
different perspective, is there some constraint that keeps the query optimizer
from adopting the strategy that that is used by the WHILE loop? Alternatively, is
there some reason why the optimizer does not discover that strategy?

It seems to me that the amount of data written in the database and the log files
should be the same in either case. Is it the locks that can be kept in memory
only for the WHILE loop?

Thanks again.

Jim Sneeringer

-----Original Message-----

Jim,

I'm going to point you to a seemingly unrelated article authored by SQL
Server MVP Ron Talmage, but please read it through to the end ... I think
you'll find the answer lurking in the piece.

http://msdn.microsoft.com/library/periodic/period99/SQL99I9.HTM

----------------------------------------------------------------
BP Margolin
Please reply only to the newsgroups.
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
can be cut and pasted into Query Analyzer is appreciated.



I am using the following query to update a 5-millon-row table (Client5M)
with
information from a table indexed by zip. In both cases there is a clustered
index
on the 9-digit zip code.

What I find is that as I manipulate the WHERE clause to vary the number of
rows
updated, performance degrades more than linearly.  For example, updating

  74K rows takes  4 minutes,
 385K rows takes  8 minutes, and
 896K rows takes 54 minutes, and
5000K rows takes over 16 hours.

My question is this: Since both tables are indexed the same way an in the
same
order, which does the time required increase more than linearly with the
number
of rows affected, and what can I do about it? (So far, what I do is run
several
smaller updates, which is fine, but I would like to understand the reason.)

Here is the query:

UPDATE zhen.dbo.Client5M
SET msa = data.dbo.zip.msa,
blockgroup = data.dbo.zip.blockgroup,
longitude = data.dbo.zip.longitude,
latitude = data.dbo.zip.latitude
FROM data.dbo.zip9
WHERE (zhen.dbo.Client5M.zip9 < '2') AND
(zhen.dbo.Client5M.Zip9 = data.dbo.zip.zip9)

Thanks in advance.

Jim

.

 
 
 

1. Non-linear Performance

I am curious how you are going to address >4 gig of RAM on a 32-bit
system, especially if you want to address it all from the same process.

--
  Bruce Momjian                        |  http://candle.pha.pa.us

  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command

2. Access 2000 Required DAO Files

3. Non linear sort order

4. Large Text parameter clobbering OUTPUT parameters

5. Specifying a non-linear sort order in SQL

6. Problem with the outer join

7. Substring search (non linear)

8. Concatenation of varchars, output padded?

9. Linear Performance Degradation

10. Sybase SQL Server 4.9.1 -- performance non-linear under load?

11. Non-Numeric Indices And Performance

12. Lower performance for non-dbo users (SQL 6.5 SP4, NT 4.0 SP4, Access 2.0 client)

13. non-indexed view performance problem