Slow Bulk Insert into Large Table

Slow Bulk Insert into Large Table

Post by msnnew » Tue, 02 May 2000 04:00:00



I currently have a nightly process that does a bulk insert of approximately
100,000 rows into a 15 million row table.  The destination table has 1
clustered and 2 non-clustered indexes that have fill_factors of 90 and they
are dropped/recreated weekly.

I have tried using the BATCHSIZE, TABLOCK, ORDER, and DATAFILETYPE='native'
hints but the insert still takes over 40 minutes.   We have a good server
with quad processors which I don't think is the issue.

Creating the indexes takes much longer, so it is not an option to drop them
before the bulk insert.  Does this time (40+ minutes) sound reasonable to
people?

I'm at my wits' end trying to speed this thing up.  Any help/thoughts would
be appreciated.  Thanks in advance.
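
For reference, a minimal sketch of the kind of BULK INSERT being described; the
table name and file path are placeholders, and the ORDER column assumes the
cAccount clustered key mentioned later in the thread:

-- sketch only: object names and the file path are assumptions
BULK INSERT livedata.dbo.livetable
FROM 'd:\nightly\extract.dat'
WITH (
    DATAFILETYPE = 'native',   -- native-format file produced by bcp
    BATCHSIZE    = 10000,      -- rows committed per batch
    TABLOCK,                   -- take a table-level lock for the load
    ORDER (cAccount ASC)       -- data file is pre-sorted on the clustered key
)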

 
 
 

Slow Bulk Insert into Large Table

Post by Tom Morea » Tue, 02 May 2000 04:00:00


Have you tried Table Lock On Insert?

    Tom
------------

I currently have a nightly process that does a bulk insert of approximately
100,000 rows into a 15 million row table.  The destination table has 1
clustered and 2 non-clustered indexes that have fill_factors of 90 and they
are dropped/recreated weekly.

I have tried using the BATCHSIZE, TABLOCK, ORDER, and DATAFILETYPE='native'
hints but the insert still takes over 40 minutes.   We have a good server
with quad processors which I don't think is the issue.

Creating the indexes takes much longer, so it is not an option to drop them
before the bulk insert.  Does this time (40+ minutes) sound reasonable to
people???

I'm at my wits end to speed this thing up anymore.  Any help/thoughts would
be appreciated.  Thanks in advance.

 
 
 

Slow Bulk Insert into Large Table

Post by msnnew » Tue, 02 May 2000 04:00:00


As mentioned in my original message, I have tried the "TABLOCK" option on
the BULK INSERT (if that's what you were referring to).


> Have you tried Table Lock On Insert?

>     Tom
> ------------


> I currently have a nightly process that does a bulk insert of approximately
> 100,000 rows into a 15 million row table.  The destination table has 1
> clustered and 2 non-clustered indexes that have fill_factors of 90 and they
> are dropped/recreated weekly.

> I have tried using the BATCHSIZE, TABLOCK, ORDER, and DATAFILETYPE='native'
> hints but the insert still takes over 40 minutes.  We have a good server
> with quad processors which I don't think is the issue.

> Creating the indexes takes much longer, so it is not an option to drop them
> before the bulk insert.  Does this time (40+ minutes) sound reasonable to
> people???

> I'm at my wits end to speed this thing up anymore.  Any help/thoughts would
> be appreciated.  Thanks in advance.

 
 
 

Slow Bulk Insert into Large Table

Post by William Talad » Tue, 02 May 2000 04:00:00


Many times I've bulk inserted gigs per hour on a quad Pentium 90 machine.

Are your indexes unique?  It is three times faster with unique indexes (just
add pk columns onto the non-unique ones).
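
A minimal sketch of that trick; the index, table, reporting column, and PK
column names here are all placeholders:

-- recreate a non-clustered index with the PK column(s) appended so that
-- every index entry is unique
CREATE UNIQUE NONCLUSTERED INDEX ix_livetable_report1
    ON livedata.dbo.livetable (cReportColumn, pk_column)
    WITH FILLFACTOR = 90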

Is the inserted data reverse sorted from your clustered index?  That would
cause massive page splitting.

Is 10% enough space for another row?  Is the row size over 190 bytes?  If
all data is going onto the end of the indexes and data rows, clustering does
not matter.

Do you have many columns that are nullable or variable length?  That takes
an extra 30% time.

Do you have non-clustered indexes on a separate drive and the log on a
separate drive to maximize throughput?


>I currently have a nightly process that does a bulk insert of approximately
>100,000 rows into a 15 million row table.  The destination table has 1
>clustered and 2 non-clustered indexes that have fill_factors of 90 and they
>are dropped/recreated weekly.

>I have tried using the BATCHSIZE, TABLOCK, ORDER, and DATAFILETYPE='native'
>hints but the insert still takes over 40 minutes.   We have a good server
>with quad processors which I don't think is the issue.

>Creating the indexes takes much longer, so it is not an option to drop them
>before the bulk insert.  Does this time (40+ minutes) sound reasonable to
>people???

>I'm at my wits end to speed this thing up anymore.  Any help/thoughts would
>be appreciated.  Thanks in advance.

 
 
 

Slow Bulk Insert into Large Table

Post by Roy Harve » Tue, 02 May 2000 04:00:00


What release are you running?  Please include this with all questions
as it often makes all the difference.

If you are using 6.5..... how selective is the clustered index?  I ran
into some terrible performance problems inserting into a table where
the clustered index had thousands of rows per value.  It turns out
this is a very bad idea with 6.5 and earlier; I have not tested it
with 7.0 but I think it may not be an issue with that release.

Roy


>I currently have a nightly process that does a bulk insert of approximately
>100,000 rows into a 15 million row table.  The destination table has 1
>clustered and 2 non-clustered indexes that have fill_factors of 90 and they
>are dropped/recreated weekly.

>I have tried using the BATCHSIZE, TABLOCK, ORDER, and DATAFILETYPE='native'
>hints but the insert still takes over 40 minutes.   We have a good server
>with quad processors which I don't think is the issue.

>Creating the indexes takes much longer, so it is not an option to drop them
>before the bulk insert.  Does this time (40+ minutes) sound reasonable to
>people???

>I'm at my wits end to speed this thing up anymore.  Any help/thoughts would
>be appreciated.  Thanks in advance.

 
 
 

Slow Bulk Insert into Large Table

Post by Ion Chalmers Freema » Tue, 02 May 2000 04:00:00


William,
        I got the sense that nullability was kind of a bad idea from one of Joe
Celko's books, but 30% more on insert? Where does this figure come from?
ion

> Do you have many columns that are nullable or variable length?  That takes
> an extra 30% time.

 
 
 

Slow Bulk Insert into Large Table

Post by bruce barke » Tue, 02 May 2000 04:00:00


because you have indexes, bulk load is really just a batch insert. to make
it go fast, do the following:

create a work table with no indexes (i always use a work database on the
same server) with one additional column named processstate. also be sure
"select into / bulk" is set in the work database.
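
a sketch of setting that option (workdb here stands for the work database):

-- allow non-logged bulk copy / SELECT INTO in the work database
EXEC sp_dboption 'workdb', 'select into/bulkcopy', 'true'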

truncate work table
bcp data into work table (this will be fast, as the fast load routine will
be used)
call sp to move data from work table to live table
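
a sketch of the bcp step, assuming a native-format data file and a trusted
connection; the server and file names are placeholders:

bcp workdb.dbo.worktable in nightly_extract.dat -n -T -S MYSERVER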

create procedure loaddata
as
begin
    -- work through the table in batches until every row has been processed
    while exists (select * from workdb.dbo.worktable
                  where isnull(processstate, 0) < 1)
    begin
        -- mark the next batch of up to 10,000 unprocessed rows
        set rowcount 10000
        update workdb.dbo.worktable
            set processstate = 1
        where isnull(processstate, 0) < 1

        set rowcount 0

        -- copy the marked batch into the live table
        insert livedata.dbo.livetable (...)
        select ... from workdb.dbo.worktable
        where processstate = 1

        -- flag the batch as done
        update workdb.dbo.worktable
            set processstate = 2
        where processstate = 1
    end
end

this should get you down to a couple of minutes


> I currently have a nightly process that does a bulk insert of approximately
> 100,000 rows into a 15 million row table.  The destination table has 1
> clustered and 2 non-clustered indexes that have fill_factors of 90 and they
> are dropped/recreated weekly.

> I have tried using the BATCHSIZE, TABLOCK, ORDER, and DATAFILETYPE='native'
> hints but the insert still takes over 40 minutes.  We have a good server
> with quad processors which I don't think is the issue.

> Creating the indexes takes much longer, so it is not an option to drop them
> before the bulk insert.  Does this time (40+ minutes) sound reasonable to
> people???

> I'm at my wits end to speed this thing up anymore.  Any help/thoughts would
> be appreciated.  Thanks in advance.

 
 
 

Slow Bulk Insert into Large Table

Post by William Talad » Wed, 03 May 2000 04:00:00


Duplicates in a clustered index give horrendous performance.  Adding the PK
columns at the end of your clustered index's column list will make it unique
and solve the problem without affecting existing queries.  It will require a
little more index space.


>What release are you running?  Please include this with all questions
>as it often makes all the difference.

>If you are using 6.5..... how selective is the clustered index?  I ran
>into some terrible performance problems inserting into a table where
>the clustered index had thousands of rows per value.  It turns out
>this is a very bad idea with 6.5 and earlier; I have not tested it
>with 7.0 but I think it may not be an issue with that release.

>Roy



 
 
 

Slow Bulk Insert into Large Table

Post by msnnew » Wed, 03 May 2000 04:00:00


Bruce - that's pretty much what I'm doing now.  I have a raw text file that
I process and insert into a work table that has just been truncated.  This
work table has no indexes.  I was doing an insert statement to insert all
rows from the work table into the live table, but the performance of that
insert statement is about one hour.  In my testing, if I BCP data out of the
work table, then BULK INSERT into the live table, I get about 40 minutes.

Isn't that essentially what you are saying to do?  Will the process state
column and rowcount statements that you have in your script really make a
difference?

Thanks for your help.


> because you have index's, bulk load is really just a batch insert. to make
> it go fast do the following.:

> create a work table with no indexes (i always use a work database on the
> same server) with one additional column named processstate. also be sure
> "select into / bulk" is set in the work database.

> truncate work table
> bcp data into work table (this will be fast, as the fast load routine will
> be used)
> call sp to move data from work table to live table

> create procedure loaddata
> as




> begin
>     set rowcount 10000
>     update workdb.dbo.worktable
>         set processstate =1
>     where isnull(processstate,0) < 1

>     set rowcount 0


>     begin
>         insert livedata.dbo.livetable (...)
>         select ... from workdb.dbo.worktable
>         where processsate = 1

>         update workdb.dbo.worktable
>             set proccessstate = 2
>         where processstate = 1
>     end
> end

> this should get you down to a couple of minutes




 
 
 

Slow Bulk Insert into Large Table

Post by msnnew » Wed, 03 May 2000 04:00:00


Thanks, William.  Interesting comments.  None of my indexes are unique.  The
three indexes that I have are basically there to optimize reporting, and I
was trying to keep them as small as possible (few columns).  The problem is,
I don't think there is an exactly unique set of columns I could use.
Perhaps I should add an identity column to the table?  Do you think that
would just slow down my inserts more, or would the unique index I could then
create using that identity column help me out?

I believe the inserted data is in the same order as the clustered index,
because when I bcp the data out I use a query to specify the order of the
data (select * from worktable order by cAccount asc; cAccount is the
clustered index column).

The table has 22 columns: 14 money, 2 datetime, and 6 char columns of length
1 to 9.  Eight of those columns allow nulls.

I believe that the log is not on a separate drive because we are in a
clustered environment and I'm told you can't do that.  (I'm not a DBA, just
a lowly programmer!).

Thanks for all your help.


> Many times I've bulk inserted gigs per hour on a quad Pentium 90 machine.

> Are your indexes unique?  It is three times faster with unique indexes (just
> add pk columns onto the non-unique ones).

> Is the inserted data reverse sorted from your clustered index?  That would
> cause massive page splitting.

> Is 10% enough space for another row?  Is the row size over 190 bytes?  If
> all data is going on to the end of the indexes and data rows clustering does
> not matter.

> Do you have many columns that are nullable or variable length?  That takes
> an extra 30% time.

> Do you have non-clustered indexes on a separate drive and the log on a
> separate drive to maximize throughput?



 
 
 

Slow Bulk Insert into Large Table

Post by msnnew » Wed, 03 May 2000 04:00:00


Sorry.  I'm on version 7.  The clustered index may have thousands of rows
per value, so maybe this is my problem.  My indexes were created to optimize
the reports that are run against this table.  My thought was that I should
keep the indexes as small as possible while still running the reports
efficiently, so there is less work to do during inserts.  Maybe my thinking
was backwards.  Perhaps adding more columns to the clustered index could
actually speed up the insert?

Thanks.


> What release are you running?  Please include this with all questions
> as it often makes all the difference.

> If you are using 6.5..... how selective is the clustered index?  I ran
> into some terrible performance problems inserting into a table where
> the clustered index had thousands of rows per value.  It turns out
> this is a very bad idea with 6.5 and earlier; I have not tested it
> with 7.0 but I think it may not be an issue with that release.

> Roy



 
 
 

Slow Bulk Insert into Large Table

Post by William Talad » Thu, 04 May 2000 04:00:00


A non-unique clustered index with even a few duplicates per key value will
result in queries that are twice as slow; some of my queries ran 8 times
slower with just a few duplicates.  I believe you should add an identity
column to your clustered index (the non-clustered indexes don't matter).
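
A sketch of that suggestion against the cAccount clustered key mentioned
earlier; the identity column name and the DROP_EXISTING rebuild are
assumptions, not something prescribed in the thread:

-- add a surrogate key so the clustered index can be made unique
ALTER TABLE livedata.dbo.livetable
    ADD row_id int IDENTITY(1, 1) NOT NULL

-- rebuild the clustered index with the identity column as a tie-breaker
CREATE UNIQUE CLUSTERED INDEX ix_livetable_cAccount
    ON livedata.dbo.livetable (cAccount, row_id)
    WITH DROP_EXISTING, FILLFACTOR = 90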
 
 
 

Slow Bulk Insert into Large Table

Post by bruce barke » Fri, 05 May 2000 04:00:00


it should make the difference. you want to insert more than 1 row at a time,
but keep the unit of work from getting too large. you want the unit of work
to fit in memory, so no page writes occur except for the log writes. i
usually find 10k is about the right size, but if your rows have lots of data
and indexes, you might try as low as 1000 rows at a time. i'd expect much
better than 40 minutes with this approach, as it should easily beat bcp in
slow mode.

note: with bcp the batchsize only controls how many rows are inserted before
a commit, and has little impact on performance (except on a rollback when it
fails). the tablock hint saves little time, as lock escalation will happen
very soon anyway; it is mostly useful for stopping read locks from getting
in the way.
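
for reference, a sketch of where those settings sit on a bcp command line;
the server and file names are placeholders:

bcp livedata.dbo.livetable in nightly_extract.dat -n -T -S MYSERVER -b 10000 -h "TABLOCK, ORDER(cAccount ASC)"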


> Bruce - that's pretty much what I'm doing now.  I have a raw text file that
> I process and insert into a work table that has just been truncated.  This
> work table has no indexes.  I was doing an insert statement to insert all
> rows from work table into live table, but perf of that insert statement is
> about one hour.  In my testing, if I BCP data out of work table, then Bulk
> Insert into Live table, I get about 40 minutes.

> Isn't that essentially what you are saying to do???  Will the process state
> column and rowcount statement that you have in your script really make a
> difference??

> Thanks for your help.




 
 
 

Error when bulk insert follows another large bulk insert

I am running a job out of MS SQL Enterprise Manager where the first step
turns off transaction logging, the second step is a large bulk insert, and
the third step is another large bulk insert.  I receive the following error
when executing the third step.  Is this error occurring because the database
is still committing data from the previous large bulk insert at the same
time as it is performing the next bulk insert?  If so, is there a
workaround?  Thank you for any help you can provide.

Error message from Step 3 (2nd bulk insert):
Backup, CHECKALLOC, bulk copy, SELECT INTO, and file manipulation
(such as CREATE FILE) operations on a database must be serialized.
Reissue the statement after the current backup, CHECKALLOC, or
file manipulation operation is completed. [SQLSTATE 42000] (Error 3023)  
The statement has been terminated. [SQLSTATE 01000] (Error 3621).  
The step failed.
