To follow up, I did more testing of my file. I started cutting the file in half, seeing which half broke, then cutting that half in half and repeating. I finally narrowed the problem down to a single line in the web log. That line had an extra space in the query string, and a space is exactly what the W3C extended log format uses as its field delimiter. Seems like IIS should be escaping it to %20 or something. In any case, BULK INSERT was not reporting a correct line number in the error message, even with MAXERRORS = 0. So I have worked around the issue by checking each line for the expected number of space delimiters in my log preparation utility.
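For anyone who hits the same thing, the check itself is simple: every line of a W3C extended log should contain the same number of space-delimited fields as its #Fields directive declares, so any line with a different space count is suspect. My utility does this outside the database, but the same sanity check could be run in T-SQL once the raw lines are staged in a single-column table. This is only a sketch of that idea, not my actual utility; the table name, file path, and the assumption of a 14-field log (13 spaces per line) are placeholders for whatever your log actually uses:

    -- Stage each raw line into a one-column table. The default field
    -- terminator is tab, which these logs should never contain, so the
    -- whole line lands in the single column.
    CREATE TABLE dbo.RawLogLines (LogLine varchar(8000) NOT NULL)

    BULK INSERT dbo.RawLogLines
    FROM 'C:\logs\ex031015.log'
    WITH (ROWTERMINATOR = '\n')

    -- Flag any line whose space count does not match the expected
    -- delimiter count (a 14-field line should contain exactly 13 spaces).
    -- DATALENGTH rather than LEN so trailing blanks are not ignored.
    SELECT LogLine
    FROM dbo.RawLogLines
    WHERE DATALENGTH(LogLine) - DATALENGTH(REPLACE(LogLine, ' ', '')) <> 13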
Interestingly, this extra-space issue was not in my original file, the one that caused me to split the logs into 100MB chunks in the first place. I can't, however, reproduce that error, so I am stumped as to how I originally had the problem. Oh well, thanks for all the great feedback!
Sam
>-----Original Message-----
>It's sounding a bit mysterious, and your followup to this suggests a
>few questions (at least until Linda W. catches up on news and likely
>solves the problem):
>Can we see a few lines of the source file as well as the CREATE TABLE
>statement for the destination table? (Feel free to change some
>characters to X's, but don't change the lengths or spacing of
>anything.)
>Are all the lines terminated with the standard Windows
>CHAR(13)+CHAR(10)?
>When you import the file in chunks, do you get the same number of rows
>in the table as there are lines in the original file?
>Is the last line of the file terminated with an end-of-line?
>What is the exact file size in bytes? (I recall some weird error once
>that happens when a file is an exact multiple of some magic number.)
>Since you seem to have a workaround that imports all the data, I'm more
>interested in whether this is a bug or not at this point. It would be
>nice to feel sure enough it's a bug to open a case with product
>support, but if it's being caused by something in your file (which is
>too large to post), you'll get charged...
>SK
>>Thanks for the feedback. Yes, I am quite surprised that
>>they would default the maxerrors to 10! Seems like that
>>is quite a problem to me. In any case, I am not certain
>>that there are errors in my data. When I run the bulk
>>insert, it tells me the line number that has caused the
>>error (or the 11th as the case may be). I have copied
>>that row, along with about 100 rows on both sides of the
>>errored row into a new file. When I try to bulk insert
>>this file, supposedly containing the error that SQL Server reported,
>>with maxerrors set to 0, it succeeds! How is this possible? Is the
>>line number that SQL Server reports back reliable?
>>Thanks,
>>Sam
>>>-----Original Message-----
>>>>I am using BULK INSERT to import a large web log file into a SQL
>>>>Server 2000 SP3a table. I have set the rowterminator to '\n' and the
>>>>fieldterminator to ' '. The web log file has been converted to ASCII
>>>>and has had all # comment lines removed. The log file is about
>>>>430MB. When I run the bulk insert on the large file, it gets a
>>>>datetime conversion error on row 1500000 or somewhere about there.
>>>>When I split the file into 100MB chunks, however, it works fine!
>>>>Clearly, there is nothing wrong with the data in the file. Is this a
>>>>known limitation of BULK INSERT, or is this some sort of bug?
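(That statement, with the MAXERRORS = 0 setting we ended up discussing, was roughly of the following shape. The table and file names below are placeholders rather than my real ones:)

    -- Approximate shape of the load that failed on the full 430MB file.
    BULK INSERT dbo.WebLog
    FROM 'C:\logs\weblog_full.log'
    WITH (
        FIELDTERMINATOR = ' ',   -- W3C extended log fields are space-delimited
        ROWTERMINATOR   = '\n',
        MAXERRORS       = 0      -- stop on the first bad row instead of skipping ten
    )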
>>>In addition to Steve's response, you may want to try the
>>>command-line tool BCP instead. With BCP you can define an error file,
>>>to which BCP will log the erroneous rows.
>>>As with BULK INSERT, the default behaviour of BCP is to permit ten
>>>errors and stop on the eleventh.