[I tried posting this two nights ago, but I haven't seen it in the user
group. Sorry if this dupes down the line.]
I still don't have a bug number, but I do have a little more info.
Evidently this "large" transaction bug is among the bugs being
fixed in the upcoming Service Pack 4. It also relates to my post about
recovery. There *is* a bug in that it is possible to introduce
duplicate keys in very large transactions. In my case, the transaction
was 5000 INSERTs (give or take one). I'm going to try to reproduce this
error later in the week, but let me relate the entirety of the problem
(which I thought would be too much info in my earlier post).
I'm converting a large FoxPro database to SQL Server. I'm running SQL
Server 6.5 Enterprise on a server configured with MSCS. I'm moving over
and massaging ~5 million rows of data. I've chunked it into groups of
5000 records. Each batch was taking ~20 minutes to execute. On the
night that the big job was run, it clashed with a user-scheduled
checkpoint (I haven't pointed fingers at that culprit yet!) shortly
after midnight. At 7 AM, the same batch was still running. The thread
was killed, which (seemingly) triggered an error to MSCS, which failed over to
our second server. This seems to have issued a SQL Server stop on
server A and a SQL Server start on server B. As it should, the start of
SQL Server B attempted to recover the database. Three days later, the
database was still in recovery (after 40 hours, attempts were made to
remove the database, set up trace flags, etc.). Nothing worked, and MS
was notified. An MS tech investigated the problem and determined that
there were duplicates in the database.
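For anyone who wants to check their own tables for the same symptom,
here is a rough sketch of the kind of duplicate-key check I have in
mind. It is not what the MS tech ran (I don't know what that was). It
uses Python with an in-memory SQLite database purely as a stand-in, and
the table name "customers" and key column "cust_id" are made up for
illustration. The GROUP BY / HAVING query itself is plain SQL.

```python
import sqlite3

# Stand-in database: in-memory SQLite, NOT SQL Server 6.5.
# Table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")

# No unique index here, so duplicate key values can be stored,
# which mimics the corrupted state described above.
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
rows = [(1, "a"), (2, "b"), (2, "b-again"), (3, "c"), (3, "c2"), (3, "c3")]
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
conn.commit()


def find_duplicate_keys(conn, table, key_column):
    """Return (key, count) pairs for every key value that occurs more than once."""
    # Identifiers are interpolated directly; only pass trusted names.
    query = (
        f"SELECT {key_column}, COUNT(*) FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1 "
        f"ORDER BY {key_column}"
    )
    return conn.execute(query).fetchall()


duplicates = find_duplicate_keys(conn, "customers", "cust_id")
print(duplicates)  # [(2, 2), (3, 3)]
```

Running a check like this after each 5000-row batch, rather than at the
end of the whole load, would at least narrow down which batch introduced
the duplicates.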
Since this time, I've been told (all hearsay, I'm trying to get names)
that MS has identified this bug and has a hot fix that they're willing
to give us prior to the release of the service pack. Management is not
excited about this, and I'm just trying to get some more information
from anyone who might have any. Some of this discussion has centered on
the long recovery time (hence my other posts).
I don't have much else to add. I'm sitting here with a client that is
none too happy, and I'm trying to placate decision-makers. I'm really
just looking for any info that anyone may have on this/these
problem(s). My client's downtime is measured in dollars per minute and
if we had a 40-hour database recovery in production, there'd be no need
to come into work because they'd be out of business.
Thanks for the soapbox (and all the help).