I have noticed an irritating problem with the way that SQL Server Agent
handles errors generated by DTS packages. Is this a bug, or a bizarre design
feature. (Or have I missed something...)
I have a DTS package which runs a single stored procedure on a remote
machine. On occasions this sp fails (due to deadlock conflicts or other
issues, which are beyond my authority to change) The package is called as
part of an ETL process, controlled by SQL Server Agent. If the DTS package
fails, then the SQL Agent job step fails, and the rest of the job can't
proceed. The solution is usually to wait a couple of minutes and re-run,
but this needs to happen automatically.
I therefore changed the DTS package to attempt to run the stored procedure,
then if it fails to wait 2 minutes then try again.
This is achieved by a simple "on failure" workflow link from the sp call to
an ActiveX step that waits 2 minutes without doing anything, then an "on
success" workflow to another step which tries again to run the sp.
The DTS package works exactly as intended, and the second attempt is started
only if the first has failed. Looking at the package log, it reports that
the package executed successfully,and the detail shows that that the one
step failed, which is correct.
If I run this from SQL Server Agent, the failure of the step causes the SQL
Agent job step to fail, even though the package as a whole has completed
successfully. This renders the whole exercise pointless. I have not got
"fail package on first error" set, nor have I got the workflow packages of
the failing step set to "fail package on step failure". Either of these
would fail the package straight away, and change the reported status in the
log.
There is an obvious work-round, using SQL agent to test for a failed step,
then call a wait step and try again, but it is not elegant, and shouldn't be
necessary.
Is this a known bug with SQL Agent?
Is there a way I can force the DTS package to return a code acceptable to
SQL Agent which will allow it to class the step as successful? I could put
the try/wait/try logic in the SQL job, but I'd rather not, as it's not
elegant.
Regards,
Richard R.