Ongoing problems...


Post by Jan Blickster » Sat, 28 Jun 2003 20:57:49



Dear everyone,

I posted a week or so back regarding our ongoing problem with database
lockups and crashes. You may remember that we had regular situations where
the database would lock, and could only be accessed once all the users had
logged out and the lck files on the network had been deleted.

Even though I have received many useful ideas and much advice from this group,
nothing has solved the problem, so once again I would like to appeal for
help, as we are still getting an average of 3 lockups/crashes a day, and
unless this is solved soon, my job is on the line.

Here is a breakdown of what I have tried....

[1] All netware clients (XP & 98) upgraded to latest versions.
[2] Checked tables for evidence of corruption, nothing found.
[3] Used Remlock util to remove ALL locks and restart.
[4] Examined lck files after lockup, using netdump and lockdump.
[5] Checked network cables etc for problems.
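
For anyone repeating steps [3] and [4], here is a minimal Python sketch for listing leftover lock files once all users have logged out. It assumes only that the lock files end in .lck (as described above) and that the data share is reachable as an ordinary directory path; the function name is made up for illustration and this is not part of Remlock, netdump or lockdump.

```python
from pathlib import Path


def find_lock_files(root):
    """Return every file under `root` whose extension is .lck (any case), sorted."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".lck"
    )


# Example call (the path is a placeholder for the real data directory):
#     for path in find_lock_files("F:/data"):
#         print(path, path.stat().st_size, "bytes")
```

Any file it reports while nobody is logged in is a candidate for a session that did not exit cleanly.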

None of the above has given me any more hints as to what is going wrong.
Since the problem doesn't seem to rest with any one user, group of users, or
PC, doesn't consistently affect the same tables, and doesn't appear to follow
any other pattern, does anyone have any more ideas that can help my users,
improve management's temper, and save my career?

Many thanks

 
 
 

Ongoing problems...

Post by Steven Gree » Sat, 28 Jun 2003 21:16:22



> [1] All netware clients (XP & 98) upgraded to latest versions.

what is the server?.. is LOCAL SHARE always TRUE?.. are the oplocks and
write-behind settings disabled everywhere?

> [2] Checked tables for evidence of corruption, nothing found.

using what tool and process?.. the Verify option misses many things.. and the
Warnings results, if any, can be very important..

> [3] Used Remlock util to remove ALL locks and restart.
> [4] Examined lck files after lockup, using netdump and lockdump.

which sessions abort?.. which sessions don't clear from the files?.. honest
question - do you really know what you're looking for in those files?

> [5] Checked network cables etc for problems.

professional sniffer, or you and/or your own people casually?

I'm not trying to attack your skills, but these questions must be asked..

--

Steve Green - Diamond Software Group, Inc - Waldorf Maryland USA
Corel CTech Paradox - http://www.diamondsg.com - Support/Downloads/Links
---------------------------------------------------------------------------------

Do you need a Sanity Check? http://www.diamondsg.com/sanity.htm
Upgrade/Downgrade versions? http://www.diamondsg.com/upgrade.htm
-------------------------------------------------------------------------

 
 
 

Ongoing problems...

Post by Jan Blickster » Sat, 28 Jun 2003 21:26:37


No, don't worry about attacking my skills - I'm not trained for this.

> what is the server?.. is LOCAL SHARE always TRUE?.. are the oplocks and
> write-behind settings disabled everywhere?

Compaq Presario, running Novell Netware 4.11
Local Share is always true/On.
Oplocks disabled on all clients.

I can't see a write-behind option (I assume this is a Novell Client switch).

> > [4] Examined lck files after lockup, using netdump and lockdump.

> which sessions abort?.. which sessions don't clear from the files?.. honest
> question - do you really know what you're looking for in those files?

No, I don't really know what I'm looking for in these files, but I did post
the results here, and people explained what was going on. Perhaps someone
could point me in the direction of a guide, and I'll try again.

> professional sniffer, or you and/or your own people casually?

No, it was done properly, we found a couple of problems, but nothing that
solved the DB issues.




 
 
 

Ongoing problems...

Post by Rodney Wis » Sat, 28 Jun 2003 21:32:09


Try this:

1. Download the file
    http://www.ars-florida.com/DIAGNOS.zip

2. Unzip it into its own directory.

3. Read the ReadMe.RTF file.

--
rodney

 
 
 

Ongoing problems...

Post by Tony McGuir » Sat, 28 Jun 2003 22:48:46


Do you have a UPS on that server?

Is power management shut off in the BIOS on the server?

Is power management shut off in the BIOS and OS on the clients?

As well, check the NIC cards on all systems to make sure they aren't set to
be turned off or put to sleep by the OS - to 'save power'.

--
--
Tony

"I woke up and was able to get myself out of bed.
Being that fortunate, what's to complain about?"
_____________

 
 
 

Ongoing problems...

Post by Liz » Sat, 28 Jun 2003 23:52:02


Jan,

Just double checking - LOCAL SHARE is a BDE setting, nothing to do
with NetWare, should be the same on each client machine.

Write-behind caching is not a NetWare thing, it's a Windows thing.  On
Win2K/XP you can find it in the properties for the drive, hardware
tab, select drive, click properties button, select disc properties
tab, uncheck "Write cache enabled" box if possible.  Only matters for
local HDDs on a Windows computer.  NOTE: I cannot get this to stick -
reboot and the box is checked again.  I've never seen anyone else
report differently or how to fix it, so my suggestion is to not put
any shared .DB files on a Windows machine.

Win9x: Control Panel, System applet, Performance tab, File System
button, Troubleshooting tab, CHECK the boxes labeled:
    i. "Disable new file sharing and locking semantics."
    ii. "Disable write-behind caching for all drives."
    iii. On the Hard Disk tab, you may wish to try setting Read-ahead
optimization to None, but I think that should only make a difference
on the server machine.

If you're 100% certain that 100% of the hardware involved (all RAM
(esp. the server's), all cables, all switches (you are using switches
rather than hubs, right?), all NICs, all HDDs etc.) is sound*, then I'd
say it's time to look at user behavior and code.  Is a user doing
something you didn't develop for?  Is the code doing things which
require full table locks (queries, add(), cMax() (any c..() function),
etc.)?  Is the code not trapping all errors?  Do you have potential
terminal loops?  (I've seen apps prompt the user for input or to select
a menu item and not let the user out until a valid input was made or a
menu item selected.  If a user wants to cancel, there's no way out but
Ctrl+Alt+Delete, and if a user really wants to cancel, that's what the
user will do.)
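
To illustrate the terminal-loop point, here is a minimal sketch of an input prompt that always leaves the user a way out. It is written in Python rather than ObjectPAL purely for illustration; the function name, the "cancel" keyword and the attempt limit are all assumptions, not anything taken from the application being discussed.

```python
def prompt_choice(options, read=input, cancel_word="cancel", max_attempts=3):
    """Ask for one of `options`.

    Returns the chosen option, or None if the user types the cancel word
    or fails to give a valid answer within `max_attempts` tries.
    """
    for _ in range(max_attempts):
        answer = read(f"Choose one of {options} or '{cancel_word}': ")
        answer = answer.strip().lower()
        if answer == cancel_word:
            return None  # explicit way out, no Ctrl+Alt+Delete needed
        if answer in options:
            return answer
    return None  # give up instead of looping forever
```

The caller treats None as "user backed out" and releases any table or record locks it was holding before returning to the menu.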

*A database app will be the first thing to tell you there are hardware
problems.  Unfortunately, the hardware guy will be the last person to
believe this.  I'll let you fight the battle.

NOTE: I'm not questioning your abilities, but like Steve said, these
things have to be asked because it's not normal for a Paradox DB to
die on a NetWare network.

What changed recently (just before the lockups started)?  How many
users do you have?

Liz



 
 
 

Ongoing problems...

Post by Jan Blickster » Sun, 29 Jun 2003 00:42:38


Liz,

Please don't worry about questioning my abilities. I don't know enough about
these things, and in my experience the computer systems person who says
they know everything is either lying or overconfident, and likely to see a
complete network failure sooner rather than later.

Local Share in the BDE is set to ON for every machine.

I will get each machine checked for the Write Behind setting. Steve
mentioned the OpLocks setting. This is a setting on Novell Client though
isn't it?

Tony asked about UPS & power management.

Yes we have a UPS on the server, and Power Management is shut off in the
BIOS on the server (and on clients). As far as I am aware, none of our NIC
cards are set to turn off. We will check these though.

Hardware - this is a possible area of problem, now you mention
Hubs/Switches. We do have a couple of older offices with Hubs, and not
switches. These are the offices that have recently had extra staff in them
using the database.... can you tell me why hubs could be a problem in this
context?

If none of this works, I am going to get the code taken apart and looked at
bit by bit, though you will all appreciate that this will be the hardest and
slowest part.

Thanks everyone, I look forward to your responses.

J



 
 
 

Ongoing problems...

Post by Tony McGuir » Sun, 29 Jun 2003 00:53:19


> BIOS on server (and on clients). As far as I am aware, none of our NIC cards
> are set to turn off. We will check these though.

Many, if not most, manufacturers have Windows set to turn the NIC cards off
by default.  So you have to be proactive about this.  If the NIC card is
off, Paradox can't find the server, may appear to lock up and thus gets a
3-finger salute.

Power management in the BIOS doesn't necessarily prevent Windows from
managing the power settings.

And for goodness' sake turn off all screen savers.  They are cute (some of
them), but they eat clock cycles.  I've known of several instances where
they created problems with Paradox and open tables.  (Well, I say this but I
couldn't prove it technically; all I know is that we were having problems,
disabled the screen savers, and the problems went away.)

--
--
Tony


 
 
 

Ongoing problems...

Post by Ed Covne » Sun, 29 Jun 2003 02:12:26


Jan,

Does the network have a gateway to the internet?
Do you have virus protection?  For email also?
Next time it locks up, could the internet plug be
pulled easily? (before "unlocking" efforts are taken)

Ed

--


 
 
 

Ongoing problems...

Post by Liz » Sun, 29 Jun 2003 02:20:01


Jan,

2. Oplocks:
  A. NT: http://www.tonymcguire.com/oplocks-nt.htm
    i. UseOpportunisticLocking should be 0
    ii. EnableOplocks should be 0
    iii. The rest don't matter
  B. Win2K/XP: http://www.tonymcguire.com/oplocks-2000.htm
    i. OplocksDisabled should be 1
    ii. EnableOplocks should be 0
    iii. The rest don't matter
  C. Win9x: Control Panel, System applet, Performance tab, File System
button, Troubleshooting tab, CHECK the boxes labeled:
    i. "Disable new file sharing and locking semantics."
    ii. "Disable write-behind caching for all drives."
    iii. On the Hard Disk tab, you may wish to try setting Read-ahead
optimization to None, but I think that should only make a difference
on the server machine.
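
Since these values have to be checked on every machine, a small audit helper may save some walking around. The following is a minimal Python sketch that compares values you have already read from a machine against the NT and Win2K/XP targets listed above. It deliberately does not touch the registry itself; the function name and the plain-dictionary input are assumptions for illustration, and treating a missing value as a problem is a conservative choice on my part.

```python
# Target values copied from the NT and Win2K/XP items above.
EXPECTED = {
    "nt": {"UseOpportunisticLocking": 0, "EnableOplocks": 0},
    "win2k_xp": {"OplocksDisabled": 1, "EnableOplocks": 0},
}


def audit_oplocks(platform, actual):
    """Return {name: (expected, found)} for each setting that is missing or wrong.

    `actual` maps setting names to the values read from one machine.
    A setting that is absent is reported with found=None.
    """
    problems = {}
    for name, expected in EXPECTED[platform].items():
        found = actual.get(name)
        if found != expected:
            problems[name] = (expected, found)
    return problems
```

An empty result means the machine matches the list; anything else names the setting to fix, with the expected value first and the value found second.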

Liz


> ... Steve
> mentioned the OpLocks setting. This is a setting on Novell Client though
> isn't it?

 
 
 

Ongoing problems...

Post by Liz » Sun, 29 Jun 2003 02:23:11


Jan,

Hubs generate more network traffic, or at least don't manage it well
(I'm not a network guru so I can't tell you the specifics).  Hubs
broadcast all packets to all machines, switches forward packets to the
machine the packet is intended for.  If I understand correctly.
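
A toy model of that difference, as a minimal Python sketch. It assumes an idealised hub that repeats every frame to every other port and an idealised switch that already knows which port each machine is on; the port names and function name are invented for the example and it says nothing about any particular hardware.

```python
def deliveries(device, ports, frames):
    """Count how many frame copies each port receives.

    device: "hub" or "switch"
    ports:  list of port names
    frames: list of (source_port, destination_port) pairs
    """
    received = {p: 0 for p in ports}
    for src, dst in frames:
        if device == "hub":
            targets = [p for p in ports if p != src]  # repeat to everyone else
        else:
            targets = [dst]  # forward to the intended port only
        for p in targets:
            received[p] += 1
    return received


ports = ["server", "pc1", "pc2", "pc3"]
frames = [("pc1", "server"), ("pc2", "server"), ("server", "pc1")]
hub_load = deliveries("hub", ports, frames)
switch_load = deliveries("switch", ports, frames)
```

With these three frames the hub delivers nine copies in total and the switch delivers three, which is the extra traffic every added workstation on a hub has to contend with.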

Extra staff - there's a change, it seems.  Perhaps the hubs can't
handle them, perhaps the application design can't handle them.

How many total users are accessing the same .NET file?

Liz


> Hardware - this is a possible area of problem, now you mention
> Hubs/Switches. We do have a couple of older offices with Hubs, and not
> switches. These are the offices that have recently had extra staff in them
> using the database.... can you tell me why hubs could be a problem in this
> context?

 
 
 

Ongoing problems...

Post by Rodney Wis » Sun, 29 Jun 2003 02:27:30


Jan,

If you run the DIAGNOS script, it will do all of this for you.

--
rodney

 
 
 

Ongoing problems...

Post by Steve Urbac » Sun, 29 Jun 2003 10:55:51


On Fri, 27 Jun 2003 16:42:38 +0100, "Jan Blickstern"


>I will get each machine checked for the Write Behind setting. Steve
>mentioned the OpLocks setting. This is a setting on Novell Client though
>isn't it?

Both Windows and NetWare have oplock settings. AFAIK they are independent
of each other: Windows affects the local drive(s), and NetWare affects the
NetWare devices only.

Remove 'suspend' from power management (shutdown options); it is deadly to
Paradox users.

Steve U

 
 
 

Ongoing problems...

Post by Steve Urbac » Sun, 29 Jun 2003 11:02:37




>Jan,

>Hubs generate more network traffic, or at least don't manage it well
>(I'm not a network guru so I can't tell you the specifics).  Hubs
>broadcast all packets to all machines, switches forward packets to the
>machine the packet is intended for.  If I understand correctly.

>Extra staff - there's a change, it seems.  Perhaps the hubs can't
>handle them, perhaps the application design can't handle them.

Look at your hub or switch. Many have a 'Collision' indicator.
Some collisions are normal. Constant collisions are not.
Excessive collisions can be caused by an overloaded network OR a
defective NIC.  Removing a single device (excluding the server) should not
drastically reduce the collision rate. If it does, suspect a bad NIC in
that device.
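
That unplug-one-device test can be written down as a small calculation. This is a minimal Python sketch, assuming you can put rough numbers on the collision rate (for example by counting indicator flashes per minute); the function name and the 50% drop threshold are arbitrary assumptions for illustration.

```python
def suspect_devices(baseline, with_device_removed, drop_threshold=0.5):
    """Return devices whose removal cut collisions by at least `drop_threshold`.

    baseline:            collisions per minute with everything plugged in
    with_device_removed: {device: collisions per minute with only that
                          device unplugged}
    """
    if baseline <= 0:
        return []  # no collisions to explain
    return sorted(
        device
        for device, rate in with_device_removed.items()
        if (baseline - rate) / baseline >= drop_threshold
    )
```

With a baseline of 200 and readings of 190, 40 and 185 for three PCs, only the second PC would be flagged.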

>How many total users accessing the same .NET file?

>Liz

Steve U
 
 
 

Ongoing problems...

Post by Jan Blickster » Tue, 01 Jul 2003 17:38:32


Thanks again for all the advice.

We are now going around all the machines in turn, configuring them as you
all suggest, where there are any differences.

One quick question - one of our machines is Windows ME. Are there any
specifics for this PC?

Thanks again.


> Try this:

> 1. Download the file
>     http://www.ars-florida.com/DIAGNOS.zip

> 2. Unzip it into its own directory.

> 3. Read the ReadMe.RTF file.

> --
> ...
>     `.??.`.??.`->  rodney

 
 
 

1. Ongoing SQL 7.0 crashing problems

I have a very vexing problem.

We have a Windows NT Server 4.0 running Sp 6a, IIS 4.0, using index server,
SQL  7, latest service packs on dual 550 Pentium 3 processors. 256k of RAM,
plenty of hard drive space.

We have about 6 databases. The largest is about 100 MB; the rest average 1 to
10 MB.

This server crashes every few days; it just freezes up. In addition, the
server crashes any time we attempt to modify a database's tables or
manipulate indexing in any way. The only way we are able to make changes to
the databases is to make changes to them on another SQL server running the
identical databases and publish them over.

We don't receive any error messages in Event Viewer. We have nothing to go
on.

I am at a loss as to what the problem is or what steps to take to approach
the problem.

Any suggestions?

Thanks

Anthony

2. change passwd user from db2

3. ongoing tempdb problems...

4. Differences in dblib...

5. Will pay for help with an ongoing ADP - SQL Server Project

6. Connection Issue with Application

7. ongoing deadlock with ddl + procs

8. Variable TableName in Query

9. ongoing holy grail thread - VICTORY!!!

10. ongoing holy grail thread

11. Problems, problems, problems

12. SQL problem, MSDTC Problem or VB.NET problem?