OSR5 slows - dtdaemon using most of cpu time


Post by Bill Vermilli » Thu, 06 Feb 1997 04:00:00



When it rains, it pours.

We've just moved the office staff on an OSR5 system to a site
about 2 miles away.  Things ran right for about 2 days but
things are now slow.

cpuhog shows that dtdaemon is consuming most of the cpu
power.  sar shows about 35% user, 65% system - no idle
time.

The dtdaemon shows  12:20:57 - as I last looked at it - and the
system has been up since January 30.  Nothing else has over an
hour worth of time consumed - most just mere minutes.

I can not find what/where dtdaemon is started from/by.
Pointers will be appreciated.

Hardware

Micronics M54PE - Pentium 90 - 128 MB ram
DPT caching controller - PCI - 16 MB ECC RAM
Unknown brand video card - PCI
3com 3c509B network card - just added 2 weeks ago
Two ISA Arnet Clusterports - one with 2 V.35 connections to remote
                           - one with 1 V.35 connection to remote
EISA Arnet Clusterport     - four 16 port devices
EISA Digi CX               - two 16 port devices
128MB RAM
128 MB Swap
2 GB Barracuda Primary
4 GB Barracuda Secondary
Sony STD7000 DAT
3.5" floppy / 5.25" floppy

Software

FACTS 6.? - running on BBX
Facetwin - terminals at remote and Savannah GA
BackupEdge

OSR5.02 - with of 0499a - network patches (did I get that
                          number correct)

The system just runs the one basic package.   Savannah came up
last week, and the remote Orlando Office came up on Friday -
the 30th.    Things ran fine Monday and then have become really
slow.

CPU is 100% used.

Typically 75-80 users.   Only about a dozen are heavy users; the
rest are counter terminals doing sales tickets - with most out
of town.

Iozone shows around 400KB/min on the 4GB drive, and on
/u - using DTFS for rarely accessed files as we need more
storage space - Iozone was down to 40KB/min maximum.

It looks CPU starved - and as I said iohog shows dtdaemon
as the culprit.   sar looks okay except for no idle cpu time.

So what/where is dtdaemon?

Bill

(what have I left out of the problem description)
--

 
 
 


Post by Bela Lubki » Fri, 07 Feb 1997 04:00:00



> We've just moved the office staff on an OSR5 system to a site
> about 2 miles away.  Things ran right for about 2 days but
> things are now slow.

> cpuhog shows that   dtdaemon   is consuming most of the cpu
> power.   sar shows   about 35% user   65% system - no idle
> time.

dtdaemon is part of the operation of DTFS.

> The dtdaemon shows  12:20:57 - as I last looked at it - and the
> system has been up since January 30.  Nothing else has over an
> hour worth of time consumed - most just mere minutes.

> I can not find what/where dtdaemon is started from/by.
> Pointers will be appreciated.

I'm not sure this analysis is correct.  dtdaemon has consumed 12 hours
of CPU time, but the system had been up about 160 hours when you posted.
So it was consuming < 10% of the CPU -- significant, but not enough to
drive `sar -u` to 100% busy.  Look for other things spinning.
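
As a rough check of that arithmetic, here is a minimal Python sketch. The function names are made up for illustration, and the 160-hour uptime is the estimate given above, not a measured value.

```python
def parse_cpu_time(text: str) -> float:
    """Convert an HH:MM:SS string (as in a ps TIME column) to hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def cpu_share(cpu_time: str, uptime_hours: float) -> float:
    """Average fraction of one CPU consumed over the whole uptime."""
    return parse_cpu_time(cpu_time) / uptime_hours


# Figures from the thread: 12:20:57 of CPU time, roughly 160 hours of uptime.
share = cpu_share("12:20:57", 160.0)
print(f"dtdaemon average CPU share: {share:.1%}")  # prints 7.7%
```

This is only an average over the whole uptime. A process that started spinning recently could be using far more than that right now, which is why a live look with cpuhog is still worthwhile.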

The fact that no other process has a large amount of CPU accumulated is
not necessarily relevant.  The CPU could be getting used up by a lot of
short-lived CPU hog processes.  (In fact, download tls518 from
ftp.sco.com:/TLS, run `cpuhog` to get a better picture of the problem.)

> Iozone shows in the 400KB/min on the 4GB and in the
> the /u - using DTFS for rarely accessed files as we need more
> storage space - the Iozone was down to 40KB/min - maximum.

I hope you mean 400KB/sec, 40KB/sec, otherwise your system is
astonishingly slow...  (400KB/sec isn't very good for the hardware you
described either, but at least it isn't 2 orders of magnitude too slow!)

> It looks CPU starved - and as I said iohog shows dtdaemon
> as the culprit.   sar looks okay except for no idle cpu time.

Ahh... if you have iohog, you already have tls518.  But iohog is the
wrong tool.  If the system is CPU-starved, you want to run cpuhog.

> So what/where is dtdaemon?

It does some magic with respect to DTFS.  Probably, part of what it does
is the actual compression/decompression of data.

I would recommend switching to HTFS.  DTFS's compression ratios do not
impress me, and you definitely pay a performance penalty.  You didn't
buy all that hot hardware to run slow, did you?  If you need space, 4GB
SCSI disks are about $1K, which is probably less than the value of the
time you've already wasted, not to mention the time wasted by all your
users waiting for compression.  (Nota bene: I am *NOT* saying that I
expect DTFS to be as slow as you're experiencing.  I don't.  I expect it
to be noticeably, but acceptably, slower than HTFS.  Still, at modern
hardware prices, I don't think the economics make sense.  If you were
using a laptop with 1GB maximum hard disk it might make sense, but not
for a multiuser server.)

>Bela<


 
 
 


Post by Bill Vermilli » Fri, 07 Feb 1997 04:00:00





>> We've just moved the office staff on an OSR5 system to a site
>> about 2 miles away.  Things ran right for about 2 days but
>> things are now slow.
>> cpuhog shows that   dtdaemon   is consuming most of the cpu
>> power.   sar shows   about 35% user   65% system - no idle
>> time.
>dtdaemon is part of the operation of DTFS.

OK.  I searched for dtdaemon in the online docs and found
nothing.

>> The dtdaemon shows  12:20:57 - as I last looked at it - and the
>> system has been up since January 30.  Nothing else has over an
>> hour worth of time consumed - most just mere minutes.
>> I can not find what/where dtdaemon is started from/by.
>> Pointers will be appreciated.
>I'm not sure this analysis is correct.  dtdaemon has consumed 12 hours
>of CPU time, but the system had been up about 160 hours when you posted.
>So it was consuming < 10% of the CPU -- significant, but not enough to
>drive `sar -u` to 100% busy.  Look for other things spinning.

I ran a remote on sar yesterday and I'll be there today to pick
up the printouts.  Scanning them remotely didn't show anything.
The SW support (in Tampa) ran cpuhog remotely and noted the
dtdaemon was the top user - replaced periodically by bdflush.
I'll check with him today.  The new upgraded program he
installed is a real system hog.   It typically has over 30 open
files for each user logged in.  The other day there were about
2200 open files.   The app needs to be in a real database IMO
- e.g. Oracle, Informix - a real RDBMS.

>The fact that no other process has a large amount of CPU accumulated is
>not necessarily relevant.  The CPU could be getting used up by a lot of
>short-lived CPU hog processes.  (In fact, download tls518 from
>ftp.sco.com:/TLS, run `cpuhog` to get a better picture of the problem.)

>> Iozone shows in the 400KB/min on the 4GB and in the
>> the /u - using DTFS for rarely accessed files as we need more
>> storage space - the Iozone was down to 40KB/min - maximum.
>I hope you mean 400KB/sec, 40KB/sec, otherwise your system is
>astonishingly slow...  (400KB/sec isn't very good for the hardware you
>described either, but at least it isn't 2 orders of magnitude too slow!)

Well - I did mean /sec.  However, the HDs on an idle system
measure about 2MB/sec on tests with 100MB files.  The one
DTFS file system shows about 400KB/sec.  The SW tech ran
those yesterday and got under 40KB/sec on the DTFS side and
under 400KB/sec on the HTFS side - far too slow.  The sar
shows some wait for the HDs - I'll have hard copies in my hands
later - but the CPU is always at 0% idle.  I think this
performance is abysmal - but so much has been changed in the
past two weeks - with everyone under a deadline, of course - that
who knows what is causing this for sure.
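
To put the figures above side by side, here is a small Python sketch of the slowdown ratios. The names are illustrative only, 1 MB is taken as 1000 KB, and the loaded numbers are upper bounds ("under 40KB/sec", "under 400KB/sec"), so the real slowdown is at least what is computed here.

```python
# Iozone throughput quoted in this post, in KB/sec.
IDLE_HTFS = 2000    # "about 2MB/sec" on an idle system (assuming 1 MB = 1000 KB)
IDLE_DTFS = 400     # "about 400KB/sec" on the DTFS file system
LOADED_HTFS = 400   # "under 400KB/sec" under load, so an upper bound
LOADED_DTFS = 40    # "under 40KB/sec" under load, so an upper bound


def slowdown(baseline_kb_s: float, measured_kb_s: float) -> float:
    """How many times slower the measured rate is than the baseline."""
    return baseline_kb_s / measured_kb_s


print(f"DTFS vs HTFS, idle:    {slowdown(IDLE_HTFS, IDLE_DTFS):.0f}x slower")
print(f"HTFS, loaded vs idle:  at least {slowdown(IDLE_HTFS, LOADED_HTFS):.0f}x slower")
print(f"DTFS, loaded vs idle:  at least {slowdown(IDLE_DTFS, LOADED_DTFS):.0f}x slower")
```

On these numbers DTFS costs about a factor of 5 on an idle system, and load takes away at least another factor of 5 to 10 on top of that.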

I wanted them to run fiber for the 1.5-2 mile run - but the
cost frightened them - and so they went frame relay at 56 Kbps
with Cisco routers (too many people involved making decisions
IMO) and now the heavy users are complaining about being too
slow - but the line is also handling 5 PCs with Facetwin on
TCP/IP, one Digi terminal server with about 10 Wyse terminals -
and two 395 Oki's that run non-stop.   The people who sold them
the FR devices convinced them that it would be fast enough!
What a business.

>> It looks CPU starved - and as I said iohog shows dtdaemon
>> as the culprit.   sar looks okay except for no idle cpu time.
>Ahh... if you have iohog, you already have tls518.  But iohog is the
>wrong tool.  If the system is CPU-starved, you want to run cpuhog.

I will be double and triple checking.  I think perhaps there
might have been an errant process because a restart made things
go faster.

>> So what/where is dtdaemon?

>It does some magic with respect to DTFS.  Probably, part of what it does
>is the actual compression/decompression of data.
>I would recommend switching to HTFS.  DTFS's compression ratios do not
>impress me, and you definitely pay a performance penalty.  You didn't
>buy all that hot hardware to run slow, did you?  If you need space, 4GB
>SCSI disks are about $1K, which is probably less than the value of the
>time you've already wasted, not to mention the time wasted by all your
>users waiting for compression.

Long story but I'll try to make it short.  When they
upgraded to the latest version of FACTS - most of the files
doubled in size.  So we went from 800MB to 1.8GB.   I set up the
DTFS over a weekend to have room to store things.   That was to
be a holding area only for the developers to put things until
they didn't need them.   Now they are using that for a live
file system.

I told them we need to put another 4GB 'cudda on line - but
they didn't want to spend any more money because they had spent
so much already.  They run off and do things without checking
on everything, as they think everything is down to a hammer
and nails level and don't realize how things can interact.

Three months ago they put up another remote city - and because
they found a 'good deal' on a building they got it WITHOUT
checking on anything else.   Well we had to go through three
LATAs to get leased lines to the building.  If they had gotten
a building 1 mile closer the phone costs would have been about
$400/month cheaper.  This is typical of them.  Argh.
So they talked to their 'phone person' who is not the same as
the people who handle their 'data lines' - and he said Frame
Relay.  They priced that and found it was about 1/4 the price.
Trying to make that run with the hardware they wanted to run
was a noble but failed experiment.  Robert Lipe and his
fantastic support crew at Arnet can verify that one!

I am going to print this to take to the person in charge of the
system and maybe she can make the owner aware of this - and
that I am not making things up.  

>(Nota bene: I am *NOT* saying that I
>expect DTFS to be as slow as you're experiencing.  I don't.  I expect it
>to be noticeably, but acceptably, slower than HTFS.

Running tests on large files - because that's all they seem to
have are large files - DTFS is noticeably slower than HTFS.

Granted these are two different drives - but both Seagate
Barracudas.    On an idle system over 2MB/sec read/write using
iozone on the HTFS on the 4GB drive - but about 400KB/sec on
the DTFS 2 GB 'cudda.   That to me is more than just noticeable
- but your mileage may vary.

>Still, at modern
>hardware prices, I don't think the economics make sense.  If you were
>using a laptop with 1GB maximum hard disk it might make sense, but not
>for a multiuser server.)

>I< know that - but I can't convince them of that.

This is typical of human nature.   Far too often I have seen
advice accepted more readily from a stranger than from someone
they know.  The reason is that they know the first person (even
though they may not really KNOW that person) and therefore make
judgements.   As the movie M*A*S*H calls them - these new
people are the 'pros from Dover'.   After all, if they are from
out of town they HAVE to be good.  :-) ad infinitum.

This morning I called and she was talking with the people who
have the HW support contract.  Tape wouldn't come out of the
DAT drive.   The previous day it wouldn't come out until they
rebooted - I JUST found that out.

I told them that it sounded like something in the software
kept it from ejecting.   The reply was 'well it won't eject
when I push the button'.   Then I had to explain that.   She
had the drive replaced once already - but I didn't know about
that until after.    Someday maybe I'll get a customer who
tells me about all the problems instead of just the ones they
'think' I should know about!   I had one like that once - it
got to the point that he discussed even some of his business
decisions with me to see if I had any ideas.  That system just
hummed along as there were never any surprises.

Thanks for these pointers

Bill
--

 
 
 
