sar on OSE5.0.2 "kernel check failed"

Post by Mark Ra » Wed, 15 Jan 1997 04:00:00



I'm getting our new server set up, and I did a fresh install of OSE 5.0.2.
I ran '/usr/lib/sa/sar_enable -y' and it said:
  kernel check failed, kernel = (null), code = Unknown
Very weird.  I tried 'sar -u 5 10' and got the same error.

Details:
  Dell PowerEdge 4100 (PPro 200MHz), 128MB RAM, integrated Adaptec 7880
  (ultra/wide) controller (disabled in BIOS) and 7860 (ultra/narrow)
  SCSI.  Mylex DAC960PU RAID adapter.

  CDROM, tape drive and boot disk on 7860 using alad driver.  Root disk
  on RAID controlled by mdac driver.  I suspect that the problem is
  somehow related to having boot and root on different drives.

I had to jump through some hoops to get it installed because I can't
currently use the BIOS on the RAID controller to boot from.  I ended up
installing the OS twice, once on the boot disk and once on the RAID.  I
now boot from the Adaptec controller, using the kernel copied from the
Mylex RAID, and everything appears to work correctly (including things
that I thought were sar-like in their kernel probing, like ps and memhog).

/unix is a symlink to /stand/unix, which is a different file from (but
an exact copy of) the one I boot from (on another disk).  It's group mem
readable and sar is sgid mem, so it shouldn't be a permissions problem.
Just to be sure, I changed /stand/unix permissions to world-read with no
effect.  I also tried unmounting /stand and mounting the first
filesystem from the boot disk (where the actual boot kernel is) there,
with no change.

Any advice greatly appreciated.
--

 
 
 

Post by Bela Lubki » Thu, 16 Jan 1997 04:00:00



> /unix is a symlink to /stand/unix, which is a different file (but an
> exact copy) than the one I boot from (on another disk).  It's group mem
> readable and sar is sgid mem, so it shouldn't be a permissions problem.
> Just to be sure, I changed /stand/unix permissions to world-read with no
> effect.  I also tried unmounting /stand and mounting the first
> filesystem from the boot disk (where the actual boot kernel is) there,
> with no change.

This explains it.

sar(ADM) is one of the few users of kernel(ADM), a new interface which
communicates from /boot up to multiuser mode the name and inode number
(and maybe some other details -- perhaps COFF datestamp?) of the file
from which the kernel was booted.  Your copy-of-/stand filesystem is
apparently not identical to the actual /stand on the first drive, which
is being used to boot the system.  Specifically, although the same
kernel exists with the same /mountpoint/unix name, their inode numbers
must be different.
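
To illustrate the point about inode numbers, here is a minimal Python
sketch. It is not the kernel(ADM) interface; it only shows that a
byte-for-byte copy of a file is still a different file with its own
inode number. The file names are placeholders, and both files live in
one temporary directory so that the inode numbers are directly
comparable.

```python
import filecmp
import os
import shutil
import tempfile


def compare_copies():
    """Copy a file and return (same_contents, same_inode)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder names: stand-ins for the booted kernel and its copy.
        booted = os.path.join(tmp, "unix.booted")
        copy = os.path.join(tmp, "unix.copy")
        with open(booted, "wb") as f:
            f.write(b"pretend kernel image\n")
        shutil.copy2(booted, copy)

        # Byte-for-byte comparison of the two files.
        same_contents = filecmp.cmp(booted, copy, shallow=False)
        # Identity comparison: each file has its own inode.
        same_inode = os.stat(booted).st_ino == os.stat(copy).st_ino
        return same_contents, same_inode


if __name__ == "__main__":
    contents, inode = compare_copies()
    print("same contents:", contents)
    print("same inode:   ", inode)
```

If the check does compare inode numbers, as suggested above, an exact
copy of the kernel would fail it in just this way.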

There is an easy fix for this, which will also fix another potentially
serious problem.  What you should do is rename your current /dev/boot
(/stand) filesystem, then rename the real /stand as /dev/boot.  That is,
run `divvy` on the partition containing the root filesystem and give a
different name, e.g. "not-boot", to the "boot" division.  Then run
`divvy` on the partition containing the boot filesystem and give it the
name "boot".  Next time you boot, the real /stand filesystem will be
mounted, kernel(ADM) will be able to do its job, and sar will be happy.

A side effect is that when you relink the kernel or change
/etc/default/boot, changes will be reflected on the actual /stand
filesystem which is used to boot your system, rather than in the sterile
copy found on your root drive.  This could save you a lot of
head-scratching in the future when you can't figure out why your kernel
changes are having no effect...

(You could then take over the "not-boot" division for another purpose --
scratch space, or perhaps a bit of extra swap space...)

>Bela<


 
 
 

Post by Roberto Zi » Fri, 24 Jan 1997 04:00:00




Bela,

        thanks for the clarification. A couple of days ago we
had a customer reporting the exact same problem, so I suggested
the above steps to him. After fiddling with the OS, the customer
reported that he was still unable to solve the problem because he
could not find /boot (or the mounted /stand) on his filesystem.
I was a little disappointed, since he stated he had SCO OS 5.0.0 and
every SCO OS 5.0.x installation I have attended created a /boot
partition and a read-only /stand filesystem. After talking with
him I was told that he upgraded from SCO UNIX 3.2v4.2 to
Open Server 5.0 Host System and, during the installation,
instructed the system to 'preserve' the original filesystem
configuration. I asked him to send me his current divvy
configuration, and here it is:

============================================================
Name      Type      New Fs    #    First block    Last block
============================================================
root      EAFS      no        0              0        465881
swap      NON FS    no        1         465882        529881
u         EAFS      no        2         529882       1029881
          NOT US    no        3
          NOT US    no        4
          NOT US    no        5
recover   NON FS    no        6        1029882       1029891
hd0a      WHOLE     no        7              0       1031939
============================================================
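
A quick way to confirm what the listing above shows, namely that
there is no "boot" division at all, is to pull out the division names.
This is only a sketch in Python: it assumes the listing has been pasted
with whitespace-separated columns and with the leading indentation
removed, so that unnamed slots are the only rows starting with
whitespace. The function name is made up for illustration.

```python
# The customer's listing, re-typed with the leading indentation removed
# (an assumption of this sketch, not the raw divvy screen).
DIVVY_LISTING = """\
Name      Type      New Fs    #    First block    Last block
root      EAFS      no        0              0        465881
swap      NON FS    no        1         465882        529881
u         EAFS      no        2         529882       1029881
          NOT US    no        3
          NOT US    no        4
          NOT US    no        5
recover   NON FS    no        6        1029882       1029891
hd0a      WHOLE     no        7              0       1031939
"""


def division_names(listing):
    """Return the names of the named divisions in a divvy-style listing."""
    names = []
    for line in listing.splitlines():
        # Skip blank lines, unnamed slots (leading whitespace) and rules.
        if not line or line[0].isspace() or line.startswith("="):
            continue
        name = line.split()[0]
        if name != "Name":  # skip the header row
            names.append(name)
    return names


names = division_names(DIVVY_LISTING)
print(names)
print("boot division present:", "boot" in names)
```

With no "boot" division there is no separate /stand filesystem to
rename, which is why the earlier divvy advice does not apply here.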

I have to admit that I have never tried to upgrade from SCO UNIX to
SCO OS5 with filesystem preservation, so I'm not in a position to tell
whether the customer did it correctly. Assuming he did (he tells me he
is able to use the OS correctly), how can we work around the fact that
sar keeps failing?

Should I tell him to reinstall from scratch? I suggested that he
create a dummy /stand and fill it with the needed files, but I have
not heard anything from him since.

Could you shed some light on this problem?

Thanks again,
Roberto

--
---------------------------------------------------------------------          

Strhold Sistemi EDP                                                            
Reggio Emilia      ITALY                                                        
---------------------------------------------------------------------          
"Has anybody seen an aircraft carrier around ?"                                
        (Pete "Maverick" Mitchell - Top Gun)                                    
---------------------------------------------------------------------

 
 
 

Post by Bela Lubki » Fri, 24 Jan 1997 04:00:00


[customer getting "kernel check failed" messages from sar, and they did
in-place upgrade from ODT 3.0 to OpenServer Release 5]

This problem is discussed in an SCO Technical Article at:

  http://www.sco.com/cgi-bin/ssl_reference?482688

>Bela<

 
 
 

Post by Bill Vermilli » Tue, 28 Jan 1997 04:00:00




>Bela,
>    thanks for the clarification. A couple of days ago we
>had a customer reporting the same exact problem so I suggested
>him the above steps in order to solve his problem. After fiddling
>with the OS, the customer reported to me that he was still unable
>to solve his problem because he was not able to find /boot (and
>the mounted /stand) on his filesystem. I was a little disappointed
>since he stated he had SCO OS 5.0.0 and every SCO OS 5.0.x
>installation I attended used to make a /boot partition and a /stand
>read only filesystem. After talking with him I was told that he
>upgraded from SCO UNIX 3.2v4.2 -> Open Server 5.0 Host System
>and, during the installation, he instructed the system to 'preserve'
>the original filesystem configuration. I asked him to send me his
>current divvy configuration and here it goes:

I did ONE upgrade from 3.2v4.2 -> OSR5.  Never again.

Programs that expected an OSR5 environment, and /stand, /boot, /proc,
and friends, didn't work.  A pure OSR5 install fixed those problems.

>I have to admit that I never tried to upgrade from SCO UNIX to SCO OS5
>with filesystem preservation so I'm not in the position to tell if the
>customer operated correctly but assuming he did
>(since he informed me that he's able to use the
>OS correctly) how can we workaround
>the fact that sar continues failing ?
>Should I tell him to reinstall from scratch ? I suggested him to
>create a dummy /stand and to fill it with the needed files
>but since then I've not heard anything from him.

Based on things I saw fail - such as BackupEdge for OSR5 not
running on the upgraded version - I'd suggest a re-install.

Since OSR5 is more like V.4 than V.3 in many ways, there are
probably more unexpected surprises.

--

 
 
 

Post by Roberto Zi » Wed, 29 Jan 1997 04:00:00



[big snip]

 >I did ONE upgrade from a 3.2v4.2 ->osr5.  Never again.
 >
 >Programs that expected an OSR5 environment, and
 >/stand,/boot,/proc, and friends didn't work.   A pure OSR5
 >install fixed those problems.

Yep, agreed !

 >Based on things I had not work - such as BackupEdge for OSR5 not
 >running on the upgraded version - I'd suggest a re-install.
 >
 >Since the OSR5 is more like V.4 than V.3 in many ways there are
 >probably more unexpected surprises.

This is exactly what I suggested to him, after sending him the notes
that Bela kindly posted. I'm still waiting for feedback from him
(which, as is the custom here in Italy, will probably never reach me ;-).

Thanks !


 
 
 

Post by Roberto Zi » Fri, 31 Jan 1997 04:00:00




Hi !

Just a quick followup for those still interested in
this one.

Yesterday I got a phone call from the customer, and
he confirmed that everything we asked him to
check was perfectly normal. I suggested that he
comment out the part of the idvidi script which deals
with the vidimaster creation; he did, but he then got
some additional messages about
undefined symbols declared in Driver.o under
/etc/conf/pack.d/cn & /etc/conf/pack.d/evld.

I helped him check the contents of these dirs,
and we discovered a lot of temp files
left in them, plus a couple of missing/misconfigured
files (space.c, class.h and so on).

Today I emailed him with the contents of these dirs
(I extracted the files from a running 5.0.2 version)
asking him to replace the original files with the ones
contained in the Email itself.

I think he will be able to conduct this test this afternoon
so I'll keep you posted about it in the next few days.

I also suggested that he reinstall the OS itself from scratch
since, as stated in my previous message, the 'mkdev mouse'
script thinks the link kit is screwed (and so do I), so there is
probably something messed up with his system.

Thanks for your time !

Best,
Roberto


 
 
 

1. "sar -f <file>" fails with "sar: malloc failed"

I'm trying to run sar by specifying a specific data file:

% ls -l xac
-rw-rw-rw-   1 staff   staff      4218553 Aug  4 10:56 xac

% sar -f xac

SunOS stsun1 5.5.1 Generic_103640-08 sun4u    01/14/70

16:24:50    %usr    %sys    %wio   %idle
21:01:36        unix restarts
sar: malloc failed
Resource temporarily unavailable

Looking at the truss output of this, sar is trying to allocate about 765 MB.

15961:  brk(0x0002A220)                                 = 0
15961:  brk(0x2FD28220)      Err#11 EAGAIN

Any ideas why sar is doing this?

2. ftp not working...

3. GETSERVBYNAME()????????????????????"""""""""""""

4. fsck question

5. failed "Read Cd/Dvd Capacity";failed "Prevent/Allow Medium Removal"

6. Telnet hung up: help

7. Recursive grep?

8. failed "Prevent/Allow Medium Removal" ;failed "Read Cd/Dvd Capacity"

9. """"""""My SoundBlast 16 pnp isn't up yet""""""""""""

10. question obout "vmstat" and "sar -r" "swap -l"

11. "sar" and "vmstat" gives allocation error

12. Problems reading "small" CD-R under SCO OpenServer V5.0.2
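
As a check on the figure quoted in item 1 above, the size of the
request can be worked out from the two brk() addresses in the truss
output. A small Python sketch; the function name is only illustrative.

```python
def brk_growth_bytes(old_break, new_break):
    """Bytes requested when the program break moves from old to new."""
    return new_break - old_break


# Addresses copied from the truss output in item 1.
old_break = 0x0002A220   # last successful brk()
new_break = 0x2FD28220   # brk() that failed with EAGAIN

growth = brk_growth_bytes(old_break, new_break)
print("bytes requested:", growth)
print("MiB requested:   %.2f" % (growth / 2**20))
```

The difference is 0x2FCFE000 bytes, just under 765 MiB, which agrees
with the poster's reading of the trace.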