Bus errors -- how do I debug?

Post by Edward Wa » Tue, 17 Oct 1995 04:00:00



I've been getting bus errors on an application I'm developing on a Sun
Sparc running Solaris 2.4.  I was wondering what causes them and what's
the best way to root them out?

                                                               Thanks,
                                                               Edward


Post by Jeff Dicks » Wed, 18 Oct 1995 04:00:00



>I've been getting bus errors on an application I'm developing on a Sun
>Sparc running Solaris 2.4.  I was wondering what causes them and what's
>the best way to root them out?

On SPARC, bus errors are usually caused by misaligned memory references. As a
basic rule of thumb I ensure that all integers, longs, and floats are on
longword (32-bit) boundaries. For example, the following C program will
generate a bus error if the address passed to the initint() routine to store
an integer at is not on a longword boundary. If it is, however, then
everything is hunky-dory.

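#include <stdio.h>

void initint(int *i);

int main(void) {

        /* A plain char array need not start on an int boundary, so the
           union forces the alignment that this example relies on. */
        union { int align; char bytes[16]; } intspc;

        initint((int *)&intspc.bytes[5]); /* misaligned: causes a bus error */
/*      initint((int *)&intspc.bytes[4]);    aligned: doesn't */

        return 0;
}

void initint(int *i) {

    *i = -1;

}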

You don't have to worry about the compiler not observing alignment
restrictions when it sets aside space for variables you declare. Likewise,
memory allocated with malloc, for instance, is always properly aligned.
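When data really does have to live at an arbitrary byte offset, the usual way
around the problem is to copy it with memcpy(), which has no alignment
requirement, rather than dereferencing a cast pointer. A minimal sketch
assuming a C99 compiler; is_aligned(), load_int() and store_int() are
hypothetical helper names, not library functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p lies on an n-byte boundary.
   n must be a power of two (e.g. sizeof(int) on SPARC). */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Hypothetical helper: read an int from any byte address.
   memcpy() makes no alignment assumptions, so unlike *(int *)p
   this does not fault on a strict-alignment machine. */
int load_int(const void *p)
{
    int v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: store an int at any byte address. */
void store_int(void *p, int v)
{
    memcpy(p, &v, sizeof v);
}
```

With these, a store such as initint()'s `*i = -1` becomes `store_int(p, -1)`,
which behaves the same at every byte offset.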

As for the best way to root them out... go over your code with a fine-tooth
comb?

Jeff S.*son



Post by Guy Harr » Wed, 18 Oct 1995 04:00:00



>At least on the Sun platform, the two most likely reasons for a bus
>error are:
>    1. Unaligned access to a memory location, e.g. fetching a long
>            from an odd address.
>    2. The system detected a changed binary whilst trying to page in
>            parts of your process.
>These are the two reasons I discovered to be the cause of a bus error;
>Sun itself doesn't seem to document this.

Well, it's not directly stated in the man page, but it is implied, to
some degree, by SIGVEC(2) in SunOS 4.x and siginfo(5) in SunOS 5.x:

        NAME
             sigvec - software signal facilities

                ...

        CODES
             The following defines the codes for  signals  which  produce
             them.  All of these symbols are defined in signal.h:

                ...

               Hardware bus error                 SIGBUS  BUS_HWERR
               Address alignment error            SIGBUS  BUS_ALIGN
               No mapping fault                   SIGSEGV SEGV_NOMAP
               Protection fault                   SIGSEGV SEGV_PROT
               Object error                       SIGSEGV SEGV_CODE(code)=SEGV_OBJERR
               Object error number                SIGSEGV SEGV_ERRNO(code)

and

        NAME
             siginfo - signal generation information

                ...

          System Signals
             Otherwise, si_code contains a positive value reflecting  the
             reason why the system generated the signal:

                ...

             SIGSEGV  SEGV_MAPERR    address not mapped to object
                      SEGV_ACCERR    invalid permissions for mapped object

             SIGBUS   BUS_ADRALN     invalid address alignment
                      BUS_ADRERR     non-existent physical address
                      BUS_OBJERR     object specific hardware error

I'm not sure whether the changed-binary SIGBUS is due to getting code
changed out from under a running program or not.
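Since siginfo(5) reports the reason in si_code, one way to root out a bus
error is to catch SIGBUS with a handler installed via sigaction() with
SA_SIGINFO, print the reason and the faulting address, and then let the
default action produce a core file for dbx or gdb. A minimal sketch assuming
a POSIX system (Solaris 2.x or later); install_bus_handler() and the other
helper names are hypothetical. Output goes through write() rather than
printf(), since only async-signal-safe functions should be called from a
signal handler.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction(), siginfo_t, BUS_* codes */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGBUS si_code to the descriptions listed in siginfo(5). */
const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "invalid address alignment";
    case BUS_ADRERR: return "non-existent physical address";
    case BUS_OBJERR: return "object specific hardware error";
    default:         return "unknown SIGBUS code";
    }
}

/* Write a NUL-terminated string to stderr using only write(). */
static void put_str(const char *s)
{
    size_t n = 0;
    ssize_t r;

    while (s[n] != '\0')
        n++;
    r = write(STDERR_FILENO, s, n);
    (void)r;
}

/* Write v to stderr as 0x-prefixed hexadecimal, again without printf(). */
static void put_hex(uintptr_t v)
{
    char buf[2 + 2 * sizeof v + 1];
    size_t i = sizeof buf - 1;

    buf[i] = '\0';
    do {                                  /* fill digits right to left */
        buf[--i] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[--i] = 'x';
    buf[--i] = '0';
    put_str(&buf[i]);
}

static void bus_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)ctx;
    put_str("SIGBUS: ");
    put_str(bus_code_name(si->si_code));
    put_str(" at address ");
    put_hex((uintptr_t)si->si_addr);
    put_str("\n");

    /* Restore the default action and re-raise. The signal stays blocked
       until this handler returns; then the default action terminates the
       process (with a core dump, if core files are enabled). */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install bus_handler for SIGBUS; returns 0 on success, -1 on error. */
int install_bus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = bus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Calling install_bus_handler() early in main() gives the reason and address on
stderr when the fault happens, and the core file can still be loaded into the
debugger to find the offending line.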


Post by Bart D'hoo » Thu, 19 Oct 1995 04:00:00



>|> I've been getting bus errors on an application I'm developing on a Sun
>|> Sparc running Solaris 2.4.  I was wondering what causes them and what's
>|> the best way to root them out?
>|>

We use a tool called "Purify" from "PureSoftware, 1309 S. Mary Avenue,
Sunnyvale, CA 94087" to detect all kinds of memory allocation and
initialization problems. This seemed to be the best way to deal with bus error
problems.

Bart.


Post by Michael Gamba » Fri, 20 Oct 1995 04:00:00


    I recently received a sort of confirmation that I will be taking
over a spot within my company as a system administrator on a Solaris
system.  I know a fair amount of unix but not enough to be comfortable at
the command line with most commands.  I'll be going through some
extensive training.

    I'm in search of two books that will help me with all of the
details a system administrator needs to know.  The first
book I'm looking for is basically an extensive overview of the UNIX
system.  By overview I mean summarizing topics like the file
system (inodes, etc.), process concepts, security (file permissions, user
account information, etc.), device drivers, standard input/output, and
networking.  The second book I'm looking for is a more in-depth look at
those topics, especially as they concern system administrator tasks.
Both books I'm looking for are for System V, preferably the current
version of Solaris.  I have several books (6-7) on UNIX.  The one that
has helped me the most is Essential System Administration from O'Reilly.
I'd like to read that kind of book but in more detail about System
V (Solaris), and I don't want to have to use the man pages unless it is
really necessary.  Thank you for your suggestions.  By the way, how
beneficial would it be to get Solaris for x86?  I've heard that the
two (Intel and RISC) will eventually be the same product on two
different platforms.  I have a copy of Linux and had planned on getting to
know it, paying special attention to the differences between Linux
and System V.


Post by Greg Herle » Wed, 25 Oct 1995 04:00:00


<snip>
: At least on the Sun platform, the two most likely reasons for a bus
: error are:
:       1. Unaligned access to a memory location, e.g. fetching a long
:               from an odd address.
:       2. The system detected a changed binary whilst trying to page in
:               parts of your process.
<snip>

Hmmm.  I wonder if some guru of SunOS would comment:  I have an application
that works fine compiled under Linux (1.2.3) running on PC hardware, but
core dumps with a bus error under SunOS 4.1.4.  I am explicitly using the
gcc-specific "__attribute__ ((packed))" on some data structures, since they
originated on DOS (and thus are byte aligned).

I got pulled away from my porting effort to fight some other fires, but
I suspect that the Sun hardware is detecting my attempt to read from
a misaligned address and dumping core.

Comments?  Any gurus seen this before?  A virtual beer to whoever
points me towards a solution (hell, a real beer if you are somewhere in
the local SF Bay Area!).
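If the packed structs exist only to mirror a DOS file layout, another approach
is to skip them: read the raw bytes into a buffer and decode each field one
byte at a time into an ordinary, naturally aligned struct. This also handles
byte order, which matters here because x86 stores integers little-endian
while SPARC is big-endian. A sketch under stated assumptions: the 7-byte
record layout, struct record, and the helper names are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical DOS-era record layout, assumed for illustration:
     offset 0: uint8_t  type
     offset 1: uint32_t length  (little-endian, as written on x86)
     offset 5: uint16_t flags   (little-endian)
   On disk it is 7 bytes with no padding. In memory we let the
   compiler pad this struct normally, so every field is aligned. */
struct record {
    uint8_t  type;
    uint32_t length;
    uint16_t flags;
};

/* Assemble little-endian integers from individual bytes. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode one 7-byte on-disk record into an aligned struct. */
void decode_record(const unsigned char *buf, struct record *out)
{
    out->type   = buf[0];
    out->length = get_le32(buf + 1);
    out->flags  = get_le16(buf + 5);
}
```

Because decode_record() reads buf only a byte at a time, it works at any
offset into the buffer and gives the same result on the PC and on the Sun.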

Greg Herlein

--


Post by Casper H.S. Dik - Network Security Engine » Thu, 26 Oct 1995 04:00:00



>Hmmm.  I wonder if some guru of SunOS would comment:  I have an application
>that works fine compiled under Linux (1.2.3) running on PC hardware, but
>core dumps with a bus error under SunOS 4.1.4.  I am explicitly using the
>gcc-specific "__attribute__ ((packed))" on some data structures, since they
>originated on DOS (and thus are byte aligned).

That's to be expected.  Gcc should do the right thing (loading, etc.)
when it accesses packed fields itself, but when you pass pointers to such
fields to libc, things will go wrong.
(If it crashes in gcc-generated code, gcc needs to be fixed.)
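A minimal sketch of that distinction, assuming gcc or another compiler that
accepts gcc's __attribute__((packed)); struct dos_hdr, set_int() and the
set_value*() functions are hypothetical. Accesses written as h->value are
compiled with alignment-safe code, but the address of a packed member is a
plain int * that other code (libc included) will dereference with an ordinary
word load or store.

```c
#include <stddef.h>
#include <string.h>

/* GCC extension: no padding, so 'value' sits at byte offset 1. */
struct __attribute__((packed)) dos_hdr {
    char tag;
    int  value;
};

/* Stands in for any code that, like libc, assumes its int * is aligned. */
static void set_int(int *p, int v) { *p = v; }

/* Safe: gcc knows h->value is packed and emits alignment-safe stores. */
void set_value(struct dos_hdr *h, int v)
{
    h->value = v;
}

/* Calling set_int(&h->value, v) here would hand a misaligned int * to
   code that does a normal word store, which is the SIGBUS case above.
   Instead, work on an aligned temporary and copy its bytes in. */
void set_value_via_pointer(struct dos_hdr *h, int v)
{
    int tmp;

    set_int(&tmp, v);
    memcpy((unsigned char *)h + offsetof(struct dos_hdr, value),
           &tmp, sizeof tmp);
}
```

The same rule applies to library calls: copy a packed field into an aligned
local before passing its address to a function that expects an int *, such
as scanf() with %d, and copy it back afterwards.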

Of course, you can also start your program with:

        asm("ta  0x6");  /* software trap ST_FIX_ALIGN: tells the kernel
                            to fix up misaligned accesses */

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.