Follow Up: Irix 6.2 Hangs On Console Login ("amd" on Irix 6.2)

Post by D. Gerasimat » Sun, 14 Dec 1997 04:00:00



I have done some more digging and I have discovered that this same problem
manifests itself even on machines local to the NIS server. What is
happening is that somehow the automount daemon "amd" is dying at about
the time "clogin -f" and "Xlogin" are being run. So, of course the login
hangs because a home directory cannot be found. "amd" *never* dies if I
log in over the network using "rsh" or "telnet", so I have to imagine this is a bug
in "clogin" or similar. Anyone have any ideas?

Is anyone else out there running "amd" on Irix 6.2?

Dimitri

 
 
 

Post by Mike O'Conno » Mon, 15 Dec 1997 04:00:00



:I have done some more digging and I have discovered that this same problem
:manifests itself even on machines local to the NIS server. What is
:happening is that somehow the automount daemon "amd" is dying at about
:the time "clogin -f" and "Xlogin" are being run. So, of course the login
:hangs because a home directory cannot be found. "amd" *never* dies if I
:log in over the network using "rsh" or "telnet", so I have to imagine this is a bug
:in "clogin" or similar. Anyone have any ideas?
:
:
:Is anyone else out there running "amd" on Irix 6.2?

I've run amd briefly with IRIX 6.1/6.2.  I was in a "generic" xdm
environment, however.  You will want to run par on amd and see what
it's doing when it dies.  Does it happen with a particular user or
all user IDs?  

--

 InterNIC WHOIS: MJO | (has my PGP & Geek Code info) | Phone: +1 248-848-4481

 
 
 

Post by D. Gerasimat » Wed, 17 Dec 1997 04:00:00




>I've run amd briefly with IRIX 6.1/6.2.  I was in a "generic" xdm
>environment, however.  You will want to run par on amd and see what
>it's doing when it dies.  Does it happen with a particular user or
>all user IDs?  

It happens with all users that are not local to the system. Local users
don't have this problem. As I mentioned, it doesn't always happen. It
happens about 90% of the time, though. And it only happens when logging
in on console. I will try to see if the problem persists with
"visuallogin" off and I will use "par".

Dimitri
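
One way to pin down when amd goes away is to poll the process table from a network login while someone logs in on the console. A minimal sketch, assuming "ps -e" prints the command name as its last column and that awk accepts "-v" (on older systems that may mean nawk); proc_listed and log_if_gone are made-up helper names, not IRIX tools:

```shell
#!/bin/sh

# proc_listed NAME: succeed if a ps-style listing on stdin has NAME as the
# last field of some line (the command column in "ps -e" output).
proc_listed() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

# log_if_gone NAME: return 0 while NAME is listed by "ps -e"; otherwise
# print a timestamped line and return 1.
log_if_gone() {
    if ps -e | proc_listed "$1"; then
        return 0
    fi
    echo "$(date): $1 is not running"
    return 1
}
```

Run as "while log_if_gone amd; do sleep 1; done", this prints one timestamped line as soon as amd drops out of the listing, which can then be lined up against the time of the console login.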

 
 
 

Post by H. Todd Chapm » Thu, 18 Dec 1997 04:00:00


Amd on irix always gave me problems of this sort. Now I use the Irix
Autofs stuff which gives me less severe problems.

-Todd

 
 
 

Post by ptgrun.. » Sat, 27 Dec 1997 04:00:00




> I have done some more digging and I have discovered that this same problem
> manifests itself even on machines local to the NIS server. What is
> happening is that somehow the automount daemon "amd" is dying at about
> the time "clogin -f" and "Xlogin" are being run. So, of course the login
> hangs because a home directory cannot be found. "amd" *never* dies if I
> log in over the network using "rsh" or "telnet", so I have to imagine this
> is a bug in "clogin" or similar. Anyone have any ideas?

> Is anyone else out there running "amd" on Irix 6.2?

> Dimitri

I have a *possibly* related problem which started when I loaded 6.2 (or is
it 6.3) on my INDIGO2 boxes. When you log in to an account via clogin, the
first window to come up is the clogin window again. You may then clogin
again or open a shell and work maybe, but you can't log out. And sometimes,
the system will lock up after all the logging in, and then logging out.

The problem went away for existing accounts on one INDIGO2 when I converted from
EFS to XFS; the other system continued to have problems. And, when I create new
accounts on the "fixed" system, the problem returns for those accounts.

I opened a service call, got one phone call in return. The Engineer on the
SGI knew after trying a couple of things that the problem persisted. The
call may still be open after many months. But SGI Support Service is not
what it used to be. Sounds like a bug that SGI doesn't want to talk about.

I can usually handle the problem myself, but I have some very
inexperienced users. They don't know what to do--sometimes I don't, so I
just reboot.

Can anybody help?

Thanks in advance,
Peter

--
Consciousness -- those annoying periods between naps.

 
 
 

Post by J. Manuel Urruti » Tue, 30 Dec 1997 04:00:00



SNIP...
> I have a *possibly* related problem which started when I loaded 6.2 (or is
> it 6.3) on my INDIGO2 boxes. When you log in to an account via clogin, the
> first window to come up is the clogin window again. You may then clogin
> again or open a shell and work maybe, but you can't log out. And sometimes,
> the system will lock up after all the logging in, and then logging out.

> The problem went away for existing accounts on one INDIGO2 when I converted from
> EFS to XFS; the other system continued to have problems. And, when I create new
> accounts on the "fixed" system, the problem returns for those accounts.

> I opened a service call, got one phone call in return. The Engineer on the
> SGI knew after trying a couple of things that the problem persisted. The
> call may still be open after many months. But SGI Support Service is not
> what it used to be. Sounds like a bug that SGI doesn't want to talk about.

There have been persistent reports of SGIs locking up after clogin, all
of them with various versions of IRIX (mine is still locked up, and it is
running 5.3! To top it off, the system has been stable for years). Of
all the posts found in DejaNews and the SGI archive, not one of them was
followed up by the resident sages of these newsgroups.

It is clear that they are under an oath of silence.

I got it.  It is all part of a diabolical scheme by Pinky and the Brain
to take over the world!!

But seriously, folks, are we all caught in a sweep of mass hysteria? Are
our problems related (so much diverse HW and SW)? Is there some hacker
out there that managed to find a heretofore unknown vulnerability and is
now exploiting it?

Please, any and all comments are welcome. This is driving me nuts
because my only way out will be to restore IRIX and hope for the best.

--
* J. Manuel Urrutia                      |     En tierra de ciegos,   *

 
 
 

Post by Dave Ols » Wed, 31 Dec 1997 04:00:00



| There have been persistent reports of SGIs locking up after clogin, all
| of them with various versions of IRIX (mine is still locked up, and it is
| running 5.3! To top it off, the system has been stable for years). Of
| all the posts found in DejaNews and the SGI archive, not one of them was
| followed up by the resident sages of these newsgroups.

"locking up" is such a poorly defined term...

I'm not aware of any general problem in this area.
--

Dave Olson, Silicon Graphics

 
 
 

Post by Jean-Francois Panisse » Wed, 31 Dec 1997 04:00:00




> | There have been persistent reports of SGIs locking up after clogin, all
> | of them with various versions of IRIX (mine is still locked up, and it is
> | running 5.3! To top it off, the system has been stable for years). Of
> | all the posts found in DejaNews and the SGI archive, not one of them was
> | followed up by the resident sages of these newsgroups.

> "locking up" is such a poorly defined term...

> I'm not aware of any general problem in this area.
> --

> Dave Olson, Silicon Graphics


At least on Onyx/IR IRIX 6.2, there is (was?) a problem where you
log in from clogin, but 4Dwm fails to start: all you
get is a single shell window without any wm decorations. You
can type stuff into that window, but the only way to get back
control of the machine is to reset it, since shutdown/reboot
never actually completes (well, "uadmin 2 1" does the
trick...). From what I have been told, this is due to some race
condition between the X server and the console driver (does this
make any sense?).
In /var/X11/xdm/Xsession{.dt}, if you replace redirection
of standard error and standard output of applications being started
from /dev/console to some "real" file in /tmp, the problem goes away.
I haven't seen this problem in a while, although I don't know
if this was actually fixed in a patch or I just got lucky.
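
The workaround above boils down to changing where the session's output goes. A minimal sketch of the idea only, not the actual contents of Xsession (which are not shown in this thread); run_logged, the log path, and some_client are hypothetical:

```shell
#!/bin/sh

# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with stdout and stderr appended to a
# regular file, instead of sending them to /dev/console.
run_logged() {
    log=$1
    shift
    "$@" >>"$log" 2>&1
}
```

Where a session script starts a client with something like "some_client >/dev/console 2>&1", the equivalent here would be "run_logged /tmp/xsession.$USER.log some_client", so nothing started at login writes to the console device.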

JF

my email address is PANISSET and i work AT a DISCREET dot COMpany

 
 
 

Post by J. Manuel Urruti » Wed, 31 Dec 1997 04:00:00




> | There have been persistent reports of SGIs locking up after clogin, all
> | of them with various versions of IRIX (mine is still locked up, and it is
> | running 5.3! To top it off, the system has been stable for years). Of
> | all the posts found in DejaNews and the SGI archive, not one of them was
> | followed up by the resident sages of these newsgroups.

> "locking up" is such a poorly defined term...

Forgive me, but after reading the posts on this subject I have gotten
into the habit of assuming that the term would be self-explanatory as in
"machine looks fine but I can't get any input into it via the keyboard
or mouse," that is, it has "locked me out!"

My problem is that something got corrupted, most likely in the Indigo
Magic environment, when I was detaching a Word document from an e-mail
via zmail. Initially, I could highlight items on the zmail GUI window
but would not get any of the drop-down menus. Simultaneously, no input
from the keyboard or mouse clicks would have any effect on any other
windows (again, the feeling is one of being locked out of a house, able
to look through the windows but no more).

I rebooted the 4D310-GTX hoping that the problem would disappear but it
did not. I have gone through many reboot cycles and the problem is still
there for any user, including root or any newly created user. Again, the
problem is that after entering the machine through "clogin" (all the
time accepting keyboard and mouse input) and after the desktop is
established (toolbar drawn, icons for all windows, directories, and
applications placed, and desktop icons drawn), the machine ignores all
keyboard and mouse input, with the exception of the Vulcan Death Grip
(left-shift + left-ctrl + F12 + num-keypad-\) and that the mouse is free
to roam the desktop, changing color as it alights on open winterms (red
to yellow in our case), and changing the highlight of the winterms as it
goes over them (light gray to light beige).

I am in the process of painstakingly going over each script that gets
used during the login/desktop establishment in the hope of finding out
the corrupt file (at least I think that is the problem since clogin
works). Since this is something that I am not familiar with at all, I would
appreciate any pointers that you or anyone else could contribute to help
solve the puzzle. Likewise, a list of the steps taken by IRIX to take a
user from clogin to actual work (e.g., checking directories, launching
applications, etc.) would be most helpful.

There have been reports (found by browsing SGI's archive and this list
lately) that similar problems have cropped up under a variety of HW and
SW. Lucky for them, they have gone away after rebooting. But mine is
there to stay. I suppose that I could simply reinstall the OS but that
would not solve the mystery. And everybody loves a good puzzle, eh?

TIA

--
* J. Manuel Urrutia                      |     En tierra de ciegos,   *

 
 
 

Post by Dave Ols » Wed, 31 Dec 1997 04:00:00


| At least on Onyx/IR IRIX 6.2, there is (was?) a problem where you
| log in from clogin, but 4Dwm fails to start: all you
| get is a single shell window without any wm decorations. You
| can type stuff into that window, but the only way to get back
| control of the machine is to reset it, since shutdown/reboot
| never actually completes (well, "uadmin 2 1" does the
| trick...). From what I have been told, this is due to some race
| condition between the X server and the console driver (does this
| make any sense?).
| In /var/X11/xdm/Xsession{.dt}, if you replace redirection
| of standard error and standard output of applications being started
| from /dev/console to some "real" file in /tmp, the problem goes away.
| I haven't seen this problem in a while, although I don't know
| if this was actually fixed in a patch or I just got lucky.

Yes, I do remember this one.  But that's not what most people would
call a "hang", although some might.  As I recall, this was indeed a
timing-related bug in the "pseudo-console" code in the kernel that
could only happen on MP systems, and the workaround did avoid it.  I
don't remember if we fixed this in the patches or not.
--

Dave Olson, Silicon Graphics

 
 
 

Post by Dave Ols » Wed, 31 Dec 1997 04:00:00



| > "locking up" is such a poorly defined term...
|
| Forgive me, but after reading the posts on this subject I have gotten
| into the habit of assuming that the term would be self-explanatory as in
| "machine looks fine but I can't get any input into it via the keyboard
| or mouse," that is, it has "locked me out!"

The question here is what do you see when you login over the network?
Can you pop up a new window?  What does a par on the X server show (anything,
or nothing)?  If you start killing graphics programs one by one, do things
start working?  Does the vulcan death grip work?  If you turn off nfs,
does the problem disappear?  (I know in this specific case you said the
vulcan death grip works.)

The symptoms could be a graphics or an X bug, or simply a program
grabbing the X server, or the input focus, and that program being buggy
(possibly including something that we ship).

In my narrow view of the world, I'd call this a "graphics hang", or
the like, since the system is probably still alive.  On the other hand,
if you can't get to it over the net, then I'd say the whole system
is hung (although I'd always like a serial console check in those cases,
when possible).
--

Dave Olson, Silicon Graphics

 
 
 

Post by J. Manuel Urruti » Wed, 31 Dec 1997 04:00:00




> | > "locking up" is such a poorly defined term...
> |
> | Forgive me, but after reading the posts on this subject I have gotten
> | into the habit of assuming that the term would be self-explanatory as in
> | "machine looks fine but I can't get any input into it via the keyboard
> | or mouse," that is, it has "locked me out!"

Many thanks for your reply. Indeed, there are many things I left out of
the message you quote, mainly for lack of space and for not wanting to
provide too many details at once (even though the Devil _is_ in those
details). Interleaved with your thoughtful reply are my answers to your
questions:

> The question here is what do you see when you login over the network?

I can login over the network. I can start any application in a telnet
window that pops an X window on my remote console (i.e., jot, showcase,
imageworks, etc.). However, because I had edited my X scripts (those
that reside in /var/X11/xdm) so that no external host had access to my
console ("/usr/bin/X11/xhost -"), I could not remotely start any X
windows. A few days ago, I changed that and I, accidentally, started a
remote session while wrongly trying to execute the command

showfiles -c -m -s | tar cvf "filename"

which, incidentally, did not work as it produced a 0 length file (why?).
So, yes, I can pop an X-window in my remote display.
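
A likely reason for the zero-length file: tar takes the names of the files to archive from its command line, not from standard input, so the piped list is ignored and no members are written. A minimal sketch of one way around that, assuming the list has already been reduced to one path per line with no embedded spaces (showfiles output may carry extra columns that would need stripping first); archive_listed is a made-up name:

```shell
#!/bin/sh

# archive_listed ARCHIVE: read file names, one per line, on stdin and hand
# them to tar as command-line operands.
archive_listed() {
    archive=$1
    # xargs turns the lines on stdin into arguments for tar.
    # Caveat: with a very long list xargs runs tar more than once, and each
    # "c" run overwrites the archive written by the previous one.
    xargs tar cf "$archive"
}
```

For example, "printf 'a.txt\nb.txt\n' | archive_listed out.tar" archives the two named files. cpio -o is an alternative that reads the name list from standard input directly.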

> Can you pop up a new window?  What does a par on the X server show (anything,
> or nothing)?  If you start killing graphics programs one by one, do things

It produces all kinds of output which I don't understand. I invoked

par -s -SS Xsgi

and the first output lines are:

    0mS was sent signal SIGUSR1
    2mS END-pause() errno = 4 (Interrupted system call)
    2mS received signal SIGUSR1
    2mS sigreturn(0x7fffaa68) OK
    3mS execve(/usr/sbin/Xsgi, 0x7fffaf10, 0x7fffaf18) errno = 2 (No such file or directory)
    7mS execve(/usr/bsd/Xsgi, 0x7fffaf10, 0x7fffaf18) errno = 2 (No such file or directory)
.
.
etc....

I hope that you had this in mind.

> start working?  Does the vulcan death grip work?  If you turn off nfs,

There are no graphics programs working at this time (only MediaMail, but
whether it runs or not, does not make any difference to the graphics
hang problem). As for turning nfs off, I did not think of it since
nobody else is using the machine and the nfs-mounted disks are not being
accessed (nfs is on because our last graduate student set it up).

> does the problem disappear?  (I know in this specific case you said the
> vulcan death grip works.)

> The symptoms could be a graphics or an X bug, or simply a program
> grabbing the X server, or the input focus, and that program being buggy
> (possibly including something that we ship).

As I said elsewhere, the problem came out of left field since the system
had been running stably for years (for sure since we upgraded to 5.3).
The only graphics program running locally at the onset of the problem
was MediaMail. The system was also being used to display a remotely run
Netscape (runs faster in our Indigo^2).  This configuration had been
used for months if not years. The only possibility of a problem with our
graphics system has been a message that shows up in the SYSLOG after the
graphics system is restarted (either via a reboot or by stop/start gx)
for quite some time now (years?) that says:

Dec 29 11:59:37 2A:hobbes unix:
Dec 29 11:59:37 2A:hobbes unix: gm-2 (configured for IP7/9) $
Dec 29 11:59:37 2A:hobbes unix:
Dec 29 11:59:37 2A:hobbes unix: Warning: EEprom Vof is out of date.
Dec 29 11:59:37 2A:hobbes unix: Warning: Reloading EEprom with default 60Hz Vof.
Dec 29 11:59:37 2A:hobbes unix: DEBUG_NOISE at 0x9806F116

But the system has always performed acceptably until now. Of course,
there is the fact that clogin works properly (I tested xdm by renaming
it, and of course, the clogin screen never showed up until I restored,
via a telnet session, xdm's original name) and takes mouse and keyboard
commands.

Since MediaMail shows up very reliably on a remote system, I don't think
that it is the program itself that is buggy.

There is one added piece of evidence that suggests that a daemon is
going to sleep or that a configuration file is corrupt: while the
desktop is coming up, I am sometimes able to gain focus of an open
winterm by clicking on it while things are coming up. Thereafter, the
winterm will take keyboard input but will not obey any of the X
commands, either via mouse clicks (no menu pops if the left-upper button
is clicked, does not iconify or expand if the corresponding buttons are
pressed) or keyboard shortcuts (e.g., ALT-F9 does not cause the window to
iconify). If I don't do it at the right time, I can't access any
windows. So, which daemon or script is messed up?

> In my narrow view of the world, I'd call this a "graphics hang", or
> the like, since the system is probably still alive.  On the other hand,

Correct. The system is very much alive. My apologies for not being
sufficiently precise.

> if you can't get to it over the net, then I'd say the whole system
> is hung (although I'd always like a serial console check in those cases,
> when possible).

Point taken. Again, thank you very much for your interest.

--
* J. Manuel Urrutia                      |     En tierra de ciegos,   *

 
 
 

Post by ptgrun.. » Wed, 31 Dec 1997 04:00:00




> There have been reports (found by browsing SGI's archive and this list
> lately) that similar problems have cropped up under a variety of HW and
> SW. Lucky for them, they have gone away after rebooting. But mine is
> there to stay. I suppose that I could simply reinstall the OS but that
> would not solve the mystery. And everybody loves a good puzzle, eh?

> TIA

> --
> * J. Manuel Urrutia                      |     En tierra de ciegos,   *


I've reinstalled the OS, a couple times. Did not help me. Changing over to
the new type filesystem (XFS) helped on one of two of my INDIGO2s for
existing accounts. But, when I create new accounts on that system, those
accounts have the same old problem.

Pete

--
Consciousness -- those annoying periods between naps.

 
 
 
