System time warping around real time problem - please help

Post by Richard B. Johnso » Wed, 26 Mar 2003 19:10:15




> Hello all,

> I have got an increasingly annoying problem with our fairly new (fall
> '02) Dual Athlon2k+ Gigabyte 7dpxdw linux system running 2.4.20.
> The only kernel patch applied is Alan Cox's ptrace patch.

I am using the exact same kernel (a lot of folks are). There
is no such jumping on my system.
Try this program:

#include <stdio.h>
#include <time.h>

/* Report every time() reading that is earlier than the previous one. */
int main(void) {
   time_t x, y;
   (void)time(&x);
   (void)time(&y);
   for(;;) {
       (void)time(&x);
       if(x < y)
           printf("Prev %ld New %ld\n", (long)y, (long)x);
       y = x;
   }
   return 0;
}

If this shows time jumping around you have one of either:

(1)     Bad timer channel 0 chip (PIT).
(2)     Some daemon trying to sync time with another system.
(3)     You are traveling too close to the speed of light.

Now, your script shows time in fractional seconds.

> 1048608745.61 > 1048608745.60

You can modify the program to do this:

#include <stdio.h>
#include <sys/time.h>

/* Report every gettimeofday() reading (in microseconds) that is earlier
 * than the previous one. */
int main(void) {
   struct timeval tv;
   double x, y;
   (void)gettimeofday(&tv, NULL);
   x = (double) tv.tv_sec * 1e6;
   x += (double) tv.tv_usec;
   y = x;
   for(;;) {
       (void)gettimeofday(&tv, NULL);
       x = (double) tv.tv_sec * 1e6;
       x += (double) tv.tv_usec;
       if(x < y)
           printf("Prev %f New %f\n", y, x);
       y = x;
   }
   return 0;
}

There should be no jumping around -- and there isn't on
any system I've tested this on.

> Software crashes are regularly - naturally. No programmer expects system
> timers going back in time.

Hmmm, software should never crash. Even if the timers jump backwards
as you say, they should eventually time-out. If you have crashes, this
may point to other hardware problems as well.

Cheers,
* Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://www.veryComputer.com/
Please read the FAQ at  http://www.veryComputer.com/

 
 
 

Post by Tim Schmiela » Wed, 26 Mar 2003 19:20:11




> > I have got an increasingly annoying problem with our fairly new (fall
> > '02) Dual Athlon2k+ Gigabyte 7dpxdw linux system running 2.4.20.
> > The only kernel patch applied is Alan Cox's ptrace patch.

[...]
> If this shows time jumping around you have one of either:

> (1)        Bad timer channel 0 chip (PIT).
> (2)        Some daemon trying to sync time with another system.
> (3)        You are traveling too close to the speed of light.

(4) Unsync'ed TSCs?

See help text for CONFIG_X86_TSC_DISABLE. Never had this problem
myself, though.


Post by Fionn Behren » Wed, 26 Mar 2003 20:20:10




> > I have got an increasingly annoying problem with our fairly new (fall
> > '02) Dual Athlon2k+ Gigabyte 7dpxdw linux system running 2.4.20.
> I am using the exact same kernel (a lot of folks are). There
> is no such jumping on my system.
> Try this program:

[... prg1.c ...]

> If this shows time jumping around you have one of either:

> (1)        Bad timer channel 0 chip (PIT).
> (2)        Some daemon trying to sync time with another system.
> (3)        You are traveling too close to the speed of light.

It just exits immediately with exit code 1. (*shrug*)

> Now, your script shows time in fractional seconds.

> > 1048608745.61 > 1048608745.60

> You can modify the program to do this:

[... prg2.c ...]

> There should be no jumping around -- and there isn't on
> any system I've tested this on.

When I run this code it begins to put out Prev N New M lines.

Prev 1048615862810879.000000 New 1048615862759879.000000
Prev 1048615862870879.000000 New 1048615862819878.000000
Prev 1048615862900879.000000 New 1048615862849902.000000
Prev 1048615862960882.000000 New 1048615862909875.000000
[-------- cut --------]
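
The size of each backward step can be read off these lines by subtracting New from Prev. A small sketch, assuming the exact "Prev ... New ..." format printed by the second test program (the function name backward_step_us is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one "Prev <usec> New <usec>" line as printed by the second test
 * program and store the size of the backward step in microseconds.
 * Returns 1 on success, 0 if the line does not match. */
int backward_step_us(const char *line, int64_t *step_us)
{
    double prev, now;

    assert(line != NULL && step_us != NULL);
    if (sscanf(line, "Prev %lf New %lf", &prev, &now) != 2)
        return 0;
    /* Both values are whole microsecond counts below 2^53, so the
     * subtraction in double is exact. */
    *step_us = (int64_t)(prev - now);
    return 1;
}
```

For the four lines shown, this gives steps of 51000, 51001, 50977 and 51007 microseconds, i.e. the clock goes back by roughly 51 ms each time.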

After a few seconds of run time random processes on my machine begin to
crash, or I get kernel oopses and kernel freezes. Looks very much like
heavy use of gettimeofday() causes random writes in system memory.

> > Software crashes are regularly - naturally. No programmer expects system
> > timers going back in time.
> Hmmm, software should never crash. Even if the timers jump backwards
> as you say, they should eventually time-out. If you have crashes, this
> may point to other hardware problems as well.

E.g. which type of hardware problem?

Thanks a million for your help so far, it is great to experience how
fast people are responding!

I'll evaluate that other suggestion about TSC_DISABLE now and will get
back to you as soon as I can tell you more.

Kind regards,
                F. Behrens

Post by Richard B. Johnso » Wed, 26 Mar 2003 20:40:12





> > > I have got an increasingly annoying problem with our fairly new (fall
> > > '02) Dual Athlon2k+ Gigabyte 7dpxdw linux system running 2.4.20.

> > I am using the exact same kernel (a lot of folks are). There
> > is no such jumping on my system.
> > Try this program:

> [... prg1.c ...]

> > If this shows time jumping around you have one of either:

> > (1)   Bad timer channel 0 chip (PIT).
> > (2)   Some daemon trying to sync time with another system.
> > (3)   You are traveling too close to the speed of light.

> It just exits immediately with exit code 1. (*shrug*)

Hmmm. Note that the for(;;) { } provides no exit path.
So, you probably have some bad RAM or your CPU is too
hot (broken fan??), or something like that.

> > Now, your script shows time in fractional seconds.

> > > 1048608745.61 > 1048608745.60

> > You can modify the program to do this:

> [... prg2.c ...]

> > There should be no jumping around -- and there isn't on
> > any system I've tested this on.

> When I run this code it begins to put out Prev N New M lines.

> Prev 1048615862810879.000000 New 1048615862759879.000000
> Prev 1048615862870879.000000 New 1048615862819878.000000
> Prev 1048615862900879.000000 New 1048615862849902.000000
> Prev 1048615862960882.000000 New 1048615862909875.000000
> [-------- cut --------]

> After a few seconds of run time random processes on my machine begin to
> crash, or I get kernel oopses and kernel freezes. Looks very much like
> heavy use of gettimeofday() causes random writes in system memory.

Looks very much like you have a real bad hardware problem.

> E.g. which type of hardware problem?

Look inside and see if your CPU fan has stopped. Also move your RAM
sticks around after wiping any dirt off the contacts. Since the
machine used to work last fall, it's probably just a fan or RAM
problem.

> Thanks a million for your help so far, it is great to experience how
> fast people are responding!

> I'll evaluate that other suggestion about TSC_DISABLE now and will get
> back to you as soon as I can tell you more.

I doubt that this will help you, but it's worth trying.

Cheers,
* Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


Post by Fionn Behren » Wed, 26 Mar 2003 23:20:18






> > > > I have got an increasingly annoying problem with our fairly new
> > > > (fall '02) Dual Athlon2k+ Gigabyte 7dpxdw linux system running
> > > > 2.4.20.

> > > I am using the exact same kernel (a lot of folks are). There
> > > is no such jumping on my system.
> > > Try this program:

> > [... prg1.c ...]

> > > If this shows time jumping around you have one of either:

> > > (1) Bad timer channel 0 chip (PIT).
> > > (2) Some daemon trying to sync time with another system.
> > > (3) You are traveling too close to the speed of light.

> > It just exits immediately with exit code 1. (*shrug*)
> Hmmm. Note that the for(;;) { } provides no exit path.

I noticed that as well and investigated the issue using ddd. Funnily enough,
the program runs fine in ddd until X crashes. But in the shell it still
behaves as if it were nothing but exit(1);

> So, you probably have some bad RAM or your CPU is too
> hot (broken fan??), or something like that.

None of the above. The system is liquid cooled and subject to continuous
thermal monitoring. The RAM is 1GB Infineon ECC. Before the weekend I
had the machine running overnight with memtest86 - 14 hours, all tests
activated. Not a single error.
I also tried an endless kernel compile loop the other day and the
machine compiled about 100 kernels in approx two hours without a hitch.

> > [... prg2.c ...]

> > When I run this code it begins to put out Prev N New M lines.
> > Prev 1048615862810879.000000 New 1048615862759879.000000
> > After a few seconds of run time random processes on my machine begin
> > to crash, or I get kernel oopses and kernel freezes. Looks very
> > much like heavy use of gettimeofday() causes random writes in system
> > memory.
> Looks very much like you have a real bad hardware problem.

Just what, that is the question. After having activated the notsc
feature the system has not yet shown the warp symptoms, but as I noted
in the beginning it may well take a day or two for that to happen.

Yet still, running the first (in ddd) or second test programs - despite
the current absence of any error message - causes random processes to
crash until the program is being stopped (by a crashed terminal, X or
kernel, that is).

Oddly enough, the system runs pretty stable for at least days of normal
use as long as the clock symptoms don't show up (and you don't run those
test programs). That means it has not crashed a lot recently; I have just
been rebooting it because of the jumping clock annoyance which,
among other things, results in sluggish UI components and frequent
short freezes in ssh connections.

> > E.g. which type of hardware problem?
> Since the machine used to work last fall, It's probably just a
> FAN or RAM  problems.

I'll swap the RAM sticks around for now but I suspect it's something
else. I still fail to grasp how calls to gettimeofday() are able
to cause random writes to memory...

Summary:
       - No apparent hardware issue.
       - System runs stable as long as you don't for (;;) gettimeofday();
       - notsc being evaluated. I will get back to you later.
         Does not resolve the odd test software crash, though.

Kind regards,
                Fionn

P.S.: Please keep sending me a Cc:, I grabbed this one from the archive

Post by george anzinge » Thu, 27 Mar 2003 00:30:14







>>>>>I have got an increasingly annoying problem with our fairly new
>>>>>(fall '02) Dual Athlon2k+ Gigabyte 7dpxdw linux system running
>>>>>2.4.20.

>>>>I am using the exact same kernel (a lot of folks are). There
>>>>is no such jumping on my system.
>>>>Try this program:

>>>[... prg1.c ...]

>>>>If this shows time jumping around you have one of either:

>>>>(1) Bad timer channel 0 chip (PIT).
>>>>(2) Some daemon trying to sync time with another system.
>>>>(3) You are traveling too close to the speed of light.

>>>It just exits immediately with exit code 1. (*shrug*)

>>Hmmm. Note that the for(;;) { } provides no exit path.

> I noticed that well and investigated the issue using ddd. Funnily enough
> the program runs well in ddd until X crashes. But in the shell it still
> behaves like it would be nothing but exit(1);

>>So, you probably have some bad RAM or your CPU is too
>>hot (broken fan??), or something like that.

> None of the above. The system is liquid cooled and subject to contiuous
> thermal monitoring. The RAM is 1GB Infineon ECC. Before the weekend I
> had the machine running overnight with memtest86 - 14 hours, all tests
> activated. Not a single error.
> I also tried an endless kernel compile loop the other day and the
> machine compiled about 100 kernels in approx two hours without a hitch.

>>>[... prg2.c ...]

>>>When I run this code it begins to put out Prev N New M lines.

>>>Prev 1048615862810879.000000 New 1048615862759879.000000

>>>After a few seconds of run time random processes on my machine begin
>>>to crash, or I get kernel oopses and kernel freezes. Looks very
>>>much like heavy use of gettimeofday() causes random writes in system
>>>memory.

>>Looks very much like you have a real bad hardware problem.

> Just what, that is the question. After having activated the notsc
> feature the system has not yet exposed the warp symptons but as I noted
> in the beginning it may well take a day or two for that to happen.

> Yet still, running the first (in ddd) or second test programs - despite
> the current absence of any error message - causes random processes to
> crash until the program is being stopped (by a crashed terminal, X or
> kernel, that is).

> Oddly enough, the system runs pretty stable for at least days of normal
> use as long as the clock symptoms dont show up (and you dont run those
> test programs). Which means it has not crashed a lot recently, just
> being rebooted by me because of the jumping clock annoyance which -
> among others - results in sluggishly behaving UI components and frequent
> short connection freezes in ssh connections.

>>>E.g. which type of hardware problem?

>>Since the machine used to work last fall, It's probably just a
>>FAN or RAM  problems.

> I'll swap the RAM sticks around for now but I suspect its something
> else. I just still fail to grasp  how calls to gettimeofday() are able
> to cause random writes to memory...

> Summary:
>        - No apparent hardware issue.
>        - System runs stable as long as you dont for (;;) gettimeofday();
>        - notsc being evaluated. I will get back to you later.
>          Does not resolve the odd test software crash, though.

> Kind regards,
>            Fionn

> P.S.: Please keep sending me a Cc:, I grabbed this one from the archive
> -

This all sounds very much like the TSCs are drifting WRT each other.
Is it possible that you have some power management code (or hardware)
that is slowing one cpu and not the other?

--

High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


Post by Fionn Behren » Thu, 27 Mar 2003 01:00:24




> > Summary:
> >        - No apparent hardware issue.
> >        - System runs stable as long as you dont for (;;) gettimeofday();
> >        - notsc being evaluated. I will get back to you later.
> >          Does not resolve the odd test software crash, though.
> This all sounds very much like the TSCs are drifting WRT each other.
> Is it possible that you have some power management code (or hardware)
> that is slowing one cpu and not the other?

Well, I still don't really know what TSCs actually are (or what TSC
stands for).

The only suspect in that case would be the amd76x_pm.o kernel module
which I am admittedly using. It saves about 90Watts of power when the
machine is idle...

I'll check what happens when the system boots without amd76x_pm.
Will report back tomorrow.

Thanks to all for keeping the suggestions going!

Regards,
        F. Behrens

Post by Alan Co » Thu, 27 Mar 2003 01:10:15



> > This all sounds very much like the TSCs are drifting WRT each other.
> > Is it possible that you have some power management code (or hardware)
> > that is slowing one cpu and not the other?

> Well, I still don't really know what TSCs actually are (or what TSC
> stands for).

> The only suspect in that case would be the amd76x_pm.o kernel module
> which I am admittedly using. It saves about 90Watts of power when the
> machine is idle...

If you are using amd76x_pm, boot with "notsc"; ditto for that matter
on dual Athlons with APM or ACPI in some cases. In fact I wish people
would stop using the TSC for clock timing altogether. It simply doesn't
work on a lot of modern systems.


Post by george anzinge » Thu, 27 Mar 2003 04:40:09




>>>This all sounds very much like the TSCs are drifting WRT each other.
>>>Is it possible that you have some power management code (or hardware)
>>>that is slowing one cpu and not the other?

>>Well, I still don't really know what TSCs actually are (or what TSC
>>stands for).

It stands for Time Stamp Counter. It is a special CPU register that
basically counts CPU cycles. Sometimes (incorrectly, methinks) it is
affected by power management code which slows the CPU by changing the
CPU frequency.
>>The only suspect in that case would be the amd76x_pm.o kernel module
>>which I am admittedly using. It saves about 90Watts of power when the
>>machine is idle...

> If you are using amd76x_pm boot with "notsc", ditto for that matter
> on dual athlons with APM or ACPI in some cases. In fact I wish people
> would stop using the tsc for clock timing altogether. It simply doesn't
> work on a lot of modern systems

I agree. However, what is really needed is not available in x86
machines, i.e. a CPU register that has a fixed and stable count rate.
An I/O register is second best because of the long time it takes to
read it.

--

High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


Post by Chris Friese » Thu, 27 Mar 2003 05:20:09



> If you are using amd76x_pm boot with "notsc", ditto for that matter
> on dual athlons with APM or ACPI in some cases. In fact I wish people
> would stop using the tsc for clock timing altogether. It simply doesn't
> work on a lot of modern systems

But it's awfully nice for low-impact high-resolution timestamps.

Maybe someday hardware manufacturers will give us a monotonic GHz+ clock that is
synced across all cpus and is cheap to read...

Chris

--
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986


Post by Fionn Behren » Thu, 27 Mar 2003 12:50:14




> > > This all sounds very much like the TSCs are drifting WRT each other.
> > > Is it possible that you have some power management code (or hardware)
> > > that is slowing one cpu and not the other?

> > The only suspect in that case would be the amd76x_pm.o kernel module
> > which I am admittedly using. It saves about 90Watts of power when the
> > machine is idle...

> If you are using amd76x_pm boot with "notsc", ditto for that matter
> on dual athlons with APM or ACPI in some cases.

I booted without amd76x_pm today and the problems are gone. I tried
notsc yesterday and dmesg said TSC had been deactivated on both CPUs. No
libc6 problems - debian is using the i386 version by default.
Oddly enough the system still crashed on those two for (;;) time(); test
loops posted earlier in this thread. So the only (unsatisfying) solution
I see for now is to keep the CPUs glowing hot for the sake of stability.

Any idea what else could cause the crashes in the absence of TSC usage?

As a yet unresolved side note I am still unable to execute the first
test program with my default user (immediately exits with retval 1).
Being run as root or as the system test user, the program runs as
expected (including crash with amd76x_pm). ldd shows no difference. Same
shell being used.

With kind regards,
                F. Behrens

Post by Alan Co » Thu, 27 Mar 2003 15:30:13



> But its awfully nice for low-impact high-resolution timestamps.

> Maybe someday hardware manufacturers will give us a monotonic GHz+ clock that is
> synced across all cpus and is cheap to read...

x86-64 has HPET


Post by Alan Co » Thu, 27 Mar 2003 15:30:16



> Stands for Time Stamp Counter.  It is a special cpu register that
> basically counts cpu cycles.  Some times (incorrectly me thinks) it is
> affected by power management code which slows the cpu by changing the
> cpu frequency.

Not incorrectly. It counts CPU clocks; it's designed for profiling and
the like. There is no guarantee in any Intel MP standard that the clocks
are synced up.

System time warping around real time problem - please help

Ok, I had the system running about a week with "notsc" AND no power
management. No system crashes so far. Nevertheless, I keep getting some
kernel oopses like this from time to time. The call trace suggests that
there is still an issue with timing.

Apr  3 15:09:51 rtfm kernel:  printing eip:
Apr  3 15:09:51 rtfm kernel: 49199fd0
Apr  3 15:09:51 rtfm kernel: *pde = 00000000
Apr  3 15:09:51 rtfm kernel: Oops: 0000
Apr  3 15:09:51 rtfm kernel: CPU:    0
Apr  3 15:09:51 rtfm kernel: EIP:    0010:[<49199fd0>]    Tainted: P
Apr  3 15:09:51 rtfm kernel: EFLAGS: 00210287
Apr  3 15:09:51 rtfm kernel: eax: 3e8c329f   ebx: cfd15fac   ecx: 054f7f1e   edx: 000e213f
Apr  3 15:09:51 rtfm kernel: esi: bffffa50   edi: 00000000   ebp: bffffa58   esp: cfd15f9c
Apr  3 15:09:51 rtfm kernel: ds: 0018   es: 0018   ss: 0018
Apr  3 15:09:51 rtfm kernel: Process lmule (pid: 27969, stackpage=cfd15000)
Apr  3 15:09:51 rtfm kernel: Stack: c0122b4b bffffa50 cfd15fac 00000008 3e8c329f 000e213f cfd14000 bffffa50
Apr  3 15:09:51 rtfm kernel:        bffffab0 c01091ff bffffa50 00000000 408584d4 bffffa50 bffffab0 bffffa58
Apr  3 15:09:51 rtfm kernel:        0000004e 0000002b 0000002b 0000004e 40655501 00000023 00200287 bffffa1c
Apr  3 15:09:51 rtfm kernel: Call Trace:    [sys_gettimeofday+59/128] [system_call+51/56]
Apr  3 15:09:51 rtfm kernel:
Apr  3 15:09:51 rtfm kernel: Code:  Bad EIP value.
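
One way to read the register dump, offered as an observation rather than a diagnosis: eax (3e8c329f) and edx (000e213f) look like the tv_sec and tv_usec of a struct timeval. 0x3e8c329f is 1049375391 seconds since the epoch, which is 2003-04-03 13:09:51 UTC; that matches the syslog time stamp if the machine's local time was two hours ahead of UTC (an assumption). 0x000e213f is 926015, a valid microsecond count. The same two values also appear next to each other in the stack dump. The helper below, whose name epoch_to_utc is made up for illustration, does the conversion:

```c
#include <assert.h>
#include <time.h>

/* Break a seconds-since-epoch value into UTC calendar fields.
 * Returns 1 on success, 0 on failure. */
int epoch_to_utc(long secs, struct tm *out)
{
    time_t t = (time_t)secs;
    struct tm *tm;

    assert(out != NULL);
    tm = gmtime(&t);
    if (tm == NULL)
        return 0;
    *out = *tm;   /* copy out of gmtime()'s static buffer */
    return 1;
}
```

Calling epoch_to_utc(0x3e8c329fL, &tm) fills tm with 3 April 2003, 13:09:51. This is consistent with the oops happening while a gettimeofday() result was being handled, but it does not by itself say what went wrong.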

Do you have any more ideas regarding this issue? I'd hate trying to send
the board in for a check...

Regards,
        Fionn (not subscribed to lkml)