Solaris real-time thread capabilities

Post by capnw.. » Sun, 04 Sep 2005 11:37:44



========
QUESTION
========
How can I get 10 millisecond timing reliably on a 2 CPU UltraSPARC-II
450MHz system?

=================
PROBLEM STATEMENT
=================
I have a Solaris application in which I need a thread to process
packages approximately every 10 milliseconds. The thread should be
woken up every 10 ms to send the queued packages and the thread should
then "sleep" until the next 10 ms interval. There is a jitter buffer at
the other end, so I don't need _EXACTLY_ 10 ms... even if I get woken
up within 20 ms it is OK... but being off by 100 ms would be bad.

=======
SUMMARY
=======
I have tried but I can't get the required timing despite using
"hires_tick", and despite creating a RT (realtime) thread, and despite
creating a processor set and allocating one entire CPU to said thread
on a 2 CPU machine. I have read everything I can get my hands on and I
have tried every trick in my book, but I have failed to maintain the
required timing... I tried using SIGALRM to get the required timing, I
tried nanosleep, and I even tried a spin loop, and nothing works... any
help would be appreciated.

===========
DESCRIPTION
===========
I have tried many, many different combinations of the different
strategies shown below... None have yielded adequate results. Anybody
have any other hints or tricks?

(A) Thread priority
-------------------------------------
I have tried increasing the priority of the given thread using
"pthread_attr_setschedparam". Even after setting the maximum priority,
the thread still does not get woken up in time.

(B) Using high resolution timers
-------------------------------------
I have tried enabling "hires_tick" in /etc/system, as follows:

----------
set hires_tick = 1
----------

Even after setting "hires_tick" my thread does not get woken up
reliably every 10 ms.

(C) Using RT (realtime) class
-------------------------------------
I have tried using a Solaris realtime thread using "priocntl"... This
type of thread will not get preempted by the operating system (and you
need root access to create one of these threads). This did not help
either.

(D) Using processor sets
-------------------------------------
I have created a processor set using "psrset", and I have allocated the
second processor (processor 2) to the processor set:

----------

user processor set 1: processor 2
----------

I have allocated only 1 (one) thread to the processor set. In other
words, only my realtime LWP is running on processor #2 (nothing else is
running on said processor). Even with an entire CPU at its disposal,
the thread still does not get woken up in time.

(E) Using processor NO-INTERRUPT mode
-------------------------------------
I have prevented processor 2 from being interrupted by I/O devices by
setting it to "no-interrupt" mode:

----------

0       on-line   since 08/26/05 16:17:01
2       no-intr   since 08/26/05 17:19:48
----------

Even if the processor is set to "no-intr", the thread still does not
maintain reliable timing.

(F) Using SIGALRM
-------------------------------------
My initial attempt was to use SIGALRM to deliver the required timing,
but under Solaris 8 this did not work reliably.

(G) Using nanosleep
-------------------------------------
I have tried using "nanosleep" to sleep the prescribed amount of time.
Most of the time nanosleep wakes up my thread reliably, but once in a
while the thread is woken up too late (i.e. more than 100 ms have
elapsed).
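
One variation on the nanosleep approach is to sleep until an absolute deadline instead of sleeping a relative 10 ms each time, so that one late wakeup does not push every later tick back. The following is only a minimal sketch under stated assumptions: it uses POSIX clock_gettime(CLOCK_MONOTONIC) as a portable stand-in for gethrtime(), and the names now_ns, sleep_until_ns and run_periodic are hypothetical, not taken from the application described here. It will not prevent an occasional long stall, but it does report the worst lateness so the stall can be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Monotonic time in nanoseconds (assumed stand-in for gethrtime()). */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Sleep until an absolute deadline; loops again if nanosleep returns early. */
void sleep_until_ns(int64_t deadline)
{
    int64_t remaining;
    while ((remaining = deadline - now_ns()) > 0) {
        struct timespec req;
        req.tv_sec = (time_t)(remaining / NSEC_PER_SEC);
        req.tv_nsec = (long)(remaining % NSEC_PER_SEC);
        nanosleep(&req, NULL);
    }
}

/*
 * Run `ticks` periods of `period_ns` each, calling work() (if non-NULL)
 * once per period. Deadlines advance by a fixed step, so lateness in one
 * period does not accumulate into the next. Returns the worst observed
 * lateness in nanoseconds.
 */
int64_t run_periodic(int ticks, int64_t period_ns, void (*work)(void))
{
    assert(ticks > 0 && period_ns > 0);
    int64_t deadline = now_ns();
    int64_t worst = 0;
    for (int i = 0; i < ticks; i++) {
        deadline += period_ns;
        sleep_until_ns(deadline);
        int64_t late = now_ns() - deadline;
        if (late > worst)
            worst = late;
        if (work)
            work();
    }
    return worst;
}
```

For the 10 ms case this would be called with period_ns set to 10000000 and work pointing at whatever routine sends the queued packages.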

(H) Using spin loop
-------------------------------------
In desperation I have also tried using a spin loop using "gethrtime()"
in order to _NOT_ call nanosleep, as shown conceptually below:

----------
t2 = gethrtime();
while(t2 < target_time)
{
  t1 = gethrtime();
  t2 = gethrtime();

  diff = t2 - t1;
  [...cut...]

}

----------

In the above loop, the average difference between t1 and t2 is about
1000 nanoseconds (this is good), but once in a while the difference
between t1 and t2 is 300,000,000 nanoseconds (this is bad)!!!
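
To see how often such gaps occur, and not only the single worst one, the same kind of loop can count every gap above a threshold. This is a rough sketch, not the original test program: it again assumes clock_gettime(CLOCK_MONOTONIC) in place of gethrtime(), and read_clock_ns and scan_gaps are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds (assumed stand-in for gethrtime()). */
int64_t read_clock_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Read the clock back to back `iterations` times. Returns the largest gap
 * between consecutive reads and stores in *over the number of gaps that
 * exceeded threshold_ns. A large gap means the thread was not running
 * (preempted, interrupted, or stalled) between the two reads.
 */
int64_t scan_gaps(long iterations, int64_t threshold_ns, long *over)
{
    assert(iterations > 0 && over != NULL);
    int64_t max_gap = 0;
    int64_t prev = read_clock_ns();
    *over = 0;
    for (long i = 0; i < iterations; i++) {
        int64_t cur = read_clock_ns();
        int64_t gap = cur - prev;
        if (gap > max_gap)
            max_gap = gap;
        if (gap > threshold_ns)
            (*over)++;
        prev = cur;
    }
    return max_gap;
}
```

A threshold of 10000000 (10 ms) would count exactly the stalls that matter for this problem; recording the wall-clock time of each one would also allow correlating them with other system activity.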

(I) I am out of ideas...
-------------------------------------
Anybody have any suggestions? Is there any way to get reliable 10
millisecond resolution from Solaris?

======================
HARDWARE CONFIGURATION
======================
System Configuration:  Sun Enterprise 220R
CPU: 2 X UltraSPARC-II 450MHz
System clock frequency: 113 MHz
Memory size: 2048 Megabytes

Thanks!

 
 
 


Post by Logan Sha » Sun, 04 Sep 2005 12:37:52



> ========
> QUESTION
> ========
> How can I get 10 millisecond timing reliably on a 2 CPU UltraSPARC-II
> 450MHz system?
> (H) Using spin loop
> -------------------------------------
> In desperation I have also tried using a spin loop using "gethrtime()"
> in order to _NOT_ call nanosleep, as shown conceptually below:

> ----------
> t2 = gethrtime();
> while(t2 < target_time)
> {
>   t1 = gethrtime();
>   t2 = gethrtime();

>   diff = t2 - t1;
>   [...cut...]
> }
> ----------

> In the above loop, the average difference between t1 and t2 is about
> 1000 nanoseconds (this is good), but once in a while the difference
> between t1 and t2 is 300,000,000 nanoseconds (this is bad)!!!

If I had to guess (and a guess is exactly what it'd be), I would say
that what's going on here is that your code is running as a user
process, and every now and then a kernel thread of higher priority
runs and excludes your thread from the processor.

If I recall correctly, the range of priorities for userspace threads
is a subset of the range of priorities for kernel threads, so it is
possible to have kernel threads that have a priority higher than your
userspace thread, no matter how high you make its priority, and I
believe that even applies if you make your thread a realtime thread.
I've tried to find my Solaris 8 Internals book to confirm that, but I
don't see it, and I don't have time to undertake a major excavation
on my desk right now...  ;-)

   - Logan

 
 
 


Post by Greg Menk » Sun, 04 Sep 2005 12:52:06


Until you measure your application scheduling with a hires timer, you
have no idea where the jitter is coming from.

IP does not guarantee you realtime transfer of data- or even reasonably
bounded latency.  It's not clear what your "packages" are or how you're
transferring them, so I assume you're sending packets via UDP.  If
you're sending them via TCP, you have even bigger problems.

An IP stack can easily add tens of milliseconds to some packets and not
others, or it will plain drop them with no notification to you.  This
often leads people to try TCP, which can and will invisibly add all
kinds of delays you cannot control or even observe.

You should not be relying on a network for realtime transfers until &
unless you have something like a TDMA stack, driver and phy layer and
your app is designed to use it.  Even if the operating systems and IP
stacks on both ends of the transfer offer some form of bounded latency,
your comms timing is then wholly dependent on how much latency
intervening switches & routers add, which to a considerable degree
depends on ambient traffic which is still more random.

You might be able to get around it by doing your realtime sampling in
one task, then accumulating timestamped data in a 2nd regular priority
task which handles transfers to the destination machine.

It sounds to me as if you have not designed your system to handle
real-world characteristics of computers and networks.

Gregm

 
 
 


Post by David Hopwoo » Sun, 04 Sep 2005 13:18:55



> IP does not guarantee you realtime transfer of data- or even reasonably
> bounded latency.  It's not clear what your "packages" are or how you're
> transferring them, so I assume you're sending packets via UDP.

That's a big assumption, isn't it? He/she didn't say anything about networking.

Then again, Solaris is not a real-time OS.

--

 
 
 


Post by Greg Menk » Sun, 04 Sep 2005 13:42:06




> > IP does not guarantee you realtime transfer of data- or even reasonably
> > bounded latency.  It's not clear what your "packages" are or how you're
> > transferring them, so I assume you're sending packets via UDP.

> That's a big assumption, isn't it? He/she didn't say anything about networking.

Sure it is, but it sounded like he was sampling data, then sending it.
Unless he was playing with message queues of some flavor or playing
around with shared memory.

> Then again, Solaris is not a real-time OS.

Truly.  I am curious about how good the realtime scheduling is though.

Gregm

 
 
 


Post by Andrew Gabri » Sun, 04 Sep 2005 23:37:41




>Truly.  I am curious about how good the realtime scheduling is though.

It gets significantly better with more recent releases,
particularly in the case of realtime threads using the IP stack.
OP hasn't given any clue what Solaris release he's using though.
Hopefully it's at least something _after_ Solaris 8 FCS.

It would also be useful to know if there are any 3rd-party drivers
in use. These have been known to screw up realtime response.

--
Andrew Gabriel

 
 
 


Post by capnw.. » Thu, 08 Sep 2005 06:06:02


Hello Logan, My Solaris Internals book (c) 2001, mentions on page 47
that the average latency for gethrtime() is 320 nanoseconds... If this
were true it would make me very happy :-) The problem is when I get a
latency one million (1,000,000) times the published value... that's
when I start wondering if I am doing something wrong... By the way,
Austin is a nice town, isn't it :)
 
 
 


Post by capnw.. » Thu, 08 Sep 2005 06:26:05


Hello Greg, Thanks for your reply. My current question is related to
real-time scheduling capabilities of Solaris 8 and/or Solaris 9. The
problem can be boiled down to the following very simple tests:

1. Can I use nanosleep to try to wake up every 10 ms (on average)?
2. Can I use SIGALRM instead?
3. Can I use gethrtime instead?

For the time being, you can completely ignore the network, transferring
data, IP, TCP, UDP and/or jitter buffers...


> It sounds to me as if you have not designed your system to
> handle real-world characteristics of computers and networks.

Wow... that's why I hate posting on Usenet... because people jump to
conclusions... I'm sorry Greg... I appreciate your taking the time to
share your knowledge, but you know _NOTHING_ about the design of my
"real-world" application... all you know is the PROBLEM STATEMENT as
stated in my opening post: "--How to obtain reliable 10 ms timing from
Solaris on a 450 MHz UltraSparc2?"... Even though my post was very well
articulated, you start making assumptions about delays in intervening
switches and routers and claiming that "it sounds like I have not
designed my system properly"... That's quite a disappointing response
from someone like yourself who could probably offer something positive
to the community...
 
 
 


Post by capnw.. » Thu, 08 Sep 2005 06:30:05


Hello Andrew, Thanks for the reply... I will talk to my sysadmin and
post a reply with any "suspicious" 3rd-party stuff that may be screwing
up realtime response... To my knowledge it's as close as
"out-of-the-box" as it gets... By the way, buried somewhere in my
lengthy opening post was the fact that I was running Solaris 8. I
haven't tried my full test suite against Solaris 9, but a simple while
loop testing the latency of "gethrtime()" showed that Solaris 9 was no
better than Solaris 8.
 
 
 


Post by Eric Sosma » Thu, 08 Sep 2005 06:53:25



> Hello Logan, My Solaris Internals book (c) 2001, mentions on page 47
> that the average latency for gethrtime() is 320 nanoseconds... If this
> were true it would make me very happy :-) The problem is when I get a
> latency one million (1,000,000) times the published value... that's
> when I start wondering if I am doing something wrong...

    You can only make gethrtime() calls at a rate of three
per second?  I share the wonderment in your final sentence,
along with your suspicion that something unaccounted-for
is going on.  ("Unaccounted-for" sounds so much more tactful
than "wrong," don't you think?)  How are you measuring the
rate?  Please supply a code sample.

    FWIW, I just ran a brief test on my desktop system, a
venerable Ultra-10 running Solaris 9 on an ancient UltraSPARC-II
processor at a whopping 360 MHz.  I ran this simple loop:

        for (i = 0;  i < 101;  ++i)
            when[i] = gethrtime();

... and then post-analyzed the 100 sample-to-sample intervals.
The slowest observed was 692 ns, the fastest was 291 ns, the
average was 296 ns, and the standard deviation was 40 ns.
That's reasonably close to the 320 ns figure cited in the
book, but nothing at all like your 320,000,000 ns ...
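
The post-analysis step could look roughly like the sketch below. It is not the code that was actually run: summarize_intervals and struct interval_stats are hypothetical names, the timestamps are assumed to be nanosecond values such as gethrtime() returns, and the variance is left unrooted so the sketch needs no math library (the standard deviation is its square root).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of the intervals between consecutive timestamps. */
struct interval_stats {
    int64_t min_ns;
    int64_t max_ns;
    double mean_ns;
    double variance_ns2;   /* population variance; stddev = sqrt of this */
};

/*
 * when[] holds `count` timestamps in nanoseconds, in the order they were
 * taken, giving count - 1 sample-to-sample intervals.
 */
struct interval_stats summarize_intervals(const int64_t *when, size_t count)
{
    assert(when != NULL && count >= 2);
    size_t n = count - 1;
    struct interval_stats s;
    int64_t sum = 0;

    s.min_ns = s.max_ns = when[1] - when[0];
    for (size_t i = 0; i < n; i++) {
        int64_t d = when[i + 1] - when[i];
        if (d < s.min_ns)
            s.min_ns = d;
        if (d > s.max_ns)
            s.max_ns = d;
        sum += d;
    }
    s.mean_ns = (double)sum / (double)n;

    /* Second pass: average squared deviation from the mean. */
    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dev = (double)(when[i + 1] - when[i]) - s.mean_ns;
        sq += dev * dev;
    }
    s.variance_ns2 = sq / (double)n;
    return s;
}
```

With the 101-entry when[] array from the loop above, the call would be summarize_intervals(when, 101), after copying the values into int64_t if needed.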

    Disclaimer: I'm writing about Sun products and I work for
Sun, but I do not speak for Sun.

--

 
 
 


Post by capnw.. » Thu, 08 Sep 2005 07:07:08


Hello Eric, Thanks a lot for your reply! Code sample coming up... I
basically kept track of the maximum latency between t1 and t2 (as shown
in my while loop on my initial post)... the average latency was about
1000 ns (over a 5 minute period) but the max latency was 323,000,000!
I'll post the sample code shortly...
 
 
 


Post by Trond Norby » Thu, 08 Sep 2005 07:13:01



> Hello Eric, Thanks a lot for your reply! Code sample coming up... I
> basically kept track of the maximum latency between t1 and t2 (as shown
> in my while loop on my initial post)... the average latency was about
> 1000 ns (over a 5 minute period) but the max latency was 323,000,000!
> I'll post the sample code shortly...

hmm... who else is using your computer ;) or do you have broken
hardware? any messages in /var/adm/messages?

Trond

 
 
 


Post by Greg Menk » Thu, 08 Sep 2005 07:42:44



> Hello Greg, Thanks for your reply. My current question is related to
> real-time scheduling capabilities of Solaris 8 and/or Solaris 9. The
> problem can be boiled down to the following very simple tests:

> 1. Can I use nanosleep to try to wake up every 10 ms (on average)?
> 2. Can I use SIGALRM instead?
> 3. Can I use gethrtime instead?

Have you run empty loops using those various techniques to see if you
can achieve the timing you're looking for?  If yes (and I imagine
nanosleep and gethrtime will be a reasonable way to approach it), then
the problem is elsewhere.  Perhaps you can do the empty loop test with
nanosleep and SIGALRM and see which gives you the minimum jitter.

> For the time being, you can completely ignore the network, transferring
> data, IP, TCP, UDP and/or jitter buffers...

Once you've confirmed you can run a simple for loop at something close
to your target rate, then these will be the next hurdle.  That is almost
certainly where the problems are coming from.


> > It sounds to me as if you have not designed your system to
> > handle real-world characteristics of computers and networks.

> Wow... that's why I hate posting on Usenet... because people jump to
> conclusions... I'm sorry Greg... I appreciate your taking the time to
> share your knowledge, but you know _NOTHING_ about the design of my
> "real-world" application... all you know is the PROBLEM STATEMENT as
> stated in my opening post: "--How to obtain reliable 10 ms timing from
> Solaris on a 450 MHz UltraSparc2?"... Even though my post was very well
> articulated, you start making assumptions about delays in intervening
> switches and routers and claiming that "it sounds like I have not
> designed my system properly"... That's quite a disappointing response
> from someone like yourself who could probably offer something positive
> to the community...

Your post certainly was articulate, but it lacked essentially all the
detail needed to evaluate the problem, which led to my (hasty)
assumption.  Most of the length of your post details the scheduling
parameters you were changing around, but there wasn't even prstat
output showing how busy the cpu was while executing your job.

Latency and jitter don't happen in isolation; they are effects, and the
first causes to examine are the things your code does.  In order
to try and find your problem we need details about what "packages" are,
how you acquire them, how long it takes to process them, how you send
them, etc.  More than likely your latency problem is coming from some
of these processing steps.  The more detailed your question, the more
detailed the answer.

I do apologize for flying off the handle.  At work I get a lot of
questions about bugs in operating systems which have almost all been due
to trivial bugs, assumptions and misunderstandings made by the
applications programmer.  For some reason it's often easier for people to
suspect a subtle problem in the OS than a simple one in the program
they're writing.

Gregm

 
 
 


Post by Chris Friese » Thu, 08 Sep 2005 07:53:20



> Hello Eric, Thanks a lot for your reply! Code sample coming up... I
> basically kept track of the maximum latency between t1 and t2 (as shown
> in my while loop on my initial post)... the average latency was about
> 1000 ns (over a 5 minute period) but the max latency was 323,000,000!
> I'll post the sample code shortly...

You might try building some kind of histogram with buckets, as it would
give more information.  If you have one or two samples where the max
latency is really high, it will seriously throw off your average.

A single point where you've got a 323ms worst-case delay isn't actually
all that bad for most purposes.  This may actually be typical for
Solaris--do you have any reason to expect better?
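
A bucketed histogram of the kind suggested here might be sketched as follows. The decade bucket boundaries (below 1 us, 1 to 10 us, and so on up to 100 ms and above) and the names latency_bucket and histogram_add are assumptions chosen for illustration, not something from the test program under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bucket 0: below 1 us; bucket 1: 1 us up to 10 us; ...;
 * bucket 5: 10 ms up to 100 ms; bucket 6: 100 ms and above.
 */
#define NUM_BUCKETS 7

/* Map a latency in nanoseconds to its decade bucket. */
size_t latency_bucket(int64_t ns)
{
    assert(ns >= 0);
    int64_t limit = 1000;          /* exclusive upper bound of bucket 0 */
    size_t b = 0;
    while (b < NUM_BUCKETS - 1 && ns >= limit) {
        limit *= 10;
        b++;
    }
    return b;
}

/* Count one latency sample in the histogram. */
void histogram_add(uint64_t hist[NUM_BUCKETS], int64_t ns)
{
    hist[latency_bucket(ns)]++;
}
```

Feeding every t2 - t1 difference through histogram_add and printing the counts afterwards would show whether the 323 ms figure is a single outlier or one of many, which an average and a maximum alone cannot distinguish.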

Chris

 
 
 


Post by capnw.. » Thu, 08 Sep 2005 08:49:07


Hello Trond, I have tried my tests on two systems (a 4 CPU 1050 MHz
system and a 2 CPU 450 MHz system). The test fails on both... I share
the 4 CPU machine with my other co-workers, but I have exclusive access
to the 2 CPU machine. Nothing else was running... I use psrset to
create a processor set, and only my "test" program was running on CPU 2
(nothing else was assigned to that CPU). No suspicious messages inside
/var/adm/messages. The 4 CPU machine has plenty of messages about
"Accepted publickey" and "sendmail". No messages on the 2 CPU machine.
I hope the hardware is not broken :-) Using prtdiag does not show
anything out of the ordinary...
 
 
 

1. a real-time thread sleep

I've got a real-time thread that reads data from a pci device and stores
it in a user-space buffer.  The system has no flow control, so I need to
read it when the receive fifo becomes half-full.  There's no interrupt
capability for this device, so I need to poll the device to check its
state, about every 2 milliseconds.  I would like this to not burn cpu cycles
in order to poll, so I'd like to put the thread to sleep between polls.

I think I can use the pthread_cond_timedwait routine, but this seems like
a fairly inelegant way to accomplish this.  Can anyone suggest a more
appropriate method?

Thanks,

-Charles

2. Free/net/open BSD or Linux?

3. real-time threads

4. Apache authentication

5. Sluggish Real-time threads on x86

6. OpenBSD on Ultra5

7. threads in real-time scheduling - help needed

8. Threads, real time and LWPs on Solaris 2.2

9. How do I configure solaris for real time threads

10. real time capability

11. Real-time process and Solaris

12. enabling real-time for solaris 2.3