Bug report: smp affinity patch

Post by Joe Kor » Sun, 24 Feb 2002 07:10:09

Hi everyone,
  On occasion, the smp affinity patch can leave one or more runnable
processes in such a state that the scheduler never selects them for
execution.  The reason this occurs is unknown.  This note reports
the symptoms and how the problem may be replicated:

I am using the smp affinity patch from Robert Love, which provides
a /proc/pid/affinity interface to the user.  I presume the problem
is also present in the Ingo Molnar patch, since it is so similar
in implementation, although I have not tested it.
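
For reference, the interface is driven by plain reads and writes of a
decimal cpu bitmask -- the same form the shield script below writes.
A minimal sketch (pid 1087 is just an example from the test runs):

    # read the current cpu affinity mask of pid 1087
    cat /proc/1087/affinity
    # restrict pid 1087 to cpu0 (bit 0 of the mask)
    echo 1 > /proc/1087/affinity
    # allow pid 1087 on both cpus of a dual-cpu box (bits 0 and 1)
    echo 3 > /proc/1087/affinity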

I ran across this problem while writing a shell script that
implements cpu shielding.  The shielding script modifies the
affinities of nearly all processes in the system -- for each process,
it either forces that process to run only on the shielded cpu, or
forces it to avoid the shielded cpu altogether.  The only processes
exempt from both are the ksoftirqd_CPUxx daemons, each of which must
remain on the cpu it was originally attached to.

Joe

---------------------------------------

Environment:
    linux-2.4.17.tar.gz
    + patch-2.4.18-rc2.gz
    + cpu-affinity-rml-2.4.16-1.patch (from www.tech9.net/rml/linux)
    PC, Pentium III, dual cpus, dual IO APICs, scsi, console via com1.

Test Shell Script used (filename `shield'):

    #!/bin/bash
    # shell script that reserves some cpu for some small
    # set of procs: accomplished by tweaking the affinity
    # masks of all procs -- either by removing that cpu from
    # those pids which are not to use it, or by setting the
    # affinity to only that cpu, for those procs that are
    # to be attached to the shielded cpu.
    #
    # usage: shield unshieldmask shieldmask pid pid ...
    # example: shield 2 1 1027 1028
    # meaning: pids 1027 and 1028 are to run on cpu0; every
    # other proc is to run on cpu1.
    # example: shield 3 3
    # meaning: make every cpu available to all procs.
    # note: pids 3 & 4 (ksoftirqd_CPU[0-1]) are skipped and
    # must not have their affinities changed by this script.
    # note: at most four pids are handled by the case below.

    unshieldmask=${1:-3}        # default: allow all cpus
    shift
    shieldmask=${1:-1}          # default: cpu0 only
    shift
    cd /proc
    for i in $(/bin/ls -d [0-9]*); do
        if [ -d $i ]; then
            case $i in
                3|4) ;;         # never touch the ksoftirqd daemons
                ${1:-no}) echo $shieldmask >$i/affinity ;;
                ${2:-no}) echo $shieldmask >$i/affinity ;;
                ${3:-no}) echo $shieldmask >$i/affinity ;;
                ${4:-no}) echo $shieldmask >$i/affinity ;;
                *)  echo $unshieldmask >$i/affinity ;;
            esac
        fi
    done
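
After running shield, the resulting masks can be spot-checked with a
loop like the following (a small sketch that just reads back every
affinity file the script may have written):

    # dump the affinity mask of every process for verification
    cd /proc
    for i in $(/bin/ls -d [0-9]*); do
        [ -r $i/affinity ] && echo "pid $i: $(cat $i/affinity)"
    done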

Test initialization sequence:

    in window #1:
        top -d1 -i
    in window #2:
        echo 'int main() { for (;;); }' > l.c && make l
        ./l &
        [1]  1087
        ./l &
        [2]  1088
        ./l &
        [3]  1089
        ./l &
        [4]  1090

Test sequence and results:

    In the tests below, `Stall' means that the scheduler fails to
    give a runnable process any cpu time.  A stall can also be
    confirmed independently of top; see the sketch after the results
    table.

    Notation:
      1088 Stalls       - pid 1088 stalls.  Visible in the top(1) window
                          as the bottom `running' proc, but with 0% cpu
                          utilization.

      top Stalls        - the top window stops updating, because top
                          itself has become a victim of the scheduling
                          bug.  To see this, run another top in another
                          window.

    result              command line executed
    ------------------  ---------------------
    ok                  shield 1 2
    ok                  shield 1 2
    ok                  shield 3 1
    ok                  shield 3 1 1087
    ok                  shield 2 1 1087
    ok                  shield 3 3
    top Stalls          shield 2 1 1090
    ok                  shield 3 3
    top Stalls          shield 2 1 1090
    ok                  shield 3 3
    top Stalls          shield 2 1 1090
    ok                  shield 3 3
    1088 Stalls         shield 2 1 1090
    ok                  shield 3 3
    ok                  shield 2 1
    ok                  shield 1 2
    ok                  shield 2 1
    ok                  shield 1 2
    ok                  shield 2 1
    1090 Stalls         shield 1 2 1090
    1090 Stalls         shield 1 2 1090
    top Stalls, plus    shield 2 1 1090
    1087, sendmail,
    crond, init, lots
    of kjournald
    threads, and
    syslogd
    ok                  shield 3 3
    top + shell window  shield 2 1 1090
    Stalls
    ok                  shield 3 3              (executed in another window)
    1087 Stalls         shield 1 2 1090
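
As noted above, a stall can be confirmed independently of top by
sampling a pid's accumulated cpu time from /proc/<pid>/stat: fields
14 and 15 are utime and stime in clock ticks, so a runnable proc
whose sum does not advance is stalled.  A rough sketch (it assumes
the proc's command name contains no spaces, which would shift the
fields):

    # usage: stallcheck pid -- report whether the pid accrues cpu time
    pid=$1
    t1=$(awk '{print $14 + $15}' /proc/$pid/stat)
    sleep 5
    t2=$(awk '{print $14 + $15}' /proc/$pid/stat)
    if [ $t1 -eq $t2 ]; then
        echo "pid $pid: stalled (no cpu time in 5 seconds)"
    else
        echo "pid $pid: ok ($((t2 - t1)) ticks)"
    fi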

Sample good and bad top(1) Displays:

------------------------------------------------------------------- good

  9:18pm  up 38 min,  3 users,  load average: 4.00, 4.31, 4.69
48 processes: 43 sleeping, 5 running, 0 zombie, 0 stopped
CPU0 states: 100.0% user,  0.0% system,  0.0% nice,  0.0% idle
CPU1 states: 99.0% user,  1.0% system,  0.0% nice,  0.0% idle
Mem:   513160K av,   44600K used,  468560K free,       0K shrd,   10156K buff
Swap: 1052216K av,       0K used, 1052216K free                   17420K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1088 root      20   0   268  268   220 R    52.0  0.0  15:28 l
 1089 root      20   0   268  268   220 R    50.0  0.0  15:54 l
 1087 root      18   0   268  268   220 R    48.0  0.0  13:38 l
 1090 root      18   0   268  268   220 R    48.0  0.0  22:44 l
 1027 root      10   0  1060 1060   856 R     1.0  0.2   0:13 top

------------------------------------------------------------------- bad

  9:20pm  up 40 min,  3 users,  load average: 6.12, 5.19, 4.97
49 processes: 43 sleeping, 6 running, 0 zombie, 0 stopped
CPU0 states: 97.0% user,  3.0% system,  0.0% nice,  0.0% idle
CPU1 states: 100.0% user,  0.0% system,  0.0% nice,  0.0% idle
Mem:   513160K av,   44940K used,  468220K free,       0K shrd,   10392K buff
Swap: 1052216K av,       0K used, 1052216K free                   17420K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1090 root      16   0   268  268   220 R    99.9  0.0  24:24 l
 1089 root      16   0   268  268   220 R    64.9  0.0  16:58 l
 1088 root       9   0   268  268   220 R    33.9  0.0  16:31 l
  978 root       9   0   664  664   552 R     0.0  0.1   0:00 in.telnetd
 1027 root       9   0  1060 1060   856 R     0.0  0.2   0:14 top
 1087 root       9   0   268  268   220 R     0.0  0.0  14:05 l

-------------------------------------------------------------------------