reader-writer livelock problem

Post by William Lee Irwin III » Sat, 09 Nov 2002 06:00:11




> This is a follow-up to the email thread I started on July 29th.
> See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> and the following discussion on LKML.
> I'll summarize the problem again to refresh the issue.
> Again, this is a correctness issue, not a performance one.
> I am seeing a problem on medium-sized SMPs where user programs are
> able to livelock the Linux kernel to such an extent that the
> system appears dead.  With the help of some hardware debugging
> tools, I was able to determine that the problem is caused by
> the reader-preference reader/writer locks in the Linux kernel.

This is a very serious problem which I have also encountered. My
strategy was to make the readers on the tasklist_lock better-behaved,
and with Ingo's help and co-authorship those changes were cleaned up,
tuned to provide performance benefits for smaller systems, bugfixed,
and incorporated in the kernel. They have at least made the 16x systems
in my lab much more stable. The issues are still triggerable on
32x systems in my lab, to which I do not have regular access.

Rusty, Dave, Ingo, and Linus cc:'d for additional commentary/help.

Bill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

reader-writer livelock problem

Post by Jeremy Fitzhardinge » Sat, 09 Nov 2002 19:20:07




> > This is a follow-up to the email thread I started on July 29th.
> > See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> > and the following discussion on LKML.
> > I'll summarize the problem again to refresh the issue.
> > Again, this is a correctness issue, not a performance one.
> > I am seeing a problem on medium-sized SMPs where user programs are
> > able to livelock the Linux kernel to such an extent that the
> > system appears dead.  With the help of some hardware debugging
> > tools, I was able to determine that the problem is caused by
> > the reader-preference reader/writer locks in the Linux kernel.

> This is a very serious problem which I have also encountered. My
> strategy was to make the readers on the tasklist_lock more well-behaved,
> and with Ingo's help and co-authorship those changes were cleaned up,
> tuned to provide performance benefits for smaller systems, bugfixed,
> and incorporated in the kernel. They have at least provided 16x systems
> in my lab with much more stability. The issues are still triggerable on
> 32x systems in my lab, to which I do not have regular access.

The normal way of solving this fairness problem is to make pending write
locks block read lock attempts, so that the reader count is guaranteed
to drop to zero as read locks are released.  I haven't looked at the
Linux implementation of rwlocks, so I don't know how hard this is to
do.  Or perhaps there's some other reason for not implementing it this
way?
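The writer-preference scheme described above can be sketched in portable C11
atomics (hypothetical names and a deliberately naive spin loop; this is not
the kernel's rwlock implementation, just an illustration of the idea):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical writer-preference rwlock: a pending writer sets a flag
 * that blocks new readers, so the active reader count is guaranteed
 * to drain to zero as existing read locks are released. */
typedef struct {
        atomic_int  readers;         /* active readers */
        atomic_bool writer_pending;  /* a writer is waiting or active */
} wp_rwlock_t;

static void wp_read_lock(wp_rwlock_t *l)
{
        for (;;) {
                /* the fairness rule: do not enter while a writer waits */
                while (atomic_load(&l->writer_pending))
                        ; /* spin */
                atomic_fetch_add(&l->readers, 1);
                if (!atomic_load(&l->writer_pending))
                        return;  /* got it before any writer arrived */
                atomic_fetch_sub(&l->readers, 1);  /* back off, retry */
        }
}

static void wp_read_unlock(wp_rwlock_t *l)
{
        atomic_fetch_sub(&l->readers, 1);
}

static void wp_write_lock(wp_rwlock_t *l)
{
        bool expected = false;
        /* one writer at a time; it then waits for readers to drain */
        while (!atomic_compare_exchange_weak(&l->writer_pending,
                                             &expected, true))
                expected = false;
        while (atomic_load(&l->readers) != 0)
                ; /* guaranteed to reach zero: no new readers enter */
}

static void wp_write_unlock(wp_rwlock_t *l)
{
        atomic_store(&l->writer_pending, false);
}
```

Note that this sketch exhibits exactly the recursion hazard discussed later in
the thread: a reader that re-takes the lock while a writer is pending will spin
forever.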

        J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

reader-writer livelock problem

Post by Linus Torvalds » Sat, 09 Nov 2002 19:30:10



> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released.  I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do.  Or perhaps there's some other reason for not implementing it this
> way?

There's another reason for not doing it that way: allowing readers to keep
interrupts on even in the presence of interrupt uses of readers.

If you do the "pending writes stop readers" approach, you get

                cpu1                    cpu2

                read_lock() - get

                                        write_lock_irq() - pending

                irq happens
                 - read_lock() - deadlock

and that means that you need to make readers protect against interrupts
even if the interrupt handlers themselves only take the lock for reading.
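The deadlock above can be modelled in a few lines (a toy state model for
illustration only, with hypothetical names; not kernel code). The key is the
admission rule of any fair rwlock:

```c
#include <stdbool.h>

/* Toy model of a fair rwlock's admission rules, for illustrating the
 * cpu1/cpu2 scenario above.  Not an actual lock implementation. */
struct state {
        int  active_readers;   /* read locks currently held */
        bool writer_pending;   /* a write_lock() is waiting */
};

/* a fair lock admits a new reader only when no writer is pending */
static bool read_lock_would_succeed(const struct state *s)
{
        return !s->writer_pending;
}

/* the pending writer can proceed only when all readers have left */
static bool write_lock_would_succeed(const struct state *s)
{
        return s->active_readers == 0;
}
```

With cpu1 holding one read lock and cpu2's write_lock_irq() pending, the state
is { .active_readers = 1, .writer_pending = true }: the interrupt-context
read_lock() on cpu1 cannot be admitted, and the writer cannot proceed because
cpu1's first read lock is never released. Neither predicate is true, so neither
side makes progress.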

NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
and maybe we should just make sure to find all such cases and turn the
read_lock() calls into read_lock_irqsave() and then make the rw-locks
block readers on pending writers. But it's certainly more work and cause
for subtler problems than just naively changing the rw implementation.

                Linus


reader-writer livelock problem

Post by Linus Torvalds » Sat, 09 Nov 2002 19:40:05



> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Actually, giving this some more thought, I really suspect that the
simplest solution is to add a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq issue)
can use that, slowly porting users over one by one...

                Linus


reader-writer livelock problem

Post by David Howells » Sat, 09 Nov 2002 19:40:10


> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released.  I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do.  Or perhaps there's some other reason for not implementing it this
> way?

Actually, implementing fair spinlocks and fair rwlocks on the x86 arch is
very easy (at least, it is if you have XADD). Any arch which has CMPXCHG can
also do it, just not so easily.
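In portable C11 atomics, the XADD-based idea amounts to a ticket lock; a
minimal sketch (illustrative, not the attached asm):

```c
#include <stdatomic.h>

/* Ticket spinlock: an atomic fetch-and-add (x86 XADD) hands out
 * tickets; each CPU spins until the "now serving" counter reaches
 * its ticket, so acquisition order is strictly FIFO. */
typedef struct {
        atomic_uint ticket;  /* next ticket to hand out */
        atomic_uint curr;    /* ticket currently being served */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
        unsigned me = atomic_fetch_add(&l->ticket, 1);  /* one XADD */
        while (atomic_load(&l->curr) != me)
                ; /* spin: FIFO order, so no starvation */
}

static void ticket_unlock(ticket_lock_t *l)
{
        atomic_fetch_add(&l->curr, 1);  /* serve the next ticket */
}
```

On an arch with only CMPXCHG, the fetch-and-add in ticket_lock() becomes a
load/compare-exchange retry loop, which is why David says it is "not so easy"
there.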

I've attached an implementation of a fair spinlock and an implementation of a
fair rwlock (which can be compiled and played with in u-space).

David

typedef struct fairlock {
        u8      curr;
        u8      ticket;

} fairlock_t;

static inline void init_fairlock(fairlock_t *lock)
{
        memset(lock,0,sizeof(*lock));

}

/*
 * spin lock fairly
 */
static inline void fair_lock(fairlock_t *lock)
{
        u8 number = 1;

        __asm__ __volatile__(
                "# beginning fair_lock\n\t"
LOCK_PREFIX     "  xaddb     %b2,%1\n\t"      /* number = lock->ticket; lock->ticket++; */
                "1:\n\t"
                "  cmpb         %b2,%0\n\t"
                "  jne       2f\n\t"          /* jump if number!=lock->curr */
                LOCK_SECTION_START("")
                "2:\n\t"
                "  rep; nop\n\t"
                "  jmp       1b\n"
                LOCK_SECTION_END
                "# ending fair_lock\n\t"
                : "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
                : "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
                : "memory", "cc");

}

/*
 * spin trylock fairly
 */
static inline void fair_trylock(fairlock_t *lock)
{
        u32 tmp;

        __asm__ __volatile__(
                "# beginning fair_trylock\n\t"
                "  movw      (%3),%%ax\n\t"   /* AL=curr, AH=ticket */
                "1:\n\t"
                "  cmpb      %%al,%%ah\n\t"
                "  je        3f\n\t"          /* jump if maybe we can get it */
                "2:\n\t"
                LOCK_SECTION_START("")
                "3:\n\t"
                "  leaw      1(%ax),%w2\n\t"  /* [propose] ticket=ticket+1 */
LOCK_PREFIX     "  cmpxchgw  %w2,(%3)\n\t"
                "  je        3b\n\t"          /* jump if worked */
                "  rep; nop\n\t"              /* didn't work; AL & AH have been updated */
                "  jmp       1b\n"
                LOCK_SECTION_END
                "# ending fair_trylock\n\t"
                : "=m"(lock->curr), "=m"(lock->ticket), "=$r"(tmp)
                : "r"(lock), "m"(lock->curr), "m"(lock->ticket)
                : "memory", "cc", "eax");

}

/*
 * spin unlock fairly
 */
static inline void fair_unlock(fairlock_t *lock)
{
        u8 number;

        __asm__ __volatile__(
                "# beginning fair_unlock\n\t"
LOCK_PREFIX     "  incb      %1\n\t"          /* lock->curr++; */
                "# ending fair_unlock\n\t"
                : "=m"(lock->curr)
                : "r"(lock), "m"(lock->curr)
                : "memory", "cc");

}

/* rwlock.c: fair read/write spinlocks
 *
 * Copyright (c) 2002   David Howells (dhowe...@redhat.com).
 */

#include <stdio.h>

typedef unsigned char u8;
typedef unsigned int u32;

#define LOCK_PREFIX "lock;"

typedef struct rwlock {

        /* these member variables must be at the beginning and in this order */
        u8      rd_curr;        /* next reader ticket allowed to become active */
        u8      curr;           /* currently active ticket */
        u8      ticket;         /* next ticket to be issued */
        u8      __pad;

} __attribute__((aligned(4))) rwlock_t;

#define RWLOCK_UNLOCKED (rwlock_t) { 0, 0, 0, 0 }

#define rwlock_debug(LOCK,WHERE)                                                        \
do {                                                                                    \
        printf(WHERE"{%02x,%02x,%02x}\n",(LOCK)->rd_curr,(LOCK)->curr,(LOCK)->ticket);  \
} while(0)

/*
 * obtain a write lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
 *   getting a lock
 */
static inline void write_lock(rwlock_t *lock)
{
        u32 eax;

        rwlock_debug(lock,"-->write_lock");

        asm volatile(
                "  # begin write_lock      \n"
LOCK_PREFIX     "  xaddw   %%ax,1(%3)      \n"        /* my ticket in AH */
                "0:        cmpb    %%al,%%ah       \n"        /* lock->curr in AL */
                "  jne     2f              \n"
                "  .section .text.lock,\"ax\"\n"
                "2:                                \n"
                "  rep; nop                \n"
                "  movb    1(%3),%%al      \n"        /* reload AL from lock->curr */
                "  jmp     0b              \n"
                "  .previous               \n"
                "  # end write_lock        \n"
                : "=m"(lock), "=a"(eax)
                : "m"(lock), "r"(lock), "a"(0x0100)
                : "memory", "cc"
                );

        rwlock_debug(lock,"<--write_lock");

}

/*
 * release a write lock
 * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
 */
static inline void write_unlock(rwlock_t *lock)
{
        u32 eax;

        rwlock_debug(lock,"-->write_unlock");

        asm volatile(
                "  # begin write_unlock    \n"
                "  movw    0(%3),%%ax      \n"
                "  incb    %%al            \n"        /* lock->rd_curr++ */
                "  incb    %%ah            \n"        /* lock->curr++ */
                "  movw    %%ax,0(%3)      \n"
                "  # end write_unlock      \n"
                : "=m"(lock), "=&a"(eax)
                : "m"(lock), "r"(lock)
                : "cc"
                );

        rwlock_debug(lock,"<--write_unlock");

}

/*
 * obtain a read lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
 * - lock->curr is irrelevant
 */
static inline void read_lock(rwlock_t *lock)
{
        u32 eax;

        rwlock_debug(lock,"-->read_lock");

        asm volatile(
                "  # begin read_lock       \n"
LOCK_PREFIX     "  xaddb   %%ah,2(%3)      \n"        /* my ticket in AH */
                "0:        movb    0(%3),%%al      \n"        /* AL = lock->rd_curr */
                "  cmpb    %%al,%%ah       \n"        /* if (ticket!=lock->rd_curr)  */
                "  jne     2f              \n"        /* then jump */
                "  incb    0(%3)           \n"        /* lock->rd_curr */
                "  .section .text.lock,\"ax\"\n"
                "2:                                \n"
                "  rep; nop                \n"
                "  jmp     0b              \n"
                "  .previous               \n"
                "  # end read_lock \n"
                : "=m"(lock), "=a"(eax)
                : "m"(lock), "r"(lock), "a"(0x0100)
                : "memory", "cc"
                );

        rwlock_debug(lock,"<--read_lock");

}

/*
 * release a read lock
 * - just advance the lock->curr count so that any spinning write lock can happen
 */
static inline void read_unlock(rwlock_t *lock)
{
        rwlock_debug(lock,"-->read_unlock");

        asm volatile(
                "  # begin read_unlock     \n"
LOCK_PREFIX     "  incb    %0              \n"        /* lock->curr++ */
                "  # end read_unlock       \n"
                : "=m"(lock->curr)
                : "m"(lock->curr)
                : "cc"
                );

        rwlock_debug(lock,"<--read_unlock");

}

/*****************************************************************************/
/*
 * testing stuff
 */
rwlock_t mylock = RWLOCK_UNLOCKED;

#define wibble() asm("nop")

void get_read_lock(void)
{
        wibble();
        read_lock(&mylock);
        read_lock(&mylock);
        read_unlock(&mylock);
        wibble();
        read_lock(&mylock);
        read_unlock(&mylock);
        read_unlock(&mylock);
        wibble();

}

void get_write_lock(void)
{
        wibble();
        write_lock(&mylock);
        wibble();
        write_unlock(&mylock);
        wibble();

}

int main()
{
        get_read_lock();
        get_write_lock();
        get_write_lock();
        get_read_lock();
        return 0;

}
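The three-counter algorithm in the asm above is easier to follow in plain C.
A single-threaded model (no atomicity or LOCK prefixes, for understanding the
algorithm only):

```c
/* Plain-C model of the ticket rwlock above: ticket hands out tickets,
 * rd_curr admits readers, curr tracks completed/active grants.
 * No atomicity here -- illustration only, not usable as a real lock. */
struct model_rwlock {
        unsigned char rd_curr, curr, ticket;
};

static void model_read_lock(struct model_rwlock *l)
{
        unsigned char me = l->ticket++;  /* the XADD on lock->ticket */
        while (l->rd_curr != me)
                ;                        /* wait to be admitted */
        l->rd_curr++;                    /* admit the next reader too */
}

static void model_read_unlock(struct model_rwlock *l)
{
        l->curr++;                       /* let a waiting writer advance */
}

static void model_write_lock(struct model_rwlock *l)
{
        unsigned char me = l->ticket++;
        while (l->curr != me)
                ;  /* wait for every earlier reader/writer to finish */
        /* rd_curr is NOT advanced: readers with later tickets stall */
}

static void model_write_unlock(struct model_rwlock *l)
{
        l->rd_curr++;                    /* readers may resume */
        l->curr++;                       /* and the next ticket is served */
}
```

Because readers advance rd_curr immediately on entry, multiple readers run in
parallel; because a writer leaves rd_curr frozen at its own ticket, no reader
that took a ticket after the writer can slip past it.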

 
 
 

reader-writer livelock problem

Post by Jeremy Fitzhardinge » Sat, 09 Nov 2002 19:50:06



> There's another reason for not doing it that way: allowing readers to keep
> interrupts on even in the presence of interrupt uses of readers.

> If you do the "pending writes stop readers" approach, you get

>            cpu1                    cpu2

>            read_lock() - get

>                                    write_lock_irq() - pending

>            irq happens
>             - read_lock() - deadlock

> and that means that you need to make readers protect against interrupts
> even if the interrupts only read themselves.

Even without interrupts that would be a bug.  It isn't ever safe to
attempt to retake a read lock if you already hold it, because you may
deadlock with a pending writer.  Fair multi-reader locks aren't
recursive locks.

> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Yes, I'd agree.  It would definitely be a behavioural change with
respect to the legality of retaking a lock for reading, and the places
that do so would probably be quite irritating to find (since they'd only
cause a problem when they actually coincide with an attempted write lock).

> Actually, giving this some more thought, I really suspect that the
> simplest solution is to add a separate "fair_read_lock()", and paths
> that need to care about fairness (and know they don't have the irq issue)
> can use that, slowly porting users over one by one...

Do you mean have a separate lock type, or have two different read_lock
operations on the current type?

        J


reader-writer livelock problem

Post by Van Maren, Kevin » Sat, 09 Nov 2002 19:50:09


Yes, that was one of the options I suggested in the original post
to the IA64 list.  I'll bounce it to LKML for reference, now that
the discussion has moved there.

In my proposal, however, I proposed making (the current) reader
locks recursive spinlocks (to eliminate the starvation problem,
at the expense of eliminating reader parallelism), which would
have the added benefit of "encouraging" people to move to the
fair locks.  But your proposal is probably at least as good.

Of course, normal spinlocks do not scale either, since under
contention they currently require O(N) cache-to-cache transfers
for each processor to acquire and release the lock, which results
in O(N^2) cache transfers for all N processors to acquire/release
the lock once.  So with 32 processors contending for the lock, at
1us per cache-to-cache transfer (picked for easy math, but that is
reasonable for large systems), it takes about 1ms for every
processor to acquire a spinlock it holds for only ~10 CPU cycles.

So I'd _also_ like to see an MCS lock implementation replace the current
spinlock code (IBM "NMCS" lock patch can be trivially used to replace
all spinlocks).
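For reference, the MCS idea (after Mellor-Crummey and Scott; this is a
textbook C11 sketch, not the IBM patch Kevin refers to) is that each waiter
spins on its own queue node, so a release touches one remote cacheline instead
of invalidating every spinner:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* MCS queue lock sketch: waiters form a linked list; each spins on
 * its OWN node's flag, giving O(N) total transfers for N handoffs. */
struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
};

typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *me)
{
        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);
        /* atomically become the new queue tail */
        struct mcs_node *prev = atomic_exchange(lock, me);
        if (prev) {
                atomic_store(&prev->next, me);  /* link behind predecessor */
                while (atomic_load(&me->locked))
                        ;  /* spin on our own cacheline only */
        }
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *me)
{
        struct mcs_node *succ = atomic_load(&me->next);
        if (!succ) {
                struct mcs_node *expected = me;
                if (atomic_compare_exchange_strong(lock, &expected, NULL))
                        return;  /* no one was waiting */
                while (!(succ = atomic_load(&me->next)))
                        ;  /* a successor is mid-enqueue; wait for link */
        }
        atomic_store(&succ->locked, false);  /* direct handoff */
}
```

The handoff writes to exactly one other CPU's node, which is where the O(N)
versus O(N^2) difference Kevin mentions comes from.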

Kevin

-----Original Message-----
From: Linus Torvalds
To: Jeremy Fitzhardinge



Sent: 11/8/02 12:28 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem


> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Actually, giving this some more thought, I really suspect that the
simplest solution is to add a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq issue)
can use that, slowly porting users over one by one...

  Linus

reader-writer livelock problem

Post by David Howells » Sat, 09 Nov 2002 19:50:11


> Do you mean have a separate lock type, or have two different read_lock
> operations on the current type?

You'd probably be better off with separate types... that way you can avoid
problems with conditionals and also with different tracking fields being
required by each grade of lock.

David

reader-writer livelock problem

Post by Stephen Hemminger » Sat, 09 Nov 2002 20:10:05



> > The normal way of solving this fairness problem is to make pending write
> > locks block read lock attempts, so that the reader count is guaranteed
> > to drop to zero as read locks are released.  I haven't looked at the
> > Linux implementation of rwlocks, so I don't know how hard this is to
> > do.  Or perhaps there's some other reason for not implementing it this
> > way?

> Actually, implementing fair spinlocks and fair rwlocks on the x86 arch is
> very easy (at least, it is if you have XADD). Any arch which has CMPXCHG can
> also do it, just not so easily.

> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).

> David

There are a selection of similar algorithms here:
        http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.htm...

How does yours compare?


reader-writer livelock problem

Post by Linus Torvalds » Sat, 09 Nov 2002 20:10:08




>Even without interrupts that would be a bug.  It isn't ever safe to
>attempt to retake a read lock if you already hold it, because you may
>deadlock with a pending writer.  Fair multi-reader locks aren't
>recursive locks.

.. but I don't think we have any real users who use them for recursion,
so the only "recursion" right now is through interrupts that use this
feature.

(At least that was true a long time ago, maybe we've added truly
recursive users since)

>> Actually, giving this some more thought, I really suspect that the
>> simplest solution is to add a separate "fair_read_lock()", and paths
>> that need to care about fairness (and know they don't have the irq issue)
>> can use that, slowly porting users over one by one...

>Do you mean have a separate lock type, or have two different read_lock
>operations on the current type?

That depends on whether it is even sanely implementable to share the
same lock. It may not be.

From a migration standpoint it would be easiest (by far) to be able to
share the lock type and to mix operations (ie an interrupt - or
recursive user - could just use the non-fair version, while others could
use the fair version on the same lock).  However, I have this nagging
suspicion that it might be a total nightmare to implement efficiently in
practice.

I've not looked at it. Any ideas?

                Linus

reader-writer livelock problem

Post by David Howells » Sat, 09 Nov 2002 20:10:08


> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).

For those that prefer patches (or at least something that'll compile without
warnings)...

David

diff -uNr linux-2.5.46/include/asm-i386/fairlock.h linux-fair-2546/include/asm-i386/fairlock.h
--- linux-2.5.46/include/asm-i386/fairlock.h    1970-01-01 01:00:00.000000000 +0100
+++ linux-fair-2546/include/asm-i386/fairlock.h 2002-11-08 17:51:30.000000000 +0000
@@ -0,0 +1,210 @@
+/* fairlock.h: fair spinlocks
+ *
+ * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_FAIRLOCK_H
+#define _ASM_FAIRLOCK_H
+
+#include <linux/config.h>
+#include <linux/string.h>
+#include <asm/atomic.h>
+
+typedef struct fairlock {
+       u8      curr;
+       u8      ticket;
+} fairlock_t;
+
+static inline void init_fairlock(fairlock_t *lock)
+{
+       memset(lock,0,sizeof(*lock));
+}
+
+typedef struct rwfairlock {
+
+       /* these member variables must be at the beginning and in this order */
+       u8      rd_curr;        /* next reader ticket allowed to become active */
+       u8      curr;           /* currently active ticket */
+       u8      ticket;         /* next ticket to be issued */
+       u8      __pad;
+
+} __attribute__((aligned(4))) rwfairlock_t;
+
+#define RWFAIRLOCK_UNLOCKED (rwfairlock_t) { 0, 0, 0, 0 }
+
+/*****************************************************************************/
+/*
+ * spin lock fairly
+ */
+static inline void fair_lock(fairlock_t *lock)
+{
+       u8 number = 1;
+
+       __asm__ __volatile__(
+               "# beginning fair_lock\n\t"
+LOCK_PREFIX    "  xaddb     %b2,%1\n\t"      /* number = lock->ticket; lock->ticket++; */
+               "1:\n\t"
+               "  cmpb         %b2,%0\n\t"
+               "  jne       2f\n\t"          /* jump if number!=lock->curr */
+               LOCK_SECTION_START("")
+               "2:\n\t"
+               "  rep; nop\n\t"
+               "  jmp       1b\n"
+               LOCK_SECTION_END
+               "# ending fair_lock\n\t"
+               : "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
+               : "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
+               : "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * spin trylock fairly
+ */
+static inline void fair_trylock(fairlock_t *lock)
+{
+       u32 tmp;
+
+       __asm__ __volatile__(
+               "# beginning fair_trylock\n\t"
+               "  movw      (%3),%%ax\n\t"   /* AL=curr, AH=ticket */
+               "1:\n\t"
+               "  cmpb      %%al,%%ah\n\t"
+               "  je        3f\n\t"          /* jump if maybe we can get it */
+               "2:\n\t"
+               LOCK_SECTION_START("")
+               "3:\n\t"
+               "  leaw      1(%ax),%w2\n\t"  /* [propose] ticket=ticket+1 */
+LOCK_PREFIX    "  cmpxchgw  %w2,(%3)\n\t"
+               "  je        3b\n\t"          /* jump if worked */
+               "  rep; nop\n\t"              /* didn't work; AL & AH have been updated */
+               "  jmp       1b\n"
+               LOCK_SECTION_END
+               "# ending fair_trylock\n\t"
+               : "=m"(lock->curr), "=m"(lock->ticket), "=$r"(tmp)
+               : "r"(lock), "m"(lock->curr), "m"(lock->ticket)
+               : "memory", "cc", "eax");
+}
+
+/*****************************************************************************/
+/*
+ * spin unlock fairly
+ */
+static inline void fair_unlock(fairlock_t *lock)
+{
+       __asm__ __volatile__(
+               "# beginning fair_unlock\n\t"
+LOCK_PREFIX    "  incb      %1\n\t"          /* lock->curr++; */
+               "# ending fair_unlock\n\t"
+               : "=m"(lock->curr)
+               : "r"(lock), "m"(lock->curr)
+               : "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * obtain a write lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
+ *   getting a lock
+ */
+static inline void fair_write_lock(rwfairlock_t *lock)
+{
+       u32 eax;
+
+       asm volatile(
+               "  # begin write_lock      \n"
+LOCK_PREFIX     " xaddw   %%ax,1(%3)      \n"        /* my ticket in AH */
+               "0:        cmpb    %%al,%%ah       \n"        /* lock->curr in AL */
+               "  jne     2f              \n"
+               "  .section .text.lock,\"ax\"\n"
+               "2:                                \n"
+               "  rep; nop                \n"
+               "  movb    1(%3),%%al      \n"        /* reload AL from lock->curr */
+               "  jmp     0b              \n"
+               "  .previous               \n"
+               "  # end write_lock        \n"
+               : "=m"(lock), "=a"(eax)
+               : "m"(lock), "r"(lock), "a"(0x0100)
+               : "memory", "cc"
+               );
+}
+
+/*****************************************************************************/
+/*
+ * release a write lock
+ * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
+ */
+static inline void fair_write_unlock(rwfairlock_t *lock)
+{
+       u32 eax;
+
+       asm volatile(
+               "  # begin write_unlock    \n"
+               "  movw    0(%3),%%ax      \n"
+               "  incb    %%al            \n"        /* lock->rd_curr++ */
+               "  incb    %%ah            \n"        /* lock->curr++ */
+               "  movw    %%ax,0(%3)      \n"
+               "  # end write_unlock      \n"
+               : "=m"(lock), "=&a"(eax)
+               : "m"(lock), "r"(lock)
+               : "cc"
+               );
+}
+
+/*****************************************************************************/
+/*
+ * obtain a read lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
+ * - lock->curr is irrelevant
+ */
+static inline void fair_read_lock(rwfairlock_t *lock)
+{
+       u32 eax;
+
+       asm volatile(
+               "  # begin read_lock       \n"
+LOCK_PREFIX     " xaddb   %%ah,2(%3)      \n"        /* my ticket in AH */
+               "0:        movb    0(%3),%%al      \n"        /* AL = lock->rd_curr */
+               "  cmpb    %%al,%%ah       \n"        /* if (ticket!=lock->rd_curr)  */
+               "  jne     2f              \n"        /* then jump */
+               "  incb    0(%3)           \n"        /* lock->rd_curr */
+               "  .section .text.lock,\"ax\"\n"
+               "2:                                \n"
+               "  rep; nop                \n"
+               "  jmp     0b              \n"
+               "  .previous               \n"
+               "  # end read_lock \n"
+               : "=m"(lock), "=a"(eax)
+               : "m"(lock), "r"(lock), "a"(0x0100)
+               : "memory", "cc"
+               );
+}
+
+/*****************************************************************************/
+/*
+ * release a read lock
+ * - just advance the lock->curr count so that any spinning write lock can happen
+ */
+static inline void fair_read_unlock(rwfairlock_t *lock)
+{
+       asm volatile(
+               "  # begin read_unlock     \n"
+LOCK_PREFIX    "  incb    %0              \n"        /* lock->curr++ */
+               "  # end read_unlock       \n"
+               : "=m"(lock->curr)
+               : "m"(lock->curr)
+               : "cc"
+               );
+}
+
+#endif /* _ASM_FAIRLOCK_H */
diff -uNr linux-2.5.46/include/asm-i386/spinlock.h linux-fair-2546/include/asm-i386/spinlock.h
--- linux-2.5.46/include/asm-i386/spinlock.h    2002-10-15 10:12:35.000000000 +0100
+++ linux-fair-2546/include/asm-i386/spinlock.h 2002-11-08 17:50:39.000000000 +0000
@@ -4,6 +4,7 @@
 #include <asm/atomic.h>
 #include <asm/rwlock.h>
 #include <asm/page.h>
+#include <asm/fairlock.h>
 #include <linux/config.h>

 /*

reader-writer livelock problem

Post by Matthew Wilcox » Sat, 09 Nov 2002 20:10:09



> processor to acquire/release the lock once.  So with 32 processors
> contending for the lock, at 1us per cache-cache transfer (picked

if you have 32 processors contending for the same spinlock, you have
bigger problems.

--
Revolutions do not require corporate support.

reader-writer livelock problem

Post by Van Maren, Kevin » Sat, 09 Nov 2002 20:10:14


Absolutely you should minimize the locking contention.
However, that isn't always possible, such as when you
have 64 processors contending on the same resource.
With the current kernel, the trivial example with reader/
writer locks was having them all call gettimeofday().
But try having 64 processors fstat() the same file,
which I have also seen happen (application looping,
waiting for another process to finish setting up the
file so they can all mmap it).

What MCS locks do is reduce the number of times the cacheline
has to be flung around the system in order to get work done:
they "scale" much better with the number of processors,
O(N) cache-line transfers instead of O(N^2).

Kevin

-----Original Message-----
From: Matthew Wilcox
To: Van Maren, Kevin

Cc: 'Linus Torvalds '; 'Jeremy Fitzhardinge '; 'William Lee Irwin III ';


Sent: 11/8/02 12:52 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem



 
 
 

reader-writer livelock problem

Post by Matthew Wilcox » Sat, 09 Nov 2002 21:30:10



> Absolutely you should minimize the locking contention.
> However, that isn't always possible, such as when you
> have 64 processors contending on the same resource.

if you've got 64 processors contending on the same resource, maybe you
need to split that resource up so they can have a copy each.  all that
cacheline bouncing can't do your numa boxes any good.

> With the current kernel, the trivial example with reader/
> writer locks was having them all call gettimeofday().

i hear x86-64 has a lockless gettimeofday.  maybe that's the solution.

> But try having 64 processors fstat() the same file,
> which I have also seen happen (application looping,
> waiting for another process to finish setting up the
> file so they can all mmap it).

umm.. the call trace:

sys_fstat
|-> vfs_fstat
|   |-> fget
|       |-> read_lock(&files->file_lock)
|   |-> vfs_getattr
|       |-> inode->i_op->getattr
|       |-> generic_fillattr
|-> cp_new_stat64
    |-> memset
    |-> copy_to_user

so you're talking about contention on files->file_lock, right?  it's really
not the kernel's fault that your app is badly written.  that lock's private
to process & children, so it's not like another application can hurt you.

> What MCS locks do is they reduce the number of times
> the cacheline has to be flung around the system in
> order to get work done: they "scale" much better with
> the number of processors: O(N) instead of O(N^2).

yes, but how slow are they in the uncontended case?

--
Revolutions do not require corporate support.

 
 
 

reader-writer livelock problem

Post by David Mosberger » Sat, 09 Nov 2002 21:40:05


  Matthew> On Fri, Nov 08, 2002 at 12:05:30PM -0600, Van Maren, Kevin
  >> Absolutely you should minimize the locking contention.  However,
  >> that isn't always possible, such as when you have 64 processors
  >> contending on the same resource.

  Matthew> if you've got 64 processors contending on the same
  Matthew> resource, maybe you need to split that resource up so they
  Matthew> can have a copy each.  all that cacheline bouncing can't do
  Matthew> your numa boxes any good.

Matthew, please understand that this is NOT a performance problem.
It's a correctness problem.  If livelock can result from read-write locks,
it's a huge security problem.  Period.

        --david

 
 
 

[Linux-ia64] reader-writer livelock problem

  Van> I have not looked at this, but I don't believe it is the right
  Van> way to solve the problem: users who _need_ to use all
  Van> the CPUs for computation would be punished just to work around
  Van> a kernel implementation issue: that's like saying don't allow
  Van> processes to allocate virtual memory because if the VM is
  Van> over-committed by X amount the kernel deadlocks.

  Van> It would be a bad hack to limit the system-call rate just to
  Van> prevent livelock.

Certainly.  I didn't suggest PRM as a way to _solve_ the livelock
problem, but since Mario asked for a method to cap CPU utilization, I
mentioned it.

        --david
