Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Peter Zaitsev » Fri, 06 Jul 2001 20:30:11



Hello linux-kernel,

  Does anyone have information on this subject?  I have constant
  failures with the system swapping on RAID1; I just wanted to be sure
  whether this may be the problem or not.  It works without any problems
  with the 2.2 kernel.

--
Best regards,

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Neil Brown » Fri, 06 Jul 2001 21:20:10



> Hello linux-kernel,

>   Does anyone have information on this subject?  I have constant
>   failures with the system swapping on RAID1; I just wanted to be sure
>   whether this may be the problem or not.  It works without any problems
>   with the 2.2 kernel.

It certainly should work in 2.4.  What sort of "constant failures" are
you experiencing?

Though it does appear to work in 2.2, there is a possibility of data
corruption if you swap onto a raid1 array that is resyncing.  This
possibility does not exist in 2.4.

NeilBrown

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Peter Zaitsev » Fri, 06 Jul 2001 22:30:13


Hello Neil,



>> Hello linux-kernel,

>>   Does anyone have information on this subject?  I have constant
>>   failures with the system swapping on RAID1; I just wanted to be sure
>>   whether this may be the problem or not.  It works without any problems
>>   with the 2.2 kernel.

NB> It certainly should work in 2.4.  What sort of "constant failures" are
NB> you experiencing?

NB> Though it does appear to work in 2.2, there is a possibility of data
NB> corruption if you swap onto a raid1 array that is resyncing.  This
NB> possibility does not exist in 2.4.

The problem is that I'm constantly getting these X-order allocation errors
in the kernel log, after which the system becomes unstable and often hangs
or leaves processes which cannot be killed even by a "-9" signal.
The installed debugging patches produce the following allocation paths:

> Jun 20 05:56:14 tor kernel: Call Trace: [__get_free_pages+20/36]
> [__get_free_pages+20/36] [kmem_cache_grow+187/520] [kmalloc+183/224]
> [raid1_alloc_r1bh+105/256] [raid1_make_request+832/852]
> [raid1_make_request+80/852]
> Jun 20 05:56:14 tor kernel:        [md_make_request+79/124]
> [generic_make_request+293/308] [submit_bh+87/116] [brw_page+143/160]
> [rw_swap_page_base+336/428] [rw_swap_page+112/184] [swap_writepage+120/128]
> [page_launder+644/2132]
> Jun 20 05:56:14 tor kernel:        [do_try_to_free_pages+52/124]
> [kswapd+89/228] [kernel_thread+40/56]

one more trace:

SR>>Jun 19 09:50:08 garnet kernel: __alloc_pages: 0-order allocation failed.
SR>>Jun 19 09:50:08 garnet kernel: __alloc_pages: 0-order allocation failed from
SR>>c01Jun 19 09:50:08 garnet kernel: ^M^Mf4a2bc74 c024ac20 00000000 c012ca09
SR>>c024abe0
SR>>Jun 19 09:50:08 garnet kernel:        00000008 c03225e0 00000003 00000001
SR>>c029c9Jun 19 09:50:08 garnet kernel:        f0ebb760 00000001 00000008
SR>>c03225e0 c0197bJun 19 09:50:08 garnet kernel: Call Trace:
SR>>[alloc_bounce_page+13/140] [alloc_bouJun 19 09:50:08 garnet kernel:
SR>>[raid1_make_request+832/852] [md_make_requJun 19 09:50:08 garnet kernel:
SR>>[swap_writepage+120/128] [page_launder+644Jun 19 09:50:08 garnet kernel:
SR>>[sock_poll+35/40] [do_select+230/476] [sysJun 19 10:21:27 garnet kernel:
SR>>sending pkt_too_big to self
SR>>Jun 19 10:21:55 garnet kernel: sending pkt_too_big to self
SR>>Jun 19 10:34:36 garnet kernel: sending pkt_too_big to self
SR>>Jun 19 10:35:33 garnet last message repeated 2 times
SR>>Jun 19 10:36:50 garnet kernel: sending pkt_too_big to self

That's why I thought this problem is related to the RAID1 swapping I'm
using.

Well, of course I'm speaking about a synced RAID1.

--
Best regards,


Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Arjan van de Ven » Fri, 06 Jul 2001 22:50:08



> That's why I thought this problem is related to the RAID1 swapping I'm
> using.

Well, there is the potential problem that RAID1 cannot avoid allocating
memory on some occasions, for the 2nd buffer head.  ATARAID RAID0 has the
same problem for now, and there is no real solution to this.  You can
pre-allocate a bunch of buffer heads, but under high load you will run
out of those, no matter how many you pre-allocate.

Of course you can then wait for the "in flight" ones to become available
again, and that is the best thing I've come up with so far.  It would be
nice if the 3 subsystems that need such buffer heads now (MD RAID1,
ATARAID RAID0 and the bouncebuffer(head) code) could share their pool.
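
The scheme described here (a fixed pool of buffers, with callers sleeping
until an in-flight buffer is returned by the completion path) can be
sketched in userspace with pthreads.  This is only an illustration of the
pattern; all names are hypothetical and nothing below is taken from the
raid1 driver:

#include <pthread.h>
#include <stdlib.h>

#define POOL_SIZE 16

struct buf { struct buf *next; };

static struct buf *free_list;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buf_returned = PTHREAD_COND_INITIALIZER;

static void pool_init(void)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		struct buf *b = calloc(1, sizeof(*b));
		b->next = free_list;
		free_list = b;
	}
}

/* Take a buffer; never fails, but may sleep until one comes back. */
static struct buf *pool_get(void)
{
	pthread_mutex_lock(&pool_lock);
	while (!free_list)		/* pool empty: wait for in-flight I/O */
		pthread_cond_wait(&buf_returned, &pool_lock);
	struct buf *b = free_list;
	free_list = b->next;
	pthread_mutex_unlock(&pool_lock);
	return b;
}

/* Called from the completion path; returns the buffer, wakes a waiter. */
static void pool_put(struct buf *b)
{
	pthread_mutex_lock(&pool_lock);
	b->next = free_list;
	free_list = b;
	pthread_cond_signal(&buf_returned);
	pthread_mutex_unlock(&pool_lock);
}

int main(void)
{
	pool_init();
	struct buf *b = pool_get();	/* start an "I/O" */
	pool_put(b);			/* completion returns it */
	return 0;
}

The weakness Arjan points out is visible even here: if every buffer is in
flight and completions cannot make progress, pool_get() sleeps forever.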

Greetings,
   Arjan van de Ven

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Nick DeClario » Sat, 07 Jul 2001 00:00:12


Just out of curiosity, what are the advantages of having a RAID1 swap
partition?  Setting the swap priority to 0 (pri=0) in the fstab for all
the swap partitions on your system should have the same effect as doing
it with RAID but without the overhead, right?  RAID1 would also mirror
your swap.  Why would you want that?

Regards,
        -Nick


> Hello linux-kernel,

>   Does anyone have information on this subject?  I have constant
>   failures with the system swapping on RAID1; I just wanted to be sure
>   whether this may be the problem or not.  It works without any problems
>   with the 2.2 kernel.

> --
> Best regards,

--
Nicholas DeClario
Systems Engineer                            Guardian Digital, Inc.
(201) 934-9230                Pioneering.  Open Source.  Security.


Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Joseph Bueno » Sat, 07 Jul 2001 00:20:05



> Just out of curiosity, what are the advantages of having a RAID1 swap
> partition?  Setting the swap priority to 0 (pri=0) in the fstab for all
> the swap partitions on your system should have the same effect as doing
> it with RAID but without the overhead, right?  RAID1 would also mirror
> your swap.  Why would you want that?

> Regards,
>         -Nick

Hi,

Setting the swap priority to 0 on every partition is equivalent to RAID0
(striping), not RAID1 (mirroring).

Mirroring your swap partition is important because if the disk containing
your swap fails, your system is dead.  If you want to keep your system
running even if one disk fails, you need to mirror ALL your active
partitions, including swap.
If you only mirror your data partitions, you are only protected against
data loss in case of a disk crash (assuming you shut down gracefully
before the system panics while it tries to read/write on a crashed swap
partition and leaves your data in some inconsistent state).
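
As a concrete sketch of the two setups being contrasted (the device names
are hypothetical, and note that what produces the striping effect is the
swap areas sharing the same priority):

# /etc/fstab: equal-priority swap areas; the kernel round-robins
# between them, much like RAID0.  Losing either disk loses live
# swap pages and typically takes the system down.
/dev/hda2   none   swap   sw,pri=1   0 0
/dev/hdc2   none   swap   sw,pri=1   0 0

# /etc/fstab: one swap area on a RAID1 md device; this survives a
# single-disk failure.
/dev/md1    none   swap   sw         0 0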

Regards
--
Joseph Bueno

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Pete Zaitcev » Sat, 07 Jul 2001 04:00:13




> > That's why I thought this problem is related to the RAID1 swapping I'm
> > using.

> Well, there is the potential problem that RAID1 cannot avoid allocating
> memory on some occasions, for the 2nd buffer head.  ATARAID RAID0 has the
> same problem for now, and there is no real solution to this.  You can
> pre-allocate a bunch of buffer heads, but under high load you will run
> out of those, no matter how many you pre-allocate.

Arjan, why doesn't it sleep instead (GFP_KERNEL)?

-- Pete

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Paul Jakma » Thu, 12 Jul 2001 21:20:05



> RAID1 would also mirror your swap.  Why would you want that?

Redundancy.  There is no point having your data redundant if your swap
isn't: one drive failure will take out the box the moment it tries to
access swap on the failed drive.

PS: I have 2 boxes deployed running RH's 2.4.2, with swap on top of
LVM on top of RAID1.  No problems so far, even during resync.

> Regards,
>    -Nick

--paulj


Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Neil Brown » Fri, 13 Jul 2001 10:20:11



> Hello Neil,



> >> Hello linux-kernel,

> >>   Does anyone have information on this subject?  I have constant
> >>   failures with the system swapping on RAID1; I just wanted to be sure
> >>   whether this may be the problem or not.  It works without any problems
> >>   with the 2.2 kernel.

> NB> It certainly should work in 2.4.  What sort of "constant failures" are
> NB> you experiencing?

> NB> Though it does appear to work in 2.2, there is a possibility of data
> NB> corruption if you swap onto a raid1 array that is resyncing.  This
> NB> possibility does not exist in 2.4.

> The problem is that I'm constantly getting these X-order allocation errors
> in the kernel log, after which the system becomes unstable and often hangs
> or leaves processes which cannot be killed even by a "-9" signal.
> The installed debugging patches produce the following allocation paths:

These "X-order-allocation" failures are just an indication that you
are running out or memory.  raid1 is explicitly written to cope.
If memory allocation fails it waits for some to be free, and it has
made sure in advance that there is some memory that it will get
first-dibs on when it becomes free, so there is no risk of deadlock.

However, this does not explain why you are getting unkillable
processes.

Can you try putting swap on just one of the partitions that you raid1
together, instead of on the raid1 array, and see if you can still get
processes to become unkillable?

Also, can you find out what that process is doing when it is
unkillable?
If you compile with alt-sysrq support, then alt-sysrq-t should print
the process table.  If you can get this out of dmesg and run it through
ksymoops, it might be most interesting.

NeilBrown

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Andrew Morton » Fri, 13 Jul 2001 11:00:06


Neil Brown wrote:

> Also, can you find out what that process is doing when it is
> unkillable?
> If you compile with alt-sysrq support, then alt-sysrq-t should print
> the process table.  If you can get this out of dmesg and run it through
> ksymoops, it might be most interesting.

Neil, he showed us a trace the other day: kswapd was
stuck in raid1_alloc_r1bh().  This is basically the
same situation as I had yesterday, where bdflush was stuck
in the same place.

It is completely fatal to the VM for these two processes to
get stuck in this way.  The approach I took was to beef up
the reserved bh queues and to keep a number of them
reserved *only* for the swapout and dirty buffer flush functions.
That way, we have at hand the memory we need to be able to
free up memory.

It was necessary to define a new task_struct.flags bit so we
can identify when the caller is a `buffer flusher' - I expect
we'll need that in other places as well.

An easy way to demonstrate the problem is to put ext3 on RAID1,
boot with `mem=64m' and run `dd if=/dev/zero of=foo bs=1024k count=1k'.
The machine wedges on the first run.  This is due to a bdflush deadlock.
Once swap is on RAID1, there will be kswapd deadlocks as well.  The
patch *should* fix those, but I haven't tested that.

Could you please review these changes?

BTW: I removed the initial buffer_head reservation code.  It's
not necessary with the modified reservation algorithm - as soon
as we start to use the device the reserve pools will build
up.  There will be a deadlock opportunity if the machine is totally
and utterly oom when the RAID device initially starts up, but it's
really not worth the code space to even bother about this.

--- linux-2.4.6/include/linux/sched.h   Wed May  2 22:00:07 2001
+++ lk-ext3/include/linux/sched.h       Thu Jul 12 01:03:20 2001
@@ -413,7 +418,7 @@ struct task_struct {
 #define PF_SIGNALED    0x00000400      /* killed by a signal */
 #define PF_MEMALLOC    0x00000800      /* Allocating memory */
 #define PF_VFORK       0x00001000      /* Wake up parent in mm_release */
-
+#define PF_FLUSH       0x00002000      /* Flushes buffers to disk */
 #define PF_USEDFPU     0x00100000      /* task used FPU this quantum (SMP) */

 /*
--- linux-2.4.6/include/linux/raid/raid1.h      Tue Dec 12 08:20:08 2000
+++ lk-ext3/include/linux/raid/raid1.h  Thu Jul 12 01:15:39 2001
@@ -37,12 +37,12 @@ struct raid1_private_data {
        /* buffer pool */
        /* buffer_heads that we have pre-allocated have b_pprev -> &freebh
         * and are linked into a stack using b_next
-        * raid1_bh that are pre-allocated have R1BH_PreAlloc set.
         * All these variable are protected by device_lock
         */
        struct buffer_head      *freebh;
        int                     freebh_cnt;     /* how many are on the list */
        struct raid1_bh         *freer1;
+       unsigned                freer1_cnt;
        struct raid1_bh         *freebuf;       /* each bh_req has a page allocated */
        md_wait_queue_head_t    wait_buffer;

@@ -87,5 +87,4 @@ struct raid1_bh {
 /* bits for raid1_bh.state */
 #define        R1BH_Uptodate   1
 #define        R1BH_SyncPhase  2
-#define        R1BH_PreAlloc   3       /* this was pre-allocated, add to free list */
 #endif
--- linux-2.4.6/fs/buffer.c     Wed Jul  4 18:21:31 2001
+++ lk-ext3/fs/buffer.c Thu Jul 12 01:03:57 2001
@@ -2685,6 +2748,7 @@ int bdflush(void *sem)
        sigfillset(&tsk->blocked);
        recalc_sigpending(tsk);
        spin_unlock_irq(&tsk->sigmask_lock);
+       current->flags |= PF_FLUSH;

        up((struct semaphore *)sem);

@@ -2726,6 +2790,7 @@ int kupdate(void *sem)
        siginitsetinv(&current->blocked, sigmask(SIGCONT) | sigmask(SIGSTOP));
        recalc_sigpending(tsk);
        spin_unlock_irq(&tsk->sigmask_lock);
+       current->flags |= PF_FLUSH;

        up((struct semaphore *)sem);

--- linux-2.4.6/drivers/md/raid1.c      Wed Jul  4 18:21:26 2001
+++ lk-ext3/drivers/md/raid1.c  Thu Jul 12 01:28:58 2001
@@ -51,6 +51,28 @@ static mdk_personality_t raid1_personali
 static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED;
 struct raid1_bh *raid1_retry_list = NULL, **raid1_retry_tail;

+/*
+ * We need to scale the number of reserved buffers by the page size
+ * to make writepage()s successful. --akpm
+ */
+#define R1_BLOCKS_PP                   (PAGE_CACHE_SIZE / 1024)
+#define FREER1_MEMALLOC_RESERVED       (16 * R1_BLOCKS_PP)
+
+/*
+ * Return true if the caller make take a bh from the list.
+ * PF_FLUSH and PF_MEMALLOC tasks are allowed to use the reserves, because
+ * they're trying to *free* some memory.
+ *
+ * Requires that conf->device_lock be held.
+ */
+static int may_take_bh(raid1_conf_t *conf, int cnt)
+{
+       int min_free = (current->flags & (PF_FLUSH|PF_MEMALLOC)) ?
+                       cnt :
+                       (cnt + FREER1_MEMALLOC_RESERVED * conf->raid_disks);
+       return conf->freebh_cnt >= min_free;
+}
+
 static struct buffer_head *raid1_alloc_bh(raid1_conf_t *conf, int cnt)
 {
        /* return a linked list of "cnt" struct buffer_heads.
@@ -62,7 +84,7 @@ static struct buffer_head *raid1_alloc_b
        while(cnt) {
                struct buffer_head *t;
                md_spin_lock_irq(&conf->device_lock);
-               if (conf->freebh_cnt >= cnt)
+               if (may_take_bh(conf, cnt))
                        while (cnt) {
                                t = conf->freebh;
                                conf->freebh = t->b_next;
@@ -83,7 +105,7 @@ static struct buffer_head *raid1_alloc_b
                        cnt--;
                } else {
                        PRINTK("raid1: waiting for %d bh\n", cnt);
-                       wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
+                       wait_event(conf->wait_buffer, may_take_bh(conf, cnt));
                }
        }
        return bh;
@@ -96,9 +118,9 @@ static inline void raid1_free_bh(raid1_c
        while (bh) {
                struct buffer_head *t = bh;
                bh=bh->b_next;
-               if (t->b_pprev == NULL)
+               if (conf->freebh_cnt >= FREER1_MEMALLOC_RESERVED) {
                        kfree(t);
-               else {
+               } else {
                        t->b_next= conf->freebh;
                        conf->freebh = t;
                        conf->freebh_cnt++;
@@ -108,29 +130,6 @@ static inline void raid1_free_bh(raid1_c
        wake_up(&conf->wait_buffer);
 }

-static int raid1_grow_bh(raid1_conf_t *conf, int cnt)
-{
-       /* allocate cnt buffer_heads, possibly less if kalloc fails */
-       int i = 0;
-
-       while (i < cnt) {
-               struct buffer_head *bh;
-               bh = kmalloc(sizeof(*bh), GFP_KERNEL);
-               if (!bh) break;
-               memset(bh, 0, sizeof(*bh));
-
-               md_spin_lock_irq(&conf->device_lock);
-               bh->b_pprev = &conf->freebh;
-               bh->b_next = conf->freebh;
-               conf->freebh = bh;
-               conf->freebh_cnt++;
-               md_spin_unlock_irq(&conf->device_lock);
-
-               i++;
-       }
-       return i;
-}
-
 static int raid1_shrink_bh(raid1_conf_t *conf, int cnt)
 {
        /* discard cnt buffer_heads, if we can find them */
@@ -147,7 +146,16 @@ static int raid1_shrink_bh(raid1_conf_t
        md_spin_unlock_irq(&conf->device_lock);
        return i;
 }
-              
+
+/*
+ * Return true if the caller make take a raid1_bh from the list.
+ * Requires that conf->device_lock be held.
+ */
+static int may_take_r1bh(raid1_conf_t *conf)
+{
+       return ((conf->freer1_cnt > FREER1_MEMALLOC_RESERVED) ||
+                 (current->flags & (PF_FLUSH|PF_MEMALLOC))) && conf->freer1;
+}

 static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf)
 {
@@ -155,8 +163,9 @@ static struct raid1_bh *raid1_alloc_r1bh

        do {
                md_spin_lock_irq(&conf->device_lock);
-               if (conf->freer1) {
+               if (may_take_r1bh(conf)) {
                        r1_bh = conf->freer1;
+                       conf->freer1_cnt--;
                        conf->freer1 = r1_bh->next_r1;
                        r1_bh->next_r1 = NULL;
                        r1_bh->state = 0;
@@ -170,7 +179,7 @@ static struct raid1_bh *raid1_alloc_r1bh
                        memset(r1_bh, 0, sizeof(*r1_bh));
                        return r1_bh;
                }
-               wait_event(conf->wait_buffer, conf->freer1);
+               wait_event(conf->wait_buffer, may_take_r1bh(conf));
        } while (1);
 }

@@ -178,49 +187,30 @@ static inline void raid1_free_r1bh(struc
 {
        struct buffer_head *bh = r1_bh->mirror_bh_list;
        raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev);
+       unsigned long flags;

        r1_bh->mirror_bh_list = NULL;

-       if (test_bit(R1BH_PreAlloc, &r1_bh->state)) {
-               unsigned long flags;
-               spin_lock_irqsave(&conf->device_lock, flags);
+       spin_lock_irqsave(&conf->device_lock, flags);
+       if (conf->freer1_cnt < FREER1_MEMALLOC_RESERVED) {
                r1_bh->next_r1 = conf->freer1;
                conf->freer1 = r1_bh;
+               conf->freer1_cnt++;
                spin_unlock_irqrestore(&conf->device_lock, flags);
        } else {
+               spin_unlock_irqrestore(&conf->device_lock, flags);
                kfree(r1_bh);
        }
        raid1_free_bh(conf, bh);
 }

-static int raid1_grow_r1bh (raid1_conf_t *conf, int cnt)
-{
-       int i = 0;
-
-       while (i < cnt) {
-               struct raid1_bh *r1_bh;
-               r1_bh = (struct raid1_bh*)kmalloc(sizeof(*r1_bh), GFP_KERNEL);
-               if (!r1_bh)
-                       break;
-               memset(r1_bh, 0, sizeof(*r1_bh));
-
-               md_spin_lock_irq(&conf->device_lock);
-               set_bit(R1BH_PreAlloc, &r1_bh->state);
-               r1_bh->next_r1 = conf->freer1;
-               conf->freer1 = r1_bh;
-               md_spin_unlock_irq(&conf->device_lock);
-
-               i++;
-       }
-       return i;
-}
-
 static void raid1_shrink_r1bh(raid1_conf_t *conf)
 {
        md_spin_lock_irq(&conf->device_lock);
        while (conf->freer1) {
                struct raid1_bh *r1_bh = conf->freer1;
                conf->freer1 = r1_bh->next_r1;
+               conf->freer1_cnt--;  /* pedantry */
                kfree(r1_bh);
        }
        md_spin_unlock_irq(&conf->device_lock);
@@ -1610,21 +1600,6 @@ static int raid1_run (mddev_t *mddev)
                goto out_free_conf;
        }

-
-       /* pre-allocate some buffer_head structures.
-        * As a minimum, 1 r1bh and raid_disks buffer_heads
-        * would probably get us by in tight memory situations,
-        * but a few more is probably a good idea.
-        * For now, try 16 r1bh and 16*raid_disks bufferheads
-        * This will allow at least 16 concurrent reads or writes
-        * even if kmalloc starts failing
-        */
-       if (raid1_grow_r1bh(conf, 16) < 16 ||
-           raid1_grow_bh(conf, 16*conf->raid_disks)< 16*conf->raid_disks) {
-               printk(MEM_ERROR, mdidx(mddev));
-               goto out_free_conf;
-       }
-
        for (i = 0; i < MD_SB_DISKS; i++) {

                descriptor = sb->disks+i;
@@ -1713,6 +1688,8 @@ out_free_conf:
        raid1_shrink_r1bh(conf);
        raid1_shrink_bh(conf, conf->freebh_cnt);
        raid1_shrink_buffers(conf);
+       if (conf->freer1_cnt != 0)
+               BUG();
        kfree(conf);
        mddev->private = NULL;
 out:

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Neil Brown » Fri, 13 Jul 2001 12:30:10



> Could you please review these changes?

I think I see what you are trying to do, and there is nothing
obviously wrong except this comment :-)

> + * Return true if the caller make take a raid1_bh from the list.

                                ^^^^

but now that I see what the problem is, I think a simpler patch would
be

--- drivers/md/raid1.c  2001/07/12 02:00:35     1.1
+++ drivers/md/raid1.c  2001/07/12 02:01:42

                        cnt--;
                } else {
                        PRINTK("raid1: waiting for %d bh\n", cnt);
+                       run_task_queue(&tq_disk);
                        wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
                }

                        memset(r1_bh, 0, sizeof(*r1_bh));
                        return r1_bh;
                }
+               run_task_queue(&tq_disk);
                wait_event(conf->wait_buffer, conf->freer1);
        } while (1);
 }

This is needed anyway to be "correct", as you should always unplug
the queues before waiting for IO to complete.

On the issue of whether to pre-allocate some reserved structures or
not, I think it's "6-of-one, half-a-dozen-of-the-other".  My rationale
for pre-allocating was that the buffers we hold on to would have
been allocated together, and so are probably fairly dense within their
pages, so there is no risk of hogging excess memory that isn't
actually being used.  Mind you, if I was really serious about being
gentle on the memory allocation, I would use
   kmem_cache_alloc(bh_cachep,SLAB_whatever)
instead of
   kmalloc(sizeof(struct buffer_head), GFP_whatever)
but I hadn't 'got' the slab stuff properly when I was writing that
code.
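
For illustration only (2.4-era API, not a fragment from the driver):
SLAB_whatever / GFP_whatever stand for matching flags such as SLAB_KERNEL
and GFP_KERNEL, and the difference between the two calls is where the
memory comes from:

	struct buffer_head *bh;

	/* Slab cache: buffer_heads are packed densely into pages
	 * dedicated to that cache, and freed objects are recycled
	 * within it. */
	bh = kmem_cache_alloc(bh_cachep, SLAB_KERNEL);

	/* Generic allocator: the size is rounded up to a kmalloc
	 * bucket, so the objects share pages with unrelated
	 * allocations. */
	bh = kmalloc(sizeof(struct buffer_head), GFP_KERNEL);
	if (bh)
		memset(bh, 0, sizeof(*bh));

One catch: in 2.4, bh_cachep lives in fs/buffer.c and is not exported,
which is one reason helpers such as the get/put_unused_buffer_head()
pair mentioned below would need to become exported API.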

Peter, does the above little patch help your problem?

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Is Swapping on software RAID1 possible in linux 2.4 ?

Post by Andrew Morton » Fri, 13 Jul 2001 14:00:06



> --- drivers/md/raid1.c  2001/07/12 02:00:35     1.1
> +++ drivers/md/raid1.c  2001/07/12 02:01:42

>                         cnt--;
>                 } else {
>                         PRINTK("raid1: waiting for %d bh\n", cnt);
> +                       run_task_queue(&tq_disk);
>                         wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
>                 }
>         }

>                         memset(r1_bh, 0, sizeof(*r1_bh));
>                         return r1_bh;
>                 }
> +               run_task_queue(&tq_disk);
>                 wait_event(conf->wait_buffer, conf->freer1);
>         } while (1);
>  }

> This is needed anyway to be "correct", as you should always unplug
> the queues before waiting for IO to complete.

The problem with this approach is the waitqueue: you get several
tasks on the waitqueue, and bdflush loses the race; some other
thread steals the r1bh and bdflush goes back to sleep.

Replacing the wait_event() with a special raid1_wait_event(),
which unplugs *each time* the caller is woken, does help, but
it is still easy to deadlock the system.
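
Such a raid1_wait_event() is not shown in the thread; modelled on 2.4's
__wait_event() loop, a hypothetical version would look something like
this, the only change being the unplug before each sleep:

#define raid1_wait_event(wq, condition)					\
do {									\
	wait_queue_t __wait;						\
	init_waitqueue_entry(&__wait, current);				\
	add_wait_queue(&(wq), &__wait);					\
	for (;;) {							\
		set_current_state(TASK_UNINTERRUPTIBLE);		\
		if (condition)						\
			break;						\
		run_task_queue(&tq_disk);	/* unplug every pass */	\
		schedule();						\
	}								\
	current->state = TASK_RUNNING;					\
	remove_wait_queue(&(wq), &__wait);				\
} while (0)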

Clearly this approach is racy: it assumes that the reserved buffers have
actually been submitted when we unplug - they may not yet have been.
But the lockup is too easy to trigger for that to be a satisfactory
explanation.

The most effective, aggressive, successful and grotty fix for this
problem is to remove the wait_event altogether and replace it with:

        run_task_queue(&tq_disk);
        current->policy |= SCHED_YIELD;
        __set_current_state(TASK_RUNNING);
        schedule();

This can still deadlock in bad OOM situations, but I think we're
dead anyway.  A combination of this approach plus the PF_FLUSH
reservations would work even better, but I found the PF_FLUSH
stuff was sufficient.
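
Spliced into raid1_alloc_r1bh() (hypothetically; the list handling is as
in the patch above, and the kmalloc fallback is elided), that replacement
looks like:

static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf)
{
	struct raid1_bh *r1_bh = NULL;

	do {
		md_spin_lock_irq(&conf->device_lock);
		if (may_take_r1bh(conf)) {
			r1_bh = conf->freer1;
			conf->freer1_cnt--;
			conf->freer1 = r1_bh->next_r1;
			r1_bh->next_r1 = NULL;
			r1_bh->state = 0;
			r1_bh->mirror_bh_list = NULL;
		}
		md_spin_unlock_irq(&conf->device_lock);
		if (r1_bh)
			return r1_bh;
		/* ... try the plain kmalloc fallback here, as before ... */

		/* was: wait_event(conf->wait_buffer, may_take_r1bh(conf)); */
		run_task_queue(&tq_disk);
		current->policy |= SCHED_YIELD;
		__set_current_state(TASK_RUNNING);
		schedule();
	} while (1);
}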

> Mind you, if I was really serious about being
> gentle on the memory allocation, I would use
>    kmem_cache_alloc(bh_cachep,SLAB_whatever)
> instead of
>    kmalloc(sizeof(struct buffer_head), GFP_whatever)

get/put_unused_buffer_head() should be exported API functions.

