> This works like this: you have 4 drives, say, named A, B, C, D. It will
> write the first block of data between A and B with parity written to C. The
> next sector will be written to D and A, with parity written to B. Etc.
This isn't how I learned RAID 5 theory. What I learned goes like this:
On each write, you create a data stripe for all but one drive, create a parity
stripe by XORing all the data stripes together, and place the parity stripe
and data stripes on the drives. The parity stripe round-robins across all the
drives, so the layout looks like this over successive writes:
A B C D
D D D P
P D D D
D P D D
D D P D
And so on, ad infinitum.
The rest is all pretty much right.
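In case the XOR part is abstract, here's a minimal sketch of the idea in
Python. The names (NUM_DRIVES, xor_parity, place_stripe) are mine for
illustration, not from any real RAID driver, and the rotation direction is
just one convention:

```python
# A minimal sketch of RAID 5 full-stripe writes with rotating parity.
# NUM_DRIVES, xor_parity and place_stripe are illustrative names, not
# from any real driver; real stripe units are sectors or kilobytes.

NUM_DRIVES = 4

def xor_parity(blocks):
    """Parity stripe = byte-wise XOR of all the data stripes."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def place_stripe(write_index, data_blocks):
    """Place NUM_DRIVES-1 data stripes plus parity across the drives.

    The parity position rotates one drive per full-stripe write,
    giving the D/P layout in the diagram above.
    """
    assert len(data_blocks) == NUM_DRIVES - 1
    parity_pos = (NUM_DRIVES - 1 + write_index) % NUM_DRIVES
    layout = list(data_blocks)
    layout.insert(parity_pos, xor_parity(data_blocks))
    return layout  # layout[i] is what drive i (A=0 ... D=3) gets
```

Recovery falls out of the same math: XOR the surviving stripes of any row
together and you get back whichever stripe the dead drive held.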
Some insight on performance limitations: the RAID array will get faster as the
drive count climbs, until you either saturate the parity generation engine
(CPU, hardware RAID, whatever generates parity) or you saturate the SCSI
busses or whatever bus the drives are connected to.
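A back-of-the-envelope way to see the scaling: aggregate throughput is the
minimum of the raw drive rate and whichever shared ceiling you hit first.
Every number in this sketch is invented for illustration:

```python
# Toy scaling model: the array speeds up with drive count until some
# shared resource (bus or parity engine) becomes the ceiling.
# Every figure below is made up for illustration.

def array_throughput(n_drives, per_drive_mb=10, bus_mb=80, parity_mb=120):
    raw = n_drives * per_drive_mb       # drives in aggregate
    return min(raw, bus_mb, parity_mb)  # first saturated limit wins

for n in (2, 4, 8, 16, 32):
    print(n, array_throughput(n), "MB/s")
# Linear up to 8 drives here, then the 80 MB/s bus caps everything.
```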
Parity generation is best left to small, fast RISC engines that are really
good at it and can offload the overhead from the CPU. Many of the <$1000 cards
don't do very well, like the Adaptec AHA 133 series. Ugh, abysmal performance
on any size array we tried. The MegaRAID by AMI and the newer DAC 960s seem to
be good controllers, with high numbers, good hardware parity generation, and
several options for combining multiple cards / channels into one array.
Another thing that can kill you on an array with a large drive count is small
writes. If you have a database that makes changes in the range of hundreds of
bytes at a time, and you've got, say, 20 drives with a stripe size of 4k,
you've got to retrieve the 20x4k of data, change 1 or 100 bytes, recalculate
the parity stripe, and write out the 20x4k. Reads are fast, but all writes
require this type of operation. If you're building an FTP server where the
average file size is 100k and above, then the 20x4k is no big deal, and you
would likely see good performance there.
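Working the arithmetic on that example (the stripe geometry is from the
paragraph above; the 100-byte update size is an arbitrary pick):

```python
# I/O amplification for a tiny write on a 20-drive, 4k-stripe array,
# following the read-whole-stripe / write-whole-stripe description
# above. The 100-byte update size is an arbitrary example.

n_drives = 20
stripe_unit = 4 * 1024                  # 4k per drive
full_stripe = n_drives * stripe_unit    # 81920 bytes (80k) per stripe

update_bytes = 100
io_moved = 2 * full_stripe              # read it all, write it all back

print(io_moved // update_bytes)         # ~1638x the data you changed
```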
On hardware controllers you'd use mirrored stripe sets (RAID 0+1) to get the
best database performance. In Linux, you can do almost as well with software
RAID if you make a 3- or 4-drive mirror set.
Reads are spread across the multiple drives, so things like databases are
likely to have three or so heads available at a time to read data. Writes are
gonna be slower than RAID 0, but not terribly slow, since the algorithm for
mirrors is: make a bunch of copies and write them out. I don't know if other
OSes support >2 drives in a mirror, but it is a wonderfully fast setup under
Linux for small data access.
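To make that concrete, here's a minimal sketch of an N-way mirror, with
in-memory buffers standing in for drives; the round-robin read is just one
simple policy for spreading reads, real drivers get fancier:

```python
# A minimal sketch of an N-way mirror ("RAID 1 x 4" in the recap below),
# with in-memory byte buffers standing in for drives. Round-robin reads
# are one simple way to spread load across heads; real drivers pick the
# member with the shortest queue or nearest head.

class Mirror:
    def __init__(self, n_drives, size):
        self.drives = [bytearray(size) for _ in range(n_drives)]
        self.next_read = 0

    def write(self, offset, data):
        # The whole mirror write algorithm: copy to every member.
        for drive in self.drives:
            drive[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Any member can satisfy a read; rotate so concurrent
        # readers land on different drives.
        drive = self.drives[self.next_read]
        self.next_read = (self.next_read + 1) % len(self.drives)
        return bytes(drive[offset:offset + length])
```

Writes cost one copy per member, which is why they trail RAID 0, but reads
can fan out to as many heads as you have members.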
So, here's my short recap, in table format:
| Application  | RAID type I use |
| Database     | RAID 1 x 4      |
| Web Server   | RAID 1 x 2      |
| File & Print | RAID 5 x [6-10] |