>>> From what I've read about Linux nbd it will allow aggregation
>>> of separate remote nbd exported partitions in a RAID 0, 1, or
>>> even 5 md volume. However, because of the lack of distributed
>>> locking, only one client can mount the volumes r/w.
>>This is correct. But the aggregation is not necessary. You can always
>>run RAID linear over [several] nbd devices.
> Are there any advantages to doing so rather than treating them
> as one logical volume?
Eh? What do you mean by the latter? Nobody mentioned LVM, I think.
The distinction I was drawing was between the NBD system doing the
resource aggregation, and using a separate aggregation mechanism.
I'm not sure which I'd prefer. I can tell you that it's easy to build
raid aggregation into the NBD server (because I have), but not so easy
to build it into the client (because I haven't). In my opinion it is
better to keep the aggregation separate. If all the resources are at
the server end, you can use software RAID to aggregate them there, and
export the result via NBD. If the resources are scattered, then you
will have to export them individually via NBD, then aggregate them via
software RAID at the client end. Or you could look at drbd, which I
think uses mirrored servers.
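As a concrete sketch of that client-side aggregation (host names, ports, and device nodes here are my own illustrative assumptions, not taken from anyone's setup):

```shell
# Attach two remote NBD exports (hosts and ports are hypothetical).
nbd-client server1.example.com 2000 /dev/nbd0
nbd-client server2.example.com 2000 /dev/nbd1

# Concatenate them into a single linear md device.
mdadm --build /dev/md0 --level=linear --raid-devices=2 /dev/nbd0 /dev/nbd1

# Put a filesystem on the aggregate. Remember: only ONE client may
# mount this read/write, since there is no distributed locking.
mkfs.ext2 /dev/md0
mount /dev/md0 /mnt/aggregate
```

Linear (or RAID 0/1/5) here is a choice; linear is the simplest way to just concatenate the remote pieces.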
>>> From what I've read of GFS, it provides locking so multiple
>>> clients can mount r/w but GFS does not allow aggregation.
>>Well, locking is not really the problem anyway. Isn't GFS a filesystem?
> Yes, but it also provides Fibre Channel, parallel SCSI, and
> Ethernet/IP tools. GFS has its own network block device
I'm not sure what relevance that has. The transport system should
really be invisible to the device. I'd prefer something like ipv6, and
leave it to someone else to provide PtP ipv6 over scsi!
> driver GNBD and an IP-based lock server memexpd. The filesystem
Interesting. Do you know anything about GNBD?
> part of GFS is a journalling FS, and the exports allow
> multiple clients to mount r/w.
journalling fs's (such as xfs) over nbd indeed work well, and I am
assured that the only condition required for their working is that
client block requests retain their time ordering at the server end.
This is of course certain if there is only one client. With two
clients, I would have to implement a clocks mechanism (I haven't, but
will). I don't believe that FS hooks to get atomic locking (i.e.
temporary exclusive access to the server) are necessary under those
conditions.
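The "clocks mechanism" mentioned above could be done Lamport-style: each client stamps its block requests with a logical clock, and the server applies requests in stamp order, so block accesses keep a consistent time-order across clients. A minimal sketch in Python (the names and structure are my own illustration, not code from any NBD implementation):

```python
import heapq

class LamportClock:
    """One logical clock per client."""
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event (e.g. issuing a request)."""
        self.time += 1
        return self.time

    def observe(self, remote_time):
        """Merge in a timestamp seen from another client."""
        self.time = max(self.time, remote_time) + 1
        return self.time

class Server:
    """Applies block requests in timestamp order, ties broken by
    client id, giving one consistent order across all clients."""
    def __init__(self):
        self.pending = []   # heap of (timestamp, client_id, request)
        self.applied = []

    def submit(self, timestamp, client_id, request):
        heapq.heappush(self.pending, (timestamp, client_id, request))

    def flush(self):
        while self.pending:
            self.applied.append(heapq.heappop(self.pending))
```

The hard part, of course, is that clients must actually exchange timestamps (or wait) before the server can safely flush; the sketch only shows the ordering rule itself.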
The most generic mechanism I know of right now using NBDs is
to export a RAID device via NFS or other shared network FS.
The RAID device should be a composite of NBD devices. This allows
a network FS to be made up of components distributed over the net.
A next generation device would be one that virtualized this
architecture, so that it appears to be as I have described it.
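The generic setup above might look like this on the aggregating machine (paths, the subnet, and the host name are illustrative assumptions):

```shell
# The md device is assumed to be already assembled from attached NBD
# components, as sketched earlier. Make a filesystem on it and mount.
mkfs.ext2 /dev/md0
mount /dev/md0 /export/shared

# Re-export the result over NFS so multiple clients can mount it r/w;
# a hypothetical /etc/exports entry:
#   /export/shared 192.168.1.0/255.255.255.0(rw,sync)
exportfs -a

# Clients then mount via NFS, which supplies the shared access and
# locking that raw NBD lacks, at the cost of an extra network hop:
#   mount -t nfs aggregator.example.com:/export/shared /mnt/shared
```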
>>See above. You need to invest more in your imagination!
> After my misspent years in graduate school I think I'll invest
> in more tangible assets...;-)
>>Well, there's my ENBD, which is better than NBD, but also only
>>multiple-readonly. You can set it multiple-readwrite, but the result
>>will be chaos. It's bound to be if you don't have FS-level atomicity
>>of access.
>>The short term solution is to simply export the nbd (or enbd)
>>aggregation via nfs. This adds to network traffic, but you probably
>>can live with that.
> I hadn't thought of that, but there doesn't seem to be any reason why
> this won't work. Has anyone tried it?
I suppose they have (although I haven't). It's completely unremarkable
as an idea, so I doubt if anyone would mention it to me.
>>The long term solution is either
>> a) use a journalling fs such as XFS on nbd, and add FS hooks
>> to atomicise accesses,
>> b) use a journalling fs ... , and maintain time-order of block
>> accesses across all the machines (hic).
>>If somebody could tell me about how to do (b), I would be pleased.
> The GFS project seems close. The GFS FAQ mentions a cluster-wide
> LVM is in the works, though RAID will be harder due to the issues
> you mentioned.
> Some commercial startups have been using Linux as the base
> for their products. Tricord has software called Illumina which
> does RAID across a NAS cluster, and Falconstor has a pure software
> storage virtualization product. But I don't think they've made any
> contributions of source code back to the open or free software
> communities.
I am involved with some commercial (and noncommercial) applications.
I don't know if I am at liberty to mention anything.