Simple Kernel-User Event Interface (Was: RE: [ANNOUNCE] udev 0.1 release)

Post by Perez-Gonzalez, Inaky » Sun, 13 Apr 2003 06:20:18



> From: Greg KH [mailto:g...@kroah.com]

> On Fri, Apr 11, 2003 at 11:39:07PM +0000, Miquel van Smoorenburg wrote:
> > In article <20030411190717.GH1...@kroah.com>,
> > Greg KH  <g...@kroah.com> wrote:
> > >I agree too.  Having /sbin/hotplug send events to a pipe where a daemon
> > >can get them from makes a lot of sense.  It will handle most of the
> > >synchronization and spawning a zillion tasks problems.

> > Why not serialize /sbin/hotplug at the kernel level. Queue hotplug
> > events and only allow one /sbin/hotplug to run at the same time.

> We don't want the kernel to stop based on a user program.

Okay, so what about this:

I started playing with a simple event interface, that would allow:

- queuing events and recalling-queued events
- not consume (almost) memory when two bazillion events are queued
- be accessible by different processes at the same time on
  different fds

And I came up with the attached patch. Right now it is kind of lame, as
I am a lazy bastard and don't remember much about the current state of
the art for device access (and I have no idea how to code a
poll() fop, so I need help there). For now it is a misc device,
but that should be changed.

The idea is you queue from the kernel a message and the user space
reads it -entirely, no half things-, starting with a header (unsigned
long size) and then the actual bytes. If the user provides a buffer
big enough, more entire messages are copied. If no messages are
available, -EAGAIN.
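
A minimal userspace sketch of how a reader might walk the bytes returned
by one read() under this format. It assumes each record is laid out as
[unsigned long size][payload] and that size counts the header plus the
payload, which is how kue_read() in the patch appears to use
kue_user.size. The names parse_records and record_fn are hypothetical and
not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback, invoked once per complete record. */
typedef void (*record_fn)(const unsigned char *payload, size_t len, void *ctx);

/*
 * Walk `len` bytes as returned by a single read().
 * Assumption: every record is [unsigned long size][payload] and `size`
 * counts the header as well as the payload.
 * Returns the number of records seen, or -1 if the buffer is malformed.
 */
static int parse_records(const unsigned char *buf, size_t len,
                         record_fn fn, void *ctx)
{
        int count = 0;

        while (len > 0) {
                unsigned long size;

                if (len < sizeof(size))
                        return -1;              /* truncated header */
                memcpy(&size, buf, sizeof(size)); /* no unaligned access */
                if (size < sizeof(size) || size > len)
                        return -1;              /* bogus or truncated record */
                if (fn != NULL)
                        fn(buf + sizeof(size), size - sizeof(size), ctx);
                buf += size;
                len -= size;
                count++;
        }
        return count;
}
```

Because the driver only hands out whole messages, a well-formed buffer
always ends exactly on a record boundary; anything else is reported as
an error rather than silently skipped.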

Now, each fd keeps a pointer to the queue list and only when the
event has been read by all the open fds, it is then disposed. I
think I got right all the maintenance so no event is left dangling,
even if you close all the fds (at this point, a queue flush is
performed, however, new events will be queued).

Now the catch is that the message data may or may not be kmalloc()ed.
For example, when we plug in a device, the message data can be part of
the device structure, and once the device is plugged in, we queue the
event. Once read, it won't be freed (there is a flag for that), and
when the device is unplugged, the event is recalled, just in case it
wasn't read yet. This means we can have as many of these as we
want; they won't take much extra space because, in the end, somebody
will dispose of them. For removal events, for example, we can
dynamically allocate one.
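
The ownership rule above can be modelled in plain userspace C. This is
only an illustration under stated assumptions: EV_FREE_ON_DISPOSE stands
in for KUE_KFREE, malloc()/free() stand in for kmalloc()/kfree(), and
struct device, device_plug() and device_unplug() are hypothetical names
that do not appear in the patch.

```c
#include <stdlib.h>

#define EV_FREE_ON_DISPOSE 0x1  /* hypothetical stand-in for KUE_KFREE */

struct event {
        unsigned flags;
        int queued;             /* 1 while the event sits in the queue */
};

/* Hypothetical device: the "plugged" event lives inside the device. */
struct device {
        const char *name;
        struct event plugged;
};

static unsigned freed_count;    /* bookkeeping for the demo only */

static void event_send(struct event *ev)
{
        ev->queued = 1;
}

/* Called once every reader has seen the event, or when it is recalled. */
static void event_dispose(struct event *ev)
{
        ev->queued = 0;
        if (ev->flags & EV_FREE_ON_DISPOSE) {
                free(ev);
                freed_count++;
        }
}

static void device_plug(struct device *dev)
{
        dev->plugged.flags = 0; /* storage belongs to the device */
        event_send(&dev->plugged);
}

/* Returns the heap-allocated removal event, or NULL on allocation failure. */
static struct event *device_unplug(struct device *dev)
{
        struct event *removed;

        if (dev->plugged.queued)        /* recall: nobody read it yet */
                event_dispose(&dev->plugged);
        removed = malloc(sizeof(*removed));
        if (removed == NULL)
                return NULL;
        removed->flags = EV_FREE_ON_DISPOSE;
        event_send(removed);
        return removed;
}
```

The plug event costs no extra allocation because it is embedded in the
device, while the removal event has to outlive the device and is
therefore heap-allocated and freed by the queue.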

Caveats:

- Currently using semaphores, so cannot be called from atomic
  functions - need to fix that.

Now, I haven't bothered to look at other interfaces around
(ACPI - need to ask Andy), networking ... but I am pretty
sure this one is generic enough as to work for everyone.
Maybe add a type field in the header or stuff like that,
but it should do.

Tested under 2.5.66 - didn't test the multiple readers part, though

 drivers/char/Makefile   |    2
 drivers/char/kue-test.c |   60 +++++++++
 drivers/char/kue.c      |  314 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/kue.h     |   76 +++++++++++

Index: drivers/char/Makefile
diff -u drivers/char/Makefile:1.1.1.4 drivers/char/Makefile:1.1.1.4.2.1
--- drivers/char/Makefile:1.1.1.4       Wed Mar 26 15:30:30 2003
+++ drivers/char/Makefile       Fri Apr 11 20:51:38 2003
@@ -78,6 +78,8 @@
 obj-$(CONFIG_IPMI_HANDLER) += ipmi/

 obj-$(CONFIG_HANGCHECK_TIMER) += hangcheck-timer.o
+obj-m += kue.o
+obj-m += kue-test.o

 # Files generated that shall be removed upon make clean
 clean-files := consolemap_deftbl.c defkeymap.c qtronixmap.c
Index: drivers/char/kue-test.c
diff -u /dev/null drivers/char/kue-test.c:1.1.2.1
--- /dev/null   Fri Apr 11 21:05:30 2003
+++ drivers/char/kue-test.c     Fri Apr 11 20:52:03 2003
@@ -0,0 +1,60 @@
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kue.h>
+
+struct msg
+{
+       struct kue kue;
+       char data[];
+};
+
+struct msg m3 = {
+       .kue = KUE_INIT(m3.kue, 0, 9),
+       .data = "123456789"
+};
+
+struct msg m2 = {
+       .kue = KUE_INIT(m2.kue, 0, 3),
+       .data = "123"
+};
+
+struct msg m1 = {
+       .kue = KUE_INIT(m1.kue, 0, 4),
+       .data = "1234"
+};
+
+struct msg m4 = {
+       .kue = KUE_INIT(m4.kue, 0, 16),
+       .data = "123456789abcdef"
+};
+
+      
+static
+int __init kue_test_init (void)
+{
+       struct msg *_m4;
+      
+       kue_send_event (&m3.kue);
+       kue_send_event (&m2.kue);
+       kue_send_event (&m1.kue);
+
+       _m4 = kmalloc (sizeof (*_m4) + 16, GFP_KERNEL); /* + 16 for data[] */
+       memcpy (_m4, &m4, sizeof (m4) + 16);
+       _m4->kue.flags = KUE_KFREE;
+       kue_send_event (&_m4->kue);
+       return 0;
+}
+
+static
+void __exit kue_test_exit (void)
+{
+       kue_recall_event (&m3.kue);
+       kue_recall_event (&m2.kue);
+       kue_recall_event (&m1.kue);
+}
+
+
+module_init(kue_test_init);
+module_exit(kue_test_exit);
+MODULE_LICENSE("GPL");
Index: drivers/char/kue.c
diff -u /dev/null drivers/char/kue.c:1.1.2.1
--- /dev/null   Fri Apr 11 21:05:30 2003
+++ drivers/char/kue.c  Fri Apr 11 20:51:49 2003
@@ -0,0 +1,314 @@
+
+/* Kernel-User Events
+ *
+ * Simple event interface for the kernel to pass on stuff to the user
+ * space. When some part of the kernel wants to send a message to user
+ * space (binary chunk of data of a given size), it defines the
+ * following:
+ *
+ * #include <linux/kue.h>
+ *
+ * struct some_msg {
+ *         struct kue kue; // FIRST MEMBER!!
+ *         char data[];
+ * } msg = { .kue = KUE_INIT(msg.kue, FLAGS, SIZE), .data = "Hello world!" };
+ *
+ * In this case, SIZE would be [onetwothreefourfiv...] thirteen with
+ * the \0. FLAGS is either 0 or KUE_KFREE (with 0, we own the data;
+ * with KUE_KFREE, KUE will kfree() the data when the message is
+ * delivered).
+ *
+ * Now, queue with:
+ *
+ * kue_send_event (&msg.kue);
+ *
+ * Now, the message is queued. There is a char device that user space
+ * programs can open for reading; if they read and there are no
+ * messages, they get a -EAGAIN. KUE will try to fit as many *whole*
+ * messages as possible in the read buffer, and then advance each fd
+ * specific 'current' pointer through the queue of messages. Next time
+ * it reads, it will get the following ones, and so on. If you get
+ * -EFBIG, the buffer is too small for the first message, so make it
+ * bigger.
+ *
+ * When a message has been read by all the current open fds, it is
+ * removed from the queue and if KUE_KFREE was set, kfree()d.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <asm/semaphore.h>
+#include <linux/kue.h>
+#include <asm/uaccess.h>
+
+#define DEBUG
+#ifdef DEBUG
+#define debug(a...) printk (KERN_ERR a)
+#else
+#define debug(a...)
+#endif
+
+static DECLARE_MUTEX(kue_lock);
+static struct list_head kue_list = LIST_HEAD_INIT (kue_list);
+static struct list_head kue_fd_list = LIST_HEAD_INIT (kue_fd_list);
+static unsigned kue_fds = 0;
+struct kue_fd
+{
+       struct list_head list;
+       struct list_head *itr;
+       struct file *file;
+};
+
+/* Internal: de/queue/query event */
+
+static inline
+void __kue_queue (struct kue *kue)
+{
+       debug ("Queuing %p\n", kue);
+       kue->fds_done = 0;
+       list_add_tail (&kue->list, &kue_list);
+}
+
+static inline
+void __kue_dequeue (struct kue *kue)
+{
+       debug ("Dequeuing %p%s\n", kue,
+              kue->flags & KUE_KFREE? " (kfree()d)" : "");
+       list_del (&kue->list);
+       if (kue->flags & KUE_KFREE)
+               kfree (kue);
+}
+
+static inline
+struct kue * __kue_locate_event (const struct kue *kue)
+{
+       struct list_head *itr;
+       struct kue *kue_itr;
+      
+       list_for_each (itr, &kue_list) {
+               kue_itr = container_of (itr, struct kue, list);
+               if (kue_itr == kue)
+                       goto out;
+       }
+       kue_itr = NULL;
+out:
+       return kue_itr;
+}
+
+static inline
+void __kue_recall_event (struct kue *kue)
+{
+       struct list_head *itr, *kue_itr;
+       struct kue_fd *kue_fd;
+      
+       debug ("__Recalling %p\n", kue);
+      
+       kue_itr = &kue->list;
+       list_for_each (itr, &kue_fd_list) {
+               kue_fd = container_of (itr, struct kue_fd, list);
+               if (kue_fd->itr == kue_itr)
+                       kue_fd->itr = kue_itr->prev;
+       }
+       __kue_dequeue (kue);
+}
+
+       /* Kernel interface */
+
+void kue_send_event (struct kue *kue)
+{
+       debug ("Sending %p\n", kue);
+       down (&kue_lock);
+       __kue_queue (kue);
+       up (&kue_lock);
+}
+
+
+int kue_recall_event (struct kue *kue)
+{
+       int result = -ENOENT;
+      
+       debug ("Recalling %p\n", kue);
+      
+       down (&kue_lock);
+       if (__kue_locate_event (kue) != NULL) {
+               __kue_recall_event (kue);
+               result = 0;
+       }
+       up (&kue_lock);
+       return result;
+}
+
+
+unsigned kue_delivered_event (const struct kue *kue)
+{
+       down (&kue_lock);
+       kue = __kue_locate_event (kue);
+       up (&kue_lock);
+       return kue == NULL;
+}
+
+
+       /* File operations */
+
+static
+int kue_open (struct inode *inode, struct file *file)
+{
+       struct kue_fd *kue_fd = kmalloc (sizeof (*kue_fd), GFP_KERNEL);
+      
+       if (kue_fd == NULL)
+               return -ENOMEM;
+       kue_fd->itr = &kue_list;
+       kue_fd->file = file;
+       down (&kue_lock);
+       list_add_tail (&kue_fd->list, &kue_fd_list);
+       kue_fds++;
+       up (&kue_lock);
+       file->private_data = kue_fd;
+       debug ("Open kue_fd %p kue_fds %d\n", kue_fd, kue_fds);
+       return 0;
+}
+
+static
+ssize_t kue_read (struct file *file, char *dest, size_t size,
+                 loff_t *offset)
+{
+       struct kue_fd *kue_fd = file->private_data;
+       struct kue *kue;
+       ssize_t result, total_copied = 0;
+       struct list_head *itr;
+
+       if (kue_fd == NULL)
+               return -EIO;
+      
+       debug ("Read kue_fd %p kue_itr %p\n", kue_fd, kue_fd->itr);
+      
+       down (&kue_lock);
+       itr = kue_fd->itr;
+       while (1)
+       {
+               result = -EAGAIN;
+               if (itr->next == &kue_list)
+                       break;
+               result = -EFBIG;
+               kue = container_of (itr->next, struct kue, list);
+               if (kue->kue_user.size > size)
+                       break;
+               debug ("Read kue_fd %p kue_itr %p dest %p size %u kue %p\n",
+                      kue_fd, kue_fd->itr,
+                      dest, size, kue);
+               result = copy_to_user (dest, &kue->kue_user, kue->kue_user.size);
+               if (result != 0) {
+                       result = -EFAULT;
+                       break;
+               }
+               size -= kue->kue_user.size;
+               dest += kue->kue_user.size;
+               total_copied += kue->kue_user.size;
+               result = 0;
+               kue->fds_done++;
+               /* All read it? wipe it */
+               if (kue->fds_done >= kue_fds)
+                       __kue_dequeue (kue);
+               else
+                       itr = itr->next;
+      
...


 
 
 

Post by Greg KH » Sun, 13 Apr 2003 08:00:15



> Okay, so what about this:

> I started playing with a simple event interface, that would allow:

> - queuing events and recalling-queued events
> - not consume (almost) memory when two bazillion events are queued
> - be accessible by different processes at the same time on
>   different fds

Have you looked at relayfs?  I think it might do much the same thing as
this, but through a fs interface, instead of a char device node.

> Now, each fd keeps a pointer to the queue list and only when the
> event has been read by all the open fds, it is then disposed.

I don't think you can just count the number of open fds, like your patch
does to get a count of who all read this message (fds can close and
others can open, so newer fds might not have read the message before it
is removed.)
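
The scenario described above can be reproduced with a tiny userspace
model of the counters. This is a sketch, not the kernel code: it tracks
a single event, mirrors kue_fds and fds_done from the patch, and assumes
that closing an fd only decrements the open-fd count (the release path
is not visible in the truncated patch). All function names here are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the patch's accounting for one event. */
struct model {
        unsigned open_fds;      /* mirrors kue_fds */
        unsigned fds_done;      /* mirrors kue->fds_done */
        bool queued;            /* is the event still on the queue? */
};

static void model_open(struct model *m)
{
        m->open_fds++;
}

/* Assumption: close only drops the fd count and touches nothing else. */
static void model_close(struct model *m)
{
        m->open_fds--;
}

/* One reader consumes the event; true if this read disposed of it. */
static bool model_read(struct model *m)
{
        if (!m->queued)
                return false;
        m->fds_done++;
        if (m->fds_done >= m->open_fds) {
                m->queued = false;
                return true;
        }
        return false;
}

/* Returns true when the event is gone although reader B never read it. */
static bool reader_b_misses_event(void)
{
        struct model m = { 0, 0, true };

        model_open(&m);         /* reader A */
        model_open(&m);         /* reader B */
        model_read(&m);         /* A reads: fds_done = 1 of 2 */
        model_close(&m);        /* A goes away: open_fds = 1 */
        model_open(&m);         /* reader C arrives: open_fds = 2 */
        model_read(&m);         /* C reads: fds_done = 2 >= 2, disposed */
        return !m.queued;       /* B has not read anything yet */
}
```

In this model a bare count cannot tell which readers have seen the
event, only how many reads happened, so a late opener can stand in for
a reader that is still behind.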

Looks like a good start, but I'm not moving the hotplug interface over
to it :)

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by Perez-Gonzalez, Inaky » Sun, 13 Apr 2003 11:00:16




> > Okay, so what about this:

> > I started playing with a simple event interface, that would allow:

> > - queuing events and recalling-queued events
> > - not consume (almost) memory when two bazillion events are queued
> > - be accessible by different processes at the same time on
> >   different fds

> Have you looked at relayfs?  I think it might do much the same thing as
> this, but through a fs interface, instead of a char device node.

Nope - I didn't even know it existed - this was just, hmmm, it could
be done like this, plank! There.. It's small and to me it cuts it
well enough.

The char device node is just a quick place to hook the struct
file_operations. I'd say this would go inside sysfs or something.
It is not really important.

> > Now, each fd keeps a pointer to the queue list and only when the
> > event has been read by all the open fds, it is then disposed.

> I don't think you can just count the number of open fds, like your patch
> does to get a count of who all read this message (fds can close and
> others can open, so newer fds might not have read the message before it
> is removed.)

The intention is [unless I have screwed it up big time] that if there
are no readers, the events are queued up. Once there is at least one
reader, they are released as soon as they have been read by all the
current readers. This way there is little chance of a big accumulation
of unread events - once you start whatever event-managing daemon, you
are set.

The idea of allowing multiple readers was so you can have other actors
listening for stuff - although the main one would always be the event
daemon (that could even forward the events).

> > Looks like a good start, but I'm not moving the hotplug interface over
> to it :)

Good try - I won't let go :) If you see this as something potentially
useful, how would you like it to develop so that in the long term
it can be used? be it in parallel with /sbin/hotplug or as a
potential replacement?

I guess that the first thing I would have to do is somehow look into
how hotplug is behaving now and hook it to do something similar, right?

See ya

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

 
 
 

Post by Frank van Maarsevee » Sun, 13 Apr 2003 11:20:10



> The idea is you queue from the kernel a message and the user space
> reads it -entirely, no half things-, starting with a header (unsigned
> long size) and then the actual bytes. If the user provides a buffer

It would be better to use an ASCII interface using \n as event
separator. That's what I like about hotplug: it can be scripted.
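
For comparison, a sketch of what consuming such a newline-separated
stream could look like from C. The event strings in the usage note are
made up for illustration, and split_events is a hypothetical helper, not
an existing interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the complete '\n'-terminated events in buf[0..len) and report in
 * *consumed how many bytes they occupy, so the caller can keep the
 * unterminated tail for the next read().
 */
static size_t split_events(const char *buf, size_t len, size_t *consumed)
{
        size_t count = 0, used = 0;
        const char *nl;

        while (used < len &&
               (nl = memchr(buf + used, '\n', len - used)) != NULL) {
                used = (size_t)(nl - buf) + 1;  /* step past the '\n' */
                count++;
        }
        *consumed = used;
        return count;
}
```

Given the bytes "add usb0\nremove usb0\npart", this reports two events
and 21 consumed bytes, leaving "part" to be completed by a later read.
Unlike the length-prefixed format, the reader has to cope with partial
events itself.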

--
Frank

 
 
 

Post by Frank van Maarsevee » Sun, 13 Apr 2003 11:30:13



> The idea of allowing multiple readers was so you can have other actors
> listening for stuff - although the main one would always be the event
> daemon (that could even forward the events).

I think multiple readers is a good idea. Though it can be (ab)used to
create inefficient solutions (hundreds of listeners each handling their
own particular event) it is very practical as it makes it possible
to add an event handler without having to dig into the hotplug script
infrastructure or depending on that.

--
Frank

 
 
 

Post by Greg K » Wed, 16 Apr 2003 06:40:08



> > Looks like a good start, but I'm not moving the hotplug interface over
> > to it :)

> Good try - I won't let go :) If you see this as something potentially
> useful, how would you like it to develop so that in the long term
> it can be used? be it in parallel with /sbin/hotplug or as a
> potential replacement?

I don't know.  Even if we decide to change, this is a 2.7 thing.

> > I guess that the first thing I would have to do is somehow look into
> how hotplug is behaving now and hook it to do something similar, right?

That would be a good start :)

thanks,

greg k-h

 
 
 

1. Simple Kernel-User Event Interface (Was: RE: [ANNOUNCE] udev

It didn't do too much [unless I grossly missed anything].

Poll is not supported right now because I didn't know how to do
it and I didn't have time to investigate (this was a quickie
hack). Same thing with a UNIX domain socket - also, I wanted to avoid
sockets because I figured it could be done with simple reads
(thus, if for whatever reason you don't compile sockets in, you
don't need to make it more complex - this is also the reason
for forcing whole-message reads and having the length in the
header - it's a poor man's recvmsg()).
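
For reference, this is roughly what the consumer side would look like if
the device did support poll(): wait for readability, then issue one
read() that can return several whole messages. It is a sketch using only
the standard poll(2) and read(2) calls; wait_and_read is a hypothetical
helper, and since the posted driver has no poll fop, it can only be
tried against something like a pipe.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the number of bytes read, 0 on timeout (or end of file),
 * or -1 with errno set on error.
 */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready;

        do {
                ready = poll(&pfd, 1, timeout_ms);
        } while (ready < 0 && errno == EINTR);  /* retry if interrupted */

        if (ready <= 0)
                return ready;                   /* 0: timeout, -1: error */
        return read(fd, buf, len);
}
```

With this in place, a daemon would not need to spin on -EAGAIN; it would
sleep in poll() until the kernel queues something.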

Partial reads are what really complicates the stuff - I didn't see
a point in supporting them because events are supposed to be kind
of limited in size, not a huge thing; I don't think there are too
many cases where you provide a buffer smaller than say, 256 bytes.

And then, providing small buffers also hurts performance;
you want to maximize how many events you get in a single shot per
system call, to minimize the system call overhead - that means a
bigger buffer; your granularity in time is what will determine it.

And the event struct is generic enough. That's why the data[] thing is
there - you include the message format you want: ascii, binary, name it.

What gets propagated to user space is a four-byte size and then
the stuff you asked to have delivered. How can it be more generic
in the format realm?

This would not be difficult to do - however, I see it as a bit of
overkill to have a filesystem for it when the files could be
plugged into, for example, /sysfs [add to my stuff a way to declare a
message queue, and export it in /sysfs as a file] - will look into
that ASAP.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)
