PROPOSAL: dot-proc interface [was: /proc stuff]

Post by Tim Janse » Tue, 06 Nov 2001 03:40:08




> > Is there a way to implement a filesystem in user-space? What you could
> You're proposing a replacement of /proc ?

I was asking whether there is a way to do compatibility stuff and human
readable interfaces in user space.

bye...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Post by Jakob Østergaard » Tue, 06 Nov 2001 03:40:10




> > The "fuzzy parsing" userland has to do today to get useful information
> > out of many proc files today is not nice at all.  It eats CPU, it's
> > error-prone, and all in all it's just "wrong".

> This is because the files are human-readable, nothing to do with binary vs. plain
> text. proc should be made (entirely ?) of value-per-file trees, and a back-compat
> compatprocfs union mounted for the files people and programs are expecting.

So you want generation and parsing of text strings whenever we pass an int from
the kernel ?

> > However - having a human-readable /proc that you can use directly with
> > cat, echo,  your scripts,  simple programs using read(), etc.   is absolutely
> > a *very* cool feature that I don't want to let go.  It is just too damn
> > practical.

> I don't see that it's at all useful: it just makes life harder. You yourself
> state above that read(2) parsing of human readable files is "not nice at all",
> and now you're saying it is "just too damn practical".

cat /proc/mdstat    - that's practical !
cat /proc/cpuinfo   - equally so

Anyway - I won't involve myself in the argument whether we should keep
the old /proc or not - I wanted to present my idea how we could overcome
some fundamental problems in the existing framework,  non-intrusively.

> Just drop the human-readable stuff from the new /proc, please.

I don't care enough about it to discuss it now, but I'm sure others do  ;)

> In what way is parsing /proc/meminfo in a script more practical than
> cat /proc/meminfo/total ?

I see your point.

There's some system overhead when converting text/integer values, but
if you're polling so often I guess you have other problems anyway...

...

> This just seems needless duplication, and fragile. Representing things as directory
> hierarchies and single-value files in text seems to me to be much nicer, just as
> convenient, and much nicer for fs/proc/ source...

I like the idea of single-value files.

But then how do we get the nice summary information we have today ?

Hmm...   How about:

  /proc/meminfo    - as it was
  /proc/.meminfo/  - as you suggested

That way we keep /proc looking like it was, while offering the very nice
single-value file interface to apps that need it.

I could even live with text encoding of the values - I just hate not being able
to tell if it's supposed to be i32/u32/i64/u64/float/double/...  from looking
at the variable.   Type-less interfaces with implicitly typed values are
*evil*.

I'd love to have type information passed along with the value.   Of course
you could add a "f"_t file for each "f", and handle eventual discrepancies
at run-time in your application.

--
................................................................

:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

Post by Alexander Vir » Tue, 06 Nov 2001 03:50:13



> So you want generation and parsing of text strings whenever we pass an int from
> the kernel ?

"scanf is tough" --- programmer Barbie...


Post by Jakob Østergaard » Tue, 06 Nov 2001 04:10:12




> > So you want generation and parsing of text strings whenever we pass an int from
> > the kernel ?

> "scanf is tough" --- programmer Barbie...

I'm a little scared when our VFS guy claims he never heard of excellent
programmers using scanf in a way that led to parse errors.

/me hopes VFS doesn't use scanf...

Come on Al, if you have real arguments let's hear them, if you want to insult
people you gotta do better than that above.   :)


Post by Daniel Phillip » Tue, 06 Nov 2001 04:20:15




> > Folks, could we please deep-six the "ASCII is tough" mentality?  Idea of
> > native-endian data is so broken that it's not even funny.  Exercise:
> > try to export such thing over the network.  Another one: try to use
> > that in a shell script.  One more: try to do it portably in Perl script.

> So make it network byte order.

> How many bugs have you heard of with bad use of sscanf() ?

Yes, and it's easy for those to be buffer overflow bugs.  The extra security
risk is even more of a reason to avoid ASCII strings in internal interfaces
than the gross overhead.  Do the ASCII conversions in user space, please.

No, ASCII isn't tough, it just sucks as an internal transport.
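
A minimal sketch of the "network byte order" suggestion quoted above, assuming a hypothetical counter exported as an unsigned 32-bit value. It uses Python's struct module; the function names are made up for illustration and nothing here reflects an actual kernel interface.

```python
import struct

def encode_u32(value):
    # "!" selects network (big-endian) byte order; "I" is an unsigned 32-bit int.
    return struct.pack("!I", value)

def decode_u32(raw):
    # struct.unpack raises struct.error if raw is not exactly 4 bytes,
    # so a truncated read fails loudly instead of yielding a bad value.
    (value,) = struct.unpack("!I", raw)
    return value

wire = encode_u32(513276)   # hypothetical counter value
print(wire.hex())           # same bytes on little- and big-endian hosts
print(decode_u32(wire))
```

The byte layout is fixed by the format string rather than by the host CPU, which is the property the native-endian objection is about.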

--
Daniel

Post by Tim Janse » Tue, 06 Nov 2001 04:20:15



> The idea is that if the userland application does its parsing wrong, it
> should either not compile at all, or abort loudly at run-time, instead of
> getting bad values "sometimes".

All the XML parser interfaces that I have seen so far let you write code
that fails when you do stupid things or are not prepared for unknown
elements to appear. Or you use a DTD, and then your code is guaranteed to
fail after a change, which may be even worse.

One-value files are a notable exception: you must be VERY stupid if your
code breaks because of an additional file.

bye...

Post by Alex Bligh - linux-kerne » Tue, 06 Nov 2001 04:30:17


--On Sunday, 04 November, 2001 8:04 PM +0100 Jakob Østergaard


> I'm a little scared when our VFS guy claims he never heard of excellent
> programmers using scanf in a way that led to parse errors.

I'd be far more scared if Al claimed he'd never heard of excellent
programmers reading binary formats, compatible between multiple
code revisions both forward and backwards, endian-ness etc., which
had never led to parse errors of the binary structure.

If you feel it's too hard to use scanf(), use sh, awk, perl
etc. which all have their own implementations that appear to have
served UNIX quite well for a long while.

Constructive suggestions:

1. use a textual format, make minimal
   changes from current (duplicate new stuff where necessary),
   but ensure each /proc interface has something which spits
   out a format line (header line or whatever, perhaps an
   interface version number). This at least
   means that userspace tools can check this against known
   previous formats, and don't have to be clairvoyant to
   tell what future kernels have the same /proc interfaces.

2. Flag those entries which are sysctl mirrors as such
   (perhaps in each /proc directory /proc/foo/bar/, a
   /proc/foo/bar/ctl with them all in). Duplicate for the
   time being rather than move. Make reading them (at
   least those in the ctl directory) have a comment line
   starting with a '#' at the top describing the format
   (integer, boolean, string, whatever), what it does.
   Ignore comment lines on write.

3. Try and rearrange all the /proc entries this way, which
   means sysctl can be implemented by a straight ASCII
   write - nice and easy to parse files. Accept that some
   /proc reads (especially) are going to be hard.

--
Alex Bligh

Post by Jakob Østergaard » Tue, 06 Nov 2001 04:50:09



> --On Sunday, 04 November, 2001 8:04 PM +0100 Jakob Østergaard

> > I'm a little scared when our VFS guy claims he never heard of excellent
> > programmers using scanf in a way that led to parse errors.

> I'd be far more scared if Al claimed he'd never heard of excellent
> programmers reading binary formats, compatible between multiple
> code revisions both forward and backwards, endian-ness etc., which
> had never led to parse errors of the binary structure.

Sure there is potential for error anywhere.  And maybe your compiler's
type-check is broken too.  But that's not an argument for not trying
to improve on things.

Please tell me,  is "1610612736" a 32-bit integer, a 64-bit integer, is
it signed or unsigned   ?
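
To make the ambiguity concrete, here is a small sketch that lists which fixed-width integer types could hold a given decimal string. The type names mirror the i32/u32/i64/u64 shorthand used earlier in the thread; the helper itself is hypothetical.

```python
# Inclusive value ranges of the fixed-width integer types discussed in the thread.
RANGES = {
    "i32": (-2**31, 2**31 - 1),
    "u32": (0, 2**32 - 1),
    "i64": (-2**63, 2**63 - 1),
    "u64": (0, 2**64 - 1),
}

def candidate_types(text):
    """Return every type whose range contains the decimal value in text."""
    value = int(text)
    return [name for name, (lo, hi) in RANGES.items() if lo <= value <= hi]

print(candidate_types("1610612736"))
print(candidate_types("4294967295"))
```

When more than one type is returned, the text alone cannot tell the reader which one the kernel meant.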

I could even live with parsing ASCII, as long as there'd just be type
information to go with the values.  But I see no point in using ASCII
for something intended purely for machine-to-machine communication.

/proc text "GUI" files will stay, don't worry  :)

> If you feel it's too hard to use scanf(), use sh, awk, perl
> etc. which all have their own implementations that appear to have
> served UNIX quite well for a long while.

Witness ten lines of vmstat output taking 300+ millions of clock cycles.

> Constructive suggestions:

> 1. use a textual format, make minimal
>    changes from current (duplicate new stuff where necessary),
>    but ensure each /proc interface has something which spits
>    out a format line (header line or whatever, perhaps an
>    interface version number). This at least
>    means that userspace tools can check this against known
>    previous formats, and don't have to be clairvoyant to
>    tell what future kernels have the same /proc interfaces.

Then we have text strings as values - some with spaces, some with quotes in
them.   Then we escape our way out of that (which isn't done today by the way),
and then we start implementing a parser for that in every /proc using
application out there.

These interfaces need to be "correct", not "mostly correct".

Example:   I make a symlink from "cat" to "c)(a" (sick example, but that doesn't
change my point), and do a "./c)(a /proc/self/stat":

[albatros:joe] $ ./c\)\(a /proc/self/stat
22482 (c)(a) R 22444 22482 22444 34816 22482 0 20 0 126 0 0 0 0 0 14 0 0 0 24933425 1654784 129 4294967295 134512640 134525684 3221223504 3221223112 1074798884 0 0 0 0 0 0 0 17 0

Go parse that one !  What's the name of my application ?

It's good enough for human readers - we have the ability to reason and
make qualified guesses.   Now go implement that in every single piece of
/proc reading software out there   :)
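
As an illustration of the problem, the sketch below parses the start of the line above in two ways. Stopping at the first ')' gives the wrong name; splitting at the last ')' recovers it, but only on the assumption that none of the fields after the name can ever contain ')'. Both helper functions are hypothetical and not taken from any existing tool.

```python
def comm_first_paren(line):
    # Naive approach: the name ends at the first ")".
    start = line.index("(") + 1
    return line[start:line.index(")", start)]

def comm_last_paren(line):
    # Workaround: the name ends at the LAST ")" on the line.
    # Assumes the numeric fields that follow never contain ")".
    start = line.index("(") + 1
    return line[start:line.rindex(")")]

# First fields of the /proc/self/stat line quoted above.
line = "22482 (c)(a) R 22444 22482 22444 34816 22482 0 20 0"

print(repr(comm_first_paren(line)))
print(repr(comm_last_paren(line)))
```

The second helper works only because of a convention about the rest of the line, and that convention is not written down in the file itself.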

If you want ASCII, we should at least have some approved parsing library
to parse this into native-machine binary structures that can be used
safely in applications.  I see little point in ASCII then, but maybe it's
just me.

> 2. Flag those entries which are sysctl mirrors as such
>    (perhaps in each /proc directory /proc/foo/bar/, a
>    /proc/foo/bar/ctl with them all in). Duplicate for the
>    time being rather than move. Make reading them (at
>    least those in the ctl directory) have a comment line
>    starting with a '#' at the top describing the format
>    (integer, boolean, string, whatever), what it does.
>    Ignore comment lines on write.

> 3. Try and rearrange all the /proc entries this way, which
>    means sysctl can be implemented by a straight ASCII
>    write - nice and easy to parse files. Accept that some
>    /proc reads (especially) are going to be hard.

I just hate to implement a fuzzy parser with an A.I. that makes HAL look like
kid's toys, every d*mn time I need to get information from the system.

I'm not a big fan of huge re-arrangements. I do like the idea of providing
a machine-readable version of /proc.


Post by Alexander Vir » Tue, 06 Nov 2001 05:00:12



> > If you feel it's too hard to use scanf(), use sh, awk, perl
> > etc. which all have their own implementations that appear to have
> > served UNIX quite well for a long while.

> Witness ten lines of vmstat output taking 300+ millions of clock cycles.

Would the esteemed sir care to check where these cycles are spent?
How about "traversing page tables of every damn process out there"?
Doesn't sound like a string operation to me...


Post by Jakob Østergaard » Tue, 06 Nov 2001 05:10:11




> > > If you feel it's too hard to use scanf(), use sh, awk, perl
> > > etc. which all have their own implementations that appear to have
> > > served UNIX quite well for a long while.

> > Witness ten lines of vmstat output taking 300+ millions of clock cycles.

> Would the esteemed sir care to check where these cycles are spent?
> How about "traversing page tables of every damn process out there"?
> Doesn't sound like a string operation to me...

I'm sure you're right.   It's probably not just string operations. And maybe
they don't even dominate.

And I'm sure that vmstat doesn't use sh, awk, and perl either.

Anyway, the efficiency issue was mainly me getting side-tracked from the main
issue as I see it.

The point I wanted to make was that we need an interface that's possible to
parse "correctly", not "mostly correctly", and we need to be able to parse it
in a way so that we do not have to rely on a myriad of small tools (that change
over time too).

You need something that's simple and correct.  If it's ASCII, well let it be
ASCII. But /proc as it is today is not possible to parse reliably.  See my "cat
vs. c)(a" example.   You can parse it "mostly reliably", but that's just not
good enough.


Post by Alexander Vir » Tue, 06 Nov 2001 05:10:11



> So just ignore square brackets that have "=" " " and ">" between them ?

> What happens when someone decides  "[---->   ]" looks cooler ?

First of all, whoever had chosen that output did a fairly idiotic thing.
But as for your question - you _do_ know what regular expressions are,
don't you?  And you do know how to do this particular regex without
any use of library functions, right?
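
For what it's worth, a hand-written scanner for such a progress bar is short and needs no regex library. The sketch below skips a bracketed run made only of bar characters; the character set and the sample lines are assumptions for illustration, not the actual /proc/mdstat format.

```python
BAR_CHARS = set("=-> .")  # assumed set of characters a progress bar may use

def strip_progress_bar(line):
    """Remove the first "[...]" group that consists only of bar characters."""
    i = 0
    while i < len(line):
        if line[i] == "[":
            j = i + 1
            while j < len(line) and line[j] in BAR_CHARS:
                j += 1
            if j < len(line) and line[j] == "]" and j > i + 1:
                # Drop the bar and rejoin the text on either side of it.
                return (line[:i].rstrip() + " " + line[j + 1:].lstrip()).strip()
        i += 1
    return line

# Hypothetical status lines using the two bar styles mentioned above.
print(strip_progress_bar("[===>      ] recovery = 23.4%"))
print(strip_progress_bar("[---->   ] recovery = 23.4%"))
print(strip_progress_bar("[2/2] [UU]"))
```

Brackets that hold anything other than bar characters are left alone, so the scanner still depends on guessing which characters a future bar might use.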


Post by Albert D. Cahala » Tue, 06 Nov 2001 05:50:11


SpaceWalker writes:
> A good reason could be that a simple ps -aux uses hundreds of system
> calls to get the list of all the processes ...

First of all, "ps -aux" isn't correct usage. It is accepted only
as long as you don't have a username "x". Try "ps aux" instead.
(good versions of ps will print a warning -- use Debian)

Second of all, if you want a really fast ps, look here:
http://lwn.net/2000/0420/a/atomicps.html

Post by Alex Bligh - linux-kerne » Tue, 06 Nov 2001 05:50:11


>> Using an ioctl that returns the type.

> But that's not pretty   :)

> Can't we think of something else ?

Well this sure isn't perfect, but to
illustrate it can be done with a text
interface (and the only restriction
is strings can't contain \n):

cat /proc/widget
# Format: '%l'
# Params: Number_of_Widgets
37

echo '38' > /proc/widget

cat /proc/widget
# Format: '%l'
# Params: Number_of_Widgets
38

cat /proc/widget | egrep -v '^#'
38

cat /proc/sprocket
# Format: '%l' '%s'
# Params: Number_of_Sprockets Master_Sprocket_Name
21
Foo Bar Baz

echo '22' > /proc/sprocket
# writes first value if no \n character written before
# close - all writes done simultaneously on close

cat /proc/sprocket | egrep -v '^#'
22
Foo Bar Baz

echo 'Master_Sprocket_Name\nBaz Foo Bar' > /proc/sprocket

cat /proc/sprocket | egrep -v '^#'
22
Baz Foo Bar

echo 'Master_Sprocket_Name\nFoo Foo Foo\nNumber_of_Sprockets\n111' > /proc/sprocket
# Simultaneous commit if /proc driver needs it
# i.e. it has get_lock() and release_lock()
# entries
cat /proc/sprocket | egrep -v '^#'
111
Foo Foo Foo

& nice user tools look at the '# Params:' line to find
what number param they want to read / alter.
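
A reader for this format fits in a few lines. The sketch below is only an illustration of the proposal above: it assumes one value per line, in the order given by the '# Params:' line, and the function name is made up.

```python
def parse_proc_text(text):
    """Parse the '# Format:' / '# Params:' layout into {param: (format, value)}."""
    formats, params, values = [], [], []
    for line in text.splitlines():
        if line.startswith("# Format:"):
            # Format specifiers are quoted, e.g. '%l' '%s'.
            formats = [f.strip("'") for f in line[len("# Format:"):].split()]
        elif line.startswith("# Params:"):
            params = line[len("# Params:"):].split()
        elif line.startswith("#"):
            continue  # any other comment line is ignored
        else:
            values.append(line)  # one value per line; may contain spaces
    if not (len(formats) == len(params) == len(values)):
        raise ValueError("header and value count mismatch")
    return {p: (f, v) for p, f, v in zip(params, formats, values)}

sample = """# Format: '%l' '%s'
# Params: Number_of_Sprockets Master_Sprocket_Name
21
Foo Bar Baz
"""
print(parse_proc_text(sample))
```

Raising on a count mismatch follows the point made earlier in the thread that a wrong parse should abort loudly rather than return bad values "sometimes".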

--
Alex Bligh

Post by Jakob Østergaard » Tue, 06 Nov 2001 06:10:11



> >> Using an ioctl that returns the type.

> > But that's not pretty   :)

> > Can't we think of something else ?

> Well this sure isn't perfect, but to
> illustrate it can be done with a text
> interface (and the only restriction
> is strings can't contain \n):

Such limitations are not acceptable.

> cat /proc/widget
> # Format: '%l'
> # Params: Number_of_Widgets
> 37

> echo '38' > /proc/widget

> cat /proc/widget
> # Format: '%l'
> # Params: Number_of_Widgets
> 38

Good point with the parsing  :)

> cat /proc/widget | egrep -v '^#'
> 38

> cat /proc/sprocket
> # Format: '%l' '%s'
> # Params: Number_of_Sprockets Master_Sprocket_Name
> 21
> Foo Bar Baz

Not one value per file ?

> echo '22' > /proc/sprocket
> # writes first value if no \n character written before
> # close - all writes done simultaneously on close

> cat /proc/sprocket | egrep -v '^#'
> 22
> Foo Bar Baz

> echo 'Master_Sprocket_Name\nBaz Foo Bar' > /proc/sprocket

> cat /proc/sprocket | egrep -v '^#'
> 22
> Baz Foo Bar

> echo 'Master_Sprocket_Name\nFoo Foo Foo\nNumber_of_Sprockets\n111' > /proc/sprocket
> # Simultaneous commit if /proc driver needs it
> # i.e. it has get_lock() and release_lock()
> # entries
> cat /proc/sprocket | egrep -v '^#'
> 111
> Foo Foo Foo

> & nice user tools look at the '# Params:' line to find
> what number param they want to read / alter.

How about:

We keep old proc files.

For each file, we make a .directory.

For example - for /proc/meminfo, we make a /proc/.meminfo/ directory
that contains the files
 MemTotal
 MemFree
 MemShared
 etc.

cat /proc/.meminfo/MemTotal gives you
"u32:KB:513276"

The kernel code for printing this is something like
 sprintf(..., "%s:%s:%u", DPI_T_U32, DPI_U_KB, i.memtotal);

The types and the units are necessary. But furthermore we do not
want various developers to be using different ways of writing the
types and units (KB vs. kB, vs. KiB).  Defines will ensure that
(if they are used - but they lend themselves to being used), and
once a new define is introduced it is fairly easy to document and
export to userland.

Not only does this format tell us exactly what's in the file (and
therefore how we should parse it), it also defines what we can write
to it (assuming we write the same types as we read - but that's a
reasonable assumption I suppose).

Problem:  Could it be made simpler to parse from scripting languages,
without making it less elegant to parse in plain C ?

If the value is a string, the string will begin after the second
colon (safe, since no type or unit can contain a colon and won't
have to, ever), and ends at the end of the file.  Voila, any character can be
in the string value.
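
In a scripting language the split is close to a one-liner. The sketch below assumes the "type:unit:value" layout described above; the set of integer type tags, the "string" tag and the function name are assumptions for illustration.

```python
INT_TYPES = {"i32", "u32", "i64", "u64"}  # assumed type tags

def parse_dot_proc(text):
    """Split 'type:unit:value' into its three parts, converting integers."""
    # maxsplit=2 keeps any further colons inside the value itself.
    type_tag, unit, value = text.split(":", 2)
    if type_tag in INT_TYPES:
        return type_tag, unit, int(value)
    return type_tag, unit, value

print(parse_dot_proc("u32:KB:513276"))
print(parse_dot_proc("string::a:b\nc"))
```

Input with fewer than two colons makes the tuple unpacking raise ValueError, so a malformed file is rejected instead of being half-parsed.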

And Al gets his #%^# text files   ;)


Post by Albert D. Cahala » Tue, 06 Nov 2001 06:20:13


Jakob Østergaard writes:
> Please tell me,  is "1610612736" a 32-bit integer, a 64-bit integer, is
> it signed or unsigned   ?

> I could even live with parsing ASCII, as long as there'd just be type
> information to go with the values.

You are looking for something called the registry. It's something
that was introduced with Windows 95. It's basically a filesystem
with typed files: char, int, string, string array, etc.

> These interfaces need to be "correct", not "mostly correct".

> Example:   I make a symlink from "cat" to "c)(a" (sick example,
> but that doesn't change my point), and do a "./c)(a /proc/self/stat":

> [albatros:joe] $ ./c\)\(a /proc/self/stat
> 22482 (c)(a) R 22444 22482 22444 34816 22482 0 20 0 126 0 0 0 0 0 14 0 0 0 24933425 1654784 129 4294967295 134512640 134525684 3221223504 3221223112 1074798884 0 0 0 0 0 0 0 17 0

> Go parse that one !  What's the name of my application ?

Funny you should mention that one. I wrote the code used by procps
to read this file. I love that file! The parentheses issue is just
a beauty wart. People rarely feel the urge to * with raw numbers.
In all the other files, idiots like to: add headers, change the
spelling of field names, change the order, add spaces and random
punctuation, etc. Nothing is as stable and easy to use as the
/proc/self/stat file.

> If you want ASCII, we should at least have some approved parsing
> library to parse this into native-machine binary structures

No.

>> 2. Flag those entries which are sysctl mirrors as such
>>    (perhaps in each /proc directory /proc/foo/bar/, a
>>    /proc/foo/bar/ctl with them all in). Duplicate for the
>>    time being rather than move. Make reading them (at
>>    least those in the ctl directory) have a comment line
>>    starting with a '#' at the top describing the format
>>    (integer, boolean, string, whatever), what it does.
>>    Ignore comment lines on write.

Now you are proposing to dink with the format. See above comments.

>> 3. Try and rearrange all the /proc entries this way, which
>>    means sysctl can be implemented by a straight ASCII
>>    write - nice and easy to parse files.

This is exactly what the sysctl command does.

> I'm not a big fan of huge re-arrangements. I do like the idea of providing
> a machine-readable version of /proc.

Linus clearly doesn't give a * about /proc performance.
That's his right, and you are welcome to patch your kernel to
have something better: http://www.veryComputer.com/