File "type"


Post by BBS Administrati » Mon, 24 Sep 1990 12:36:58



        Could someone explain how the command "file" works? Specifically, I am
writing a program that allows users to navigate their $HOME directory and
any subdirectories (they cannot leave their $HOME directory though, for
security reasons) to find files that are to be read into a text editor.
Some text editor forks this program, and when the user selects a file to
read, it writes the pathname to a temporary file which the editor reads
and then loads into its buffer.

        I wrote this "navigator" program as a separate entity, so that either
my line based editor (non-curses) or my full screen editor (subset of
curses) can call upon it and use its facilities (the navigator does lots
of other things too) without giving the user shell access directly.
Anyhow, once they select a file for reading, I'd like to be able to
determine if the file is "ascii text" as the program "file" reports
when this is true, and if not, inform the user that the contents are
NOT ascii text and that they may want to reconsider.

        Should I make a pass through the contents and make sure that each
character has the high bit OFF (so it's 7-bit data) or what? I don't
need to determine what kind of file it is, just whether or not it's
something the editors will "like."

Thanks in advance!

-- John

John Donahue, Senior Partner | UUCP: ucrmath!alchemy!{bbs, gumby}  | The Future

-------------------+---------+-------------------------------------+-----------
Communique On-line | +1-714-243-7150 {3, 12, 24, 96HST} Bps. 8-N-1 | Next Wave:
Information System |    Alchemy Software Designs Support System    | Communique

 
 
 


Post by Robert Bedich » Mon, 24 Sep 1990 15:50:52



>    Could someone explain how the command "file" works? Specifically, I am
>writing a program that allows users to navigate their $HOME directory and

<text deleted>

I suggest that you read the man page for 'file'.  Also, read the file
that the man page specifies as the database that 'file' uses.  You can
find lots of useful stuff by reading man pages and examining
user-readable system files.  It is something that still distinguishes
most versions of UNIX from most other operating systems.

>Anyhow, once they select a file for reading, I'd like to be able to
>determine if the file is "ascii text" as the program "file" reports
>when this is true, and if not, inform the user that the contents are
>NOT ascii text and that they may want to reconsider.

>    Should I make a pass through the contents and make sure that each
>character has the high bit OFF (so it's 7-bit data) or what? I don't
>need to determine what kind of file it is, just whether or not it's
>something the editors will "like."

There are many file types that editors will like besides files reported
by 'file' as text.  For example, shell scripts are usually reported as
such and not as text.  So I don't think that the result of 'file' is
what you want.  Also, some text editors can edit any file, including
executable files.


>Thanks in advance!

Sure, I hope that this helps.




 
 
 


Post by Joe English Muff » Mon, 24 Sep 1990 18:24:04




>>        Could someone explain how the command "file" works? Specifically, I am
>>writing a program that allows users to navigate their $HOME directory and
><text deleted>
>I suggest that you read the man page for 'file'.  Also, read the file
>that the man page specifies as the database that 'file' uses.

Not all versions of 'file' use a separate database; I
believe the 4.2BSD 'file' has it hardcoded. (Not to
mention the fact that not all Unices have on-line
man pages, and not all sites make the hard-copy versions
easy to get to, but that's another gripe :-)

To answer the original question, 'file' first does a
stat() to determine if the file is an executable,
setuid, symbolic link, etc.  Then it reads in the
first N characters of the file and checks it against a
predefined set of patterns.  Many of the patterns are
just ``magic numbers''; for example, under SunOS the
file types "mc68020 demand paged dynamically linked
executable" and "shell script" are determined from the
first two bytes of the file.
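
The magic-number step can be sketched as a small prefix table in C.
This is only an illustration: struct magic, magic_table and
match_magic are made-up names, and the single "#!" entry assumes that
this is the two-byte pattern meant for shell scripts.  The value for
the mc68020 executables is not given here, so it is left out.

```c
#include <stddef.h>
#include <string.h>

/* One entry in a hypothetical pattern table: a byte prefix and a label. */
struct magic {
    const char *prefix;   /* bytes expected at offset 0 */
    size_t      len;      /* number of bytes in prefix */
    const char *label;    /* what to report on a match */
};

/* Only "#!" is listed (an assumption); a real table holds many more. */
static const struct magic magic_table[] = {
    { "#!", 2, "script" },
};

/* Return the label of the first matching entry, or NULL if none match. */
const char *match_magic(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < sizeof magic_table / sizeof magic_table[0]; i++) {
        const struct magic *m = &magic_table[i];

        if (n >= m->len && memcmp(buf, m->prefix, m->len) == 0)
            return m->label;
    }
    return NULL;
}
```

A caller would read the first few bytes of the file into buf and fall
through to the other checks when match_magic() returns NULL.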

Some of the other patterns it looks for are a little
more complicated; for example, a period at the
beginning of the line indicates "[nt]roff, tbl, or eqn
input" (which is why it tends to think makefiles are
for troff so often.)  Certain patterns of punctuation
and capitalization (not too sure what they are)
distinguish "English text" from "ascii text."

If none of the patterns match, it looks for
non-printable characters; if there are any it will
report "data", otherwise "ascii text."

>There are many file types that editors will like besides files reported
>by 'file' as text.  For example shell scripts are usually reported as
>such and not as text.  So the result of 'file' isn't what I think that
>you want.  Also, some text editors can edit any file, including
>executable files.

This is true.  Your best bet is to write a simple C
program that reads in the first block of the file and
checks for non-printing characters and possibly for
lines that are too long as well.
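
A rough sketch of such a check, assuming 7-bit ASCII, a 512-byte
first block and a 256-byte line limit.  All three numbers are
assumptions, and BLOCK, MAX_LINE and both function names are invented
for the example; adjust them to whatever the editors can cope with.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define BLOCK    512   /* assumed size of the "first block" */
#define MAX_LINE 256   /* assumed longest line the editor accepts */

/*
 * Return 1 if buf[0..n-1] holds only printable 7-bit characters plus
 * tab, carriage return and form feed, with no line longer than
 * MAX_LINE bytes; otherwise return 0.
 */
int block_looks_editable(const unsigned char *buf, size_t n)
{
    size_t i, col = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];

        if (c == '\n') {            /* end of line: reset the length count */
            col = 0;
            continue;
        }
        if (c >= 0x80)              /* high bit set: not 7-bit data */
            return 0;
        if (!isprint(c) && c != '\t' && c != '\r' && c != '\f')
            return 0;               /* non-printing character */
        if (++col > MAX_LINE)
            return 0;               /* line too long */
    }
    return 1;
}

/* Check the first block of a file: 1 = looks editable, 0 = not, -1 = error. */
int file_looks_editable(const char *path)
{
    unsigned char buf[BLOCK];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return block_looks_editable(buf, n);
}
```

Only the first block is examined, so a file whose odd bytes start
later will still pass; that is the same trade-off 'file' makes.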

--Joe English


 
 
 


Post by Paul Chamberla » Wed, 26 Sep 1990 00:35:21



>    Could someone explain how the command "file" works? Specifically, I am
>writing a program that allows users to navigate their $HOME directory and ...

I agree that reading in the first block and making basic sanity checks is
probably the best way to decide whether it is sensible to edit the file.  However,
if you desire any more detail, I would seriously consider reading the output
of the "file" command itself.  Or if you have some deep reason to avoid that,
get one of the PD implementations of "file" and suck it into your source.
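
For what it is worth, reading the output of "file" from C might look
roughly like the sketch below.  It assumes a POSIX-style system with
fork(), pipe() and execlp(), assumes the output has the form
"path: description", and simply looks for the word "text" in the
description.  Both function names are invented for the example, and
the exact wording "file" prints differs between versions, so treat
the string match as a guess.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if a file(1) description mentions "text", else 0. */
int output_says_text(const char *desc)
{
    return strstr(desc, "text") != NULL;
}

/*
 * Run "file path" without a shell and classify what it prints.
 * Returns 1 for text, 0 for anything else, -1 on failure.
 */
int file_cmd_says_text(const char *path)
{
    int fds[2], status;
    pid_t pid;
    char line[1024];
    const char *desc = line;
    size_t used = 0, plen = strlen(path);
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: stdout goes to the pipe */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(fds[1]);
        }
        execlp("file", "file", path, (char *)NULL);
        _exit(127);                       /* exec failed */
    }
    close(fds[1]);                        /* parent: collect the output */
    while (used < sizeof line - 1 &&
           (n = read(fds[0], line + used, sizeof line - 1 - used)) > 0)
        used += (size_t)n;
    line[used] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    if (strncmp(line, path, plen) == 0)   /* skip the echoed path name */
        desc = line + plen;
    return output_says_text(desc);
}
```

Avoiding popen() keeps the file name away from the shell, which
matters when users are not supposed to get shell access.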


512/838-7008     | ...!cs.utexas.edu!ibmaus!auschs!doorstop.austin.ibm.com!tif

 
 
 


Post by Larry Wa » Fri, 28 Sep 1990 03:58:48



: Not all versions of 'file' use a separate database; I
: believe the 4.2BSD 'file' has it hardcoded. (Not to
: mention the fact that not all Unices have on-line
: man pages, and not all sites make the hard-copy versions
: easy to get to, but that's another gripe :-)
:
: To answer the original question, 'file' first does a
: stat() to determine if the file is an executable,
: setuid, symbolic link, etc.  Then it reads in the
: first N characters of the file and checks it against a
: predefined set of patterns.  Many of the patterns are
: just ``magic numbers''; for example, under SunOS the
: file types "mc68020 demand paged dynamically linked
: executable" and "shell script" are determined from the
: first two bytes of the file.
:
: Some of the other patterns it looks for are a little
: more complicated; for example, a period at the
: beginning of the line indicates "[nt]roff, tbl, or eqn
: input" (which is why it tends to think makefiles are
: for troff so often.)  Certain patterns of punctuation
: and capitalization (not too sure what they are)
: distinguish "English text" from "ascii text."
:
: If none of the patterns match, it looks for
: non-printable characters; if there are any it will
: report "data", otherwise "ascii text."

Nice summary.

The main problem with using "file" is that it might induce bitrot when "file"
mutates out from under you.  Just because "file" reports "ascii text"
today is no guarantee that it won't report "D-News history file" sometime
next year.  :-)

: >There are many file types that editors will like besides files reported
: >by 'file' as text.  For example shell scripts are usually reported as
: >such and not as text.  So the result of 'file' isn't what I think that
: >you want.  Also, some text editors can edit any file, including
: >executable files.
:
: This is true.  Your best bet is to write a simple C
: program that reads in the first block of the file and
: checks for non-printing characters and possibly for
: lines that are too long as well.

Why write another one?  I've already got one you can use.  :-)

        perl -e 'print "text" if -T shift' filename

If you really do want a "simple" C program, rip out the routine that Perl
uses, do_fttext().  (But be advised that "simple" programs are just about
as hard to maintain across multiple architectures as complicated ones.
You get a lot of leverage by installing something like Perl across all
your architectures.  End of sermon.)
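
For anyone who wants the C route anyway, here is a ratio-based sketch
in the spirit of the -T test.  It is not the do_fttext() source: the
rules (a zero byte means binary, otherwise binary only if more than a
third of the bytes are odd, empty input counts as text) follow the
way current Perl documentation describes -T, and the threshold used
in 1990 may well have been different.  The function name and the
ODD_DIVISOR constant are invented for the example.

```c
#include <stddef.h>

#define ODD_DIVISOR 3   /* assumed: binary if more than 1/3 of bytes are odd */

/*
 * Return 1 if buf[0..n-1] looks like text, 0 if it looks binary.
 * "Odd" bytes are those with the high bit set and control codes
 * other than newline, return, tab, backspace, form feed and escape.
 */
int block_is_mostly_text(const unsigned char *buf, size_t n)
{
    size_t i, odd = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];

        if (c == '\0')
            return 0;                   /* any zero byte: call it binary */
        if (c >= 0x80)
            odd++;                      /* high bit set */
        else if (c < 0x20 && c != '\n' && c != '\r' && c != '\t' &&
                 c != '\b' && c != '\f' && c != 0x1b)
            odd++;                      /* unusual control code */
    }
    return odd * ODD_DIVISOR <= n;      /* text unless too many odd bytes */
}
```

Unlike a strict 7-bit check, this tolerates a few stray 8-bit
characters, which may or may not be what the editors want.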

Larry Wall

 
 
 


Post by David Di » Fri, 28 Sep 1990 23:58:44


[initial query about file(1) and answers elided]

>Not all versions of 'file' use a separate database; I
>believe the 4.2BSD 'file' has it hardcoded.

When we move a customer's applications to UNIX we often come
up with new file types.  Part of fully integrating an application
into UNIX is establishing magic numbers and making file(1) work, IMHO.

Forcing a hard-coded database makes this difficult, as well as
being silly (e.g., is the extra efficiency really needed?) and
quite contrary to the original idea of editable
control files in UNIX.  (Of course, how many other things are there
in BSD UNIX that are contrary to the original idea of UNIX? :-)

David*
Software Innovations, Inc. [the Software Moving Company (sm)]

 
 
 


Post by David Di » Fri, 28 Sep 1990 23:49:13



>    Could someone explain how the command "file" works? Specifically, I am
>writing a program that allows users to navigate their $HOME directory and
>any subdirectories (they cannot leave their $HOME directory though, for
>security reasons) to find files that are to be read into a text editor.
>Some text editor forks this program, and when the user selects a file to
>read, it writes the pathname to a temporary file which the editor reads
>and then loads into its buffer.

[more description omitted]

I consider file(1) to be a useful heuristic program for manual use,
but I would never put it in a script for automatic use.

In other words, it's just a guesser, and does not contribute to
making a robust application.

If you have particular requirements of a target file, you should
establish them with your own code.

David*
Software Innovations, Inc.

 
 
 
