"Alan J. Flavell" <flav...@mail.cern.ch> wrote in message <news:Pine.LNX.firstname.lastname@example.org>...
> On Jun 13, Dubya the ecovandal inscribed on the eternal scroll:
> > FWIW I use NN4.74 (IIRC) but the behaviour goes back a long way before
> > that and hasn't altered subsequently AFAIK. My guess is your test
> > download didn't insert extra CRs because either 1) the URL you
> > shift/clicked on began with ftp:// not http:// or 2) it was an http://
> > download but the file's MIME type wasn't marked as "text/plain".
> Curiouser and curiouser. As it happens, my test URL was the one shown
> lynx -dump -head http://ppewww.ph.gla.ac.uk/~flavell/tests/test.txt
> HTTP/1.1 200 OK
> Date: Fri, 14 Jun 2002 16:31:05 GMT
> Server: Apache/1.3.20 (Unix) PHP/4.1.2
> Last-Modified: Thu, 13 Jun 2002 17:27:10 GMT
> ETag: "55846b-f6-3d08d5ee"
> Accept-Ranges: bytes
> Content-Length: 246
> and it's definitely in Unix-newlines format.
You're right, of course - that's a doozy! Shows there's no difference
in this behaviour between our versions of NN or Windows, at least.
> If I copy my test file from test.txt to test.dsc and send _that_ out
> with text/plain, then I _do_ get the effect which you're complaining
> about. It grows from 246 to 260 bytes and - as you won't be surprised
> to hear - the file contains 14 lines. A picture seems to be emerging.
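The arithmetic fits one extra CR per bare LF: 246 bytes plus 14 inserted CRs gives 260. Here's a minimal Python sketch of that kind of LF-to-CRLF conversion. It's my own illustration, not NN's code, and `unix_to_dos` and `growth` are made-up names. It also leaves LFs that already have a CR in front of them alone, so a file that's already in DOS format doesn't grow:

```python
import re

def unix_to_dos(data: bytes) -> bytes:
    """Turn each bare LF into CRLF; LFs already preceded by CR are left alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

def growth(data: bytes) -> int:
    """Bytes added by the conversion: one per bare LF."""
    return len(unix_to_dos(data)) - len(data)

# 14 Unix-format lines of 8 bytes each: the converted copy is 14 bytes longer.
sample = b"".join(b"line %02d\n" % i for i in range(14))
print(len(sample), len(unix_to_dos(sample)), growth(sample))  # prints: 112 126 14
```

So a quick check for a mangled download is to compare the saved size with Content-Length. If the difference equals the number of lines, every newline got a CR.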
Yes, it looks as though NN on Windows is also taking the
file extension into account. (Inserts plug for Debian, and for
GNU/Linux in general: if we had the source it'd be easy to tell for
sure!) What's the bet it does the same as with .dsc if you use, say,
.jnk or .ajf, or anything else that isn't so likely to be recognised
as identifying the internal format?
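Without the source we can only guess at the rule. Still, the two results so far fit something like the following: test.txt served as text/plain is saved untouched, and test.dsc served as text/plain gains CRs. This is a purely hypothetical Python model for planning the next tests, not Netscape's logic. `RECOGNISED_EXTENSIONS` and `guess_nn_converts` are invented names, and only .txt in that set is backed by an actual test:

```python
import posixpath
from urllib.parse import urlparse

# HYPOTHETICAL: extensions NN seems to treat as "recognised" and save untouched.
# Only .txt is supported by the test above; the set is a guess.
RECOGNISED_EXTENSIONS = {".txt"}

def guess_nn_converts(content_type: str, url: str) -> bool:
    """Guess whether Windows NN would rewrite LF as CRLF when saving this URL."""
    media_type = content_type.split(";")[0].strip().lower()
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return media_type == "text/plain" and ext not in RECOGNISED_EXTENSIONS
```

If the .jnk and .ajf tests come back converted, the model survives. If they don't, the extension rule needs rethinking.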
> I apologise for being unaware of this curiosity before.
What's to apologise for? Pat yourself on the back. This thread's
taught us both something, and wildly exceeded my original hopes:
anyone puzzled by NN's behaviour in future need only do a Usenet
search and they'll find here a detailed discussion of what to beware
of and how to avoid it.
> FTP on the other hand seems to have been designed on the assumption
> that the server side could not have any idea what kind of stuff was
> being made available from it, and thus the client (or their software)
> must guess.
Pretty much. FTP came along so much earlier that MIME types couldn't be
incorporated, so all it offered was ASCII or BINARY download
modes. Fair enough for its time, and still useful: you have to control
what you get, but the advantage is you're _always_ in control of what
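Python's standard `ftplib` makes that choice explicit. `retrbinary` switches the connection to TYPE I and passes bytes through untouched. `retrlines` switches to TYPE A and hands over the text one line at a time. A small sketch follows. The `fetch` helper is my own invention, and it assumes an already logged-in `FTP` object:

```python
import os
from ftplib import FTP

def fetch(ftp: FTP, remote_name: str, binary: bool = True) -> bytes:
    """Retrieve a file over an open FTP connection in the chosen transfer mode."""
    if binary:
        # BINARY (TYPE I): bytes arrive exactly as stored on the server.
        chunks = []
        ftp.retrbinary(f"RETR {remote_name}", chunks.append)
        return b"".join(chunks)
    # ASCII (TYPE A): retrlines strips each line's ending before handing it
    # over, so we rejoin the lines with this platform's newline convention.
    lines = []
    ftp.retrlines(f"RETR {remote_name}", lines.append)
    return "".join(line + os.linesep for line in lines).encode(ftp.encoding)
```

Either way, it's the caller, not the software, who decides whether newlines get translated.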
> Your complaint was, fundamentally, about an application that thinks it
> can handle text - but really expects what in some respects needs to be
> a binary image, so IMHO the application is at fault (lack of
> application portability).
Well ... not really. The application (dpkg-source) isn't designed to
handle "text" generically, only Unix-format text. On one hand it
wouldn't kill the designers to make dpkg-source less fussy about
stripping white-space and, while they were at it, to treat extra CRs
as white space even though they basically never appear in Unix
text files. OTOH the programmer in me would side with their inevitable
protest: the app undertakes and fulfils a clear contract - to accept
files in a well-defined format - and it has every right to puke if
it's force-fed a different one, however closely related that format
might be. It certainly isn't a portable app, I agree, but then it's
for low-level handling of _D_ebian _P_ac_k_a_g_es so it makes sense
for it to work only with Debian-format files. One thing's certain: no
appeal that dpkg-source should be fixed to accommodate a quirk of
Windows-based NN would get anywhere ;-)
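The two positions can be put side by side in a short Python sketch. This is not dpkg-source's actual code. `parse_unix_lines` and its `tolerate_cr` flag are hypothetical, and they just show how small the "less fussy" change would be and why the strict version is entitled to complain:

```python
def parse_unix_lines(data: bytes, tolerate_cr: bool = False) -> list[str]:
    """Split Unix-format text into lines.

    Strict mode refuses any CR, in the spirit of a tool that only accepts
    Unix text. Tolerant mode treats trailing CRs as white space and drops them.
    """
    lines = data.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a final newline
    if tolerate_cr:
        return [line.rstrip("\r") for line in lines]
    for number, line in enumerate(lines, start=1):
        if "\r" in line:
            raise ValueError(f"line {number}: unexpected CR (DOS line ending?)")
    return lines
```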
> OTOH _my_ beef would be with sites that make binary stuff available,
> but allow it to be served out as the server's default of text/plain.
Hear, hear, mine too. I'm a bit ticked off that Debian's site does
exactly that, but I have to concede it's not a straightforward choice.
Using text/plain for .dsc files allows them to be viewed before
they're downloaded, on any browser and platform. That's a pretty
strong case for text/plain. OTOH NN (and no doubt other browsers too)
would allow itself or a helper app to be invoked for browsing files
with .dsc extensions anyway, even if application/octet-stream were
used (as is strongly hinted at by RFC2046 for non-"standard" text
types). Against that is that app/octet is such a ridiculous,
last-resort way to handle something as prevalent as Unix text format.
The relevant RFCs seem astonishingly silent in this respect - my
strong suspicion is it's there somewhere but I've overlooked it. It's
as if M$ drove the entire process and no-one had ever heard of Unix
text. I wonder if there's a text/x-lf-newlines or somesuch - that'd be
one approach. In the end I'm almost as bugged that the Debian files in
question are tagged text/plain (a clear violation of RFC2046) as I am
about NN's jackboot behaviour, in particular because those files are
principally made available for download not for viewing.
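On the server side the choice comes down to an extension-to-type mapping. Python's `mimetypes` module makes a convenient stand-in for a server's table. This sketch shows the application/octet-stream option for .dsc. It's illustrative only, not Debian's actual configuration, and the example.org URLs are placeholders:

```python
import mimetypes

# A MimeTypes instance acting as a stand-in for a web server's type table.
table = mimetypes.MimeTypes()

# Hypothetical policy: label .dsc as opaque data, so browsers save it
# byte-for-byte instead of treating it as text/plain.
table.add_type("application/octet-stream", ".dsc")

for url in ("http://example.org/pool/foo_1.0.dsc",
            "http://example.org/pool/README.txt"):
    print(url, table.guess_type(url)[0])
```

The cost is the one described above. A browser given application/octet-stream won't display the .dsc inline unless it's configured to.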
> > Not every file is offered both for browsing via
> > http and for download via ftp. Where http alone is offered, from
> > archive servers whose principal role is to supply downloads not
> > viewable content, many files are inappropriately marked "text/plain"
> > which will result in NN massaging them.
> Please don't think that I haven't understood the point that you're
No, not at all :-) I started this thread for the benefit of other
readers, hence my long-winded and detailed posts.
> > Finally, people just think
> > differently: a file is a file is a file
> Except that it isn't. Data formats always have been, and still are,
> platform-dependent. While admittedly there is no general solution of
> converting files to another platform's format without knowing rather a
> lot about what the data formats are on the respective platforms, there
> _is_ a degree of agreement about the types of file for which a MIME
> type of text/something is apt.
Absolutely. My comment, "a file is a file is a file" wasn't a
justification for departing from standards - not at all. It was an
appeal for designers to stick to standards but, on top, to layer
allowances for the way people think and use software. The two aren't
mutually exclusive in this case at all - NN could accommodate both, as
I outlined in my "Imagine I'm a browser" suggestions.
> (Don't get me started on the old IBM
> mainframe postscript format, which mixed ASCII and EBCDIC in the same
Eeek! I wrote S/370 assembler for a while - EBCDIC was nightmarish,
mixing it with ASCII would have been even worse.
> > I should massage (only) http downloads, according to their MIME type and
> > the platform I'm running on. But _by default_ I should only do that
> > on-the-fly, when I'm called upon to browse the file;
> You said that before, but it's still not right. A web browser can and
> should be designed to browse files which come with any of the popular
> newline formats.
Yes, you're right - NN does that, and can be configured to do it for
other file types and with other helper apps. But it doesn't have to
alter the actual downloaded file in order to make it viewable - that
can be done on-the-fly, either with a temporary file or in memory,
then, if the file isn't subsequently saved, the massaging can be
discarded without affecting what was downloaded.
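A browser-side sketch of "massage only for viewing" in Python: normalise a decoded copy for display and leave the downloaded bytes alone. `view_copy` is a made-up name. Decoding as Latin-1 is an assumption, chosen because it accepts any byte sequence:

```python
def view_copy(raw: bytes, encoding: str = "latin-1") -> list[str]:
    """Return display lines for text using LF, CRLF or lone-CR newlines.

    Works on a decoded copy; the downloaded bytes in `raw` are never modified.
    """
    text = raw.decode(encoding).replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final newline doesn't start another line
    return lines
```

Since Python `bytes` objects are immutable, whatever was downloaded is still intact for a later "Save as".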
> But other i.e. non-web applications on each platform
> may well expect the platform-native text format - thus it is
> appropriate for any web browser, when saving text data to file, to
> normalise it into the platform's native text format.
It may be appropriate, but it's not necessarily so. It's certainly
appropriate enough that it should be default behaviour. But it should
happen by default when the file is _saved_, not automatically when
it's merely _viewed_. And making this "massage before saving" a
default behaviour but not the only available behaviour - i.e. by
offering a "Save file as type" option - means any inconvenience can be
avoided at the user's option. Regardless, it's never appropriate for a
download client to decide what other applications might benefit from -
it might offer translations as an option, but making sledgehammer
translations is bad engineering IM(H ;-)O.
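The "convert on save by default, but let the user opt out" policy fits in a few lines. Here's a hedged Python sketch. `bytes_for_save` and `keep_original` are hypothetical stand-ins for a browser's save path and a "Save file as type: original" choice. Every newline form is matched as a unit, so a file that already has CRLF endings doesn't come out with CR CR LF:

```python
import os
import re

def bytes_for_save(raw: bytes, keep_original: bool = False,
                   native: bytes = os.linesep.encode("ascii")) -> bytes:
    """Bytes to write when the user saves a text download.

    Default: convert every newline (CRLF, lone CR or LF) to the platform's
    native form. With keep_original=True the download is written as received.
    """
    if keep_original:
        return raw
    # CRLF is listed first so it is replaced as one newline, not two.
    return re.sub(rb"\r\n|\r|\n", native, raw)
```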
> A requirement to save a text file in a platform-foreign format can be
> useful, indeed, but IMHO that _should_ be the exception rather than
> the rule.
I agree completely.
> > That way: by default any saved version of the file stays clean;
> A text file which uses platform-foreign newline conventions cannot be
> described as "clean" in my book, I must say.
Sorry, should have made my meaning clearer. By "clean" I didn't mean
"better" in any sense, I only meant "unaltered". There's no doubt
saving a text file in Windows format should be default behaviour on
Windows platforms, I just have a problem with it being the only
behaviour and being invisible.
> > Cheers, I'm off to drill for oil in Alaska ;->
> Good luck, beware of brass monkeys. :-}
Cold enough here tonight. Fire's lit, snow over much of the South
Island. Fine day tomorrow, might go tramping. Thanks again for your
input Alan - really helpful.