Netscape newlines confuse e.g. Debian dpkg-source

Post by Dubya the ecovandal » Wed, 12 Jun 2002 22:25:37



Hope this helps someone. For years, versions of Netscape Navigator
have "cleverly" converted the Unix newline to DOS format in text
files, by adding a carriage return. Even being generally aware of it
didn't stop me from getting caught out recently. I used "Save Link
As..." to download a Debian ".dsc", a description file used when
patching source code. Running

dpkg-source -x <thefile>.dsc

caused dpkg-source to puke, reporting that it had found a PGP body in
the .dsc file but no signature. After a lot of searching I found a
related bug report, mentioning that dpkg-source didn't like extra
newlines before PGP signatures in .dsc files. Sure enough, a quick
check with an editor showed that to be the problem, and fiddling with
various download methods proved that Netscape's "Save Link As..." had
caused it. The best solution is to avoid Windows, but that's not
always easy when, say, bootstrapping GNU/Linux onto a new PC via the
Net. Second-best is to use a standalone ftp client when downloading
Unix files in Windows. (Remember to use 'binary' transfer mode.)
Finally, a fast'n'* fix is to use Netscape anyway but then load
the resultant corrupted file into an editor that's capable of undoing
Netscape's mess. For example, PFE (the Programmer's File Editor) can
recognise whether a file is in DOS or Unix format and, when saving,
can be made to convert from one format to the other. It's also worth
mentioning another "Netscape-ism": the browser's habit of changing
full stops to underscores by default in filenames when saving files.
This is unrelated to the newline problem, but it can make a mess of
Debian filenames, for example.
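For anyone without PFE to hand, the same repair can be scripted. A minimal Python sketch, assuming the damage is exactly one CR added before each LF and that the file should end up in Unix (LF-only) format; the function names are invented for illustration:

```python
from pathlib import Path


def looks_like_dos_text(data: bytes) -> bool:
    """True if the data has line feeds and every one is preceded by a CR."""
    return b"\n" in data and data.count(b"\r\n") == data.count(b"\n")


def undo_crlf(data: bytes) -> bytes:
    """Turn CRLF line endings back into plain Unix LF endings."""
    return data.replace(b"\r\n", b"\n")


def fix_file(path: str) -> bool:
    """Rewrite the file in place with Unix newlines; return True if it changed."""
    p = Path(path)
    data = p.read_bytes()  # bytes, so no newline translation happens on read
    if not looks_like_dos_text(data):
        return False
    p.write_bytes(undo_crlf(data))  # bytes again, so no CRs are re-added
    return True
```

Something like `fix_file("bash_2.03-6.dsc")` would then rewrite the file in place. The sketch deliberately works on bytes: reading and writing in Python's text mode on Windows would apply the platform's newline translation all over again.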

Post by Alan J. Flavell » Thu, 13 Jun 2002 01:10:26


On Jun 11, Dubya the ecovandal inscribed on the eternal scroll:

> Hope this helps someone. For years, versions of Netscape Navigator
> have "cleverly" converted the Unix newline to DOS format in text
> files, by adding a carriage return.

That's what's _supposed_ to happen to text files on a Windoze
platform.  Nothing particularly clever involved.

> dpkg-source -x <thefile>.dsc

> caused dpkg-source to puke, reporting that it had found a PGP body in
> the .dsc file but no signature. After a lot of searching I found a
> related bug report, mentioning that dpkg-source didn't like extra
> newlines before PGP signatures in .dsc files.

"Extra" newlines?  You didn't say anything about _extra_ newlines.

> Sure enough, a quick
> check with an editor showed that to be the problem,

Alright, which is it?  The newlines being massaged into DOS/Windows
format (which is correct behaviour), or _extra_ newlines being added?

> and fiddling with
> various download methods proved that Netscape's "Save Link As..." had
> caused it. The best solution is to avoid Windows,

Sounds as if you're using software that doesn't really understand text
format, but expects some kind of binary/image of a unix file.

I suggest you try shift/click and download, instead of viewing and
then save-As.

> Finally, a fast'n'* fix is to use Netscape anyway but then load
> the resultant corrupted file into an editor that's capable of undoing
> Netscape's mess.

It's not a "mess", it's the correct way of handling text files.
Your problem seems to be that you don't want text files.

While it's true that in the sort of restricted situation you describe,
it's possible to convert-back to unix newline format after the event,
there are situations where pure binary files (for example, disk
images) get sent out from an HTTP server inappropriately described as
text/plain (which is the default in many HTTPDs for unknown content
types), and that's where you really _do_ need to ask for a download
rather than allowing the browser to do its save-As procedure.

good luck

Post by Sami Sihvonen » Fri, 14 Jun 2002 18:08:32


In article


>> Hope this helps someone. For years, versions of Netscape Navigator
>> have "cleverly" converted the Unix newline to DOS format in text
>> files, by adding a carriage return.
> That's what's _supposed_ to happen to text files on a Windoze
> platform.  Nothing particularly clever involved.

GNU Emacs is a good tool to solve this problem about different
newline types and it also solves problems with different charsets
at the same time.

--
Sami Sihvonen,
Chief Executive Officer,
Janiika Networks Corporation.

Post by Jukka K. Korpela » Fri, 14 Jun 2002 23:02:02



> GNU Emacs is a good tool to solve this problem about different
> newline types and it also solves problems with different charsets
> at the same time.

Just in case you're wondering what that message has got to do with the
topic of the discussion: "Sami Sihvonen", who has used different names
and addresses, is a longstanding trollish character in Finnish groups and
now seems to attack international groups as well. He used to write as a
devoted anti-academic anti-Linux Windows enthusiast, now changed clothes
(virtually), etc. He also copied an Emacs manual written by others onto
his own pages and labeled it with his own copyright mark (later claimed
that he had got permission and promised to send evidence, but never did,
of course).

Followups set to his "fan" group (in Finnish).

> --
> Sami Sihvonen,
> Chief Executive Officer,
> Janiika Networks Corporation.

Last time someone checked, "Sami Sihvonen" (which is probably the real
name) was advertizing the services of a *, asking people to send
him money in order to contact here.

--
Yucca, http://www.veryComputer.com/~jkorpela/

Post by Dubya the ecovandal » Sat, 15 Jun 2002 01:05:18


> I suggest you try shift/click and download, instead of viewing and
> then save-As.

Read my original post more carefully. I didn't view the file in
Navigator on Windows - Why would I? It's a control file intended for
use in a Unix environment - and I clearly stated that I used "Save
Link As...", which has exactly the same effect as "shift/click and
download".

> While it's true that in the sort of restricted situation you describe,
> it's possible to convert-back to unix newline format after the event,
> there are situations where pure binary files (for example, disk
> images) get sent out from an HTTP server inappropriately described as
> text/plain (which is the default in many HTTPDs for unknown content
> types), and that's where you really _do_ need to ask for a download
> rather than allowing the browser to do its save-As procedure.

This is close to the nub of the problem, albeit somewhat short (IMHO)
of justifying Navigator's behaviour as correct. What happens is that
http downloads result in text/plain being massaged to suit the
browser's platform, whereas an ftp download of the same file will
leave it in its original state. Technical pedantry says the former is
appropriate. Common sense says it should

1) be documented!

2) be performed only on-the-fly, when the file is being rendered.

3) be offered as an option when the file is being saved, rather than
being assumed.

As things stand it's a gotcha, and that's what I was warning about in
my original post.

Post by Alan J. Flavell » Sat, 15 Jun 2002 02:57:33


On Jun 13, Dubya the ecovandal inscribed on the eternal scroll:

> Read my original post more carefully. I didn't view the file in
> Navigator on Windows - Why would I? It's a control file intended for
> use in a Unix environment - and I clearly stated that I used "Save
> Link As...", which has exactly the same effect as "shift/click and
> download".

You got me worried there. OK, I've just done some tests with NN4.79 on
Win/NT4.  I did a shift/click on a link which pointed to a unix-format
text file, and filed the result in a suitable place.  I then
investigated the saved file and I can assure you that no
carriage-returns had been added to the linefeeds in the saved file,
which was what I understood to be what you were wanting (but weren't
getting).

> This is close to the nub of the problem, albeit somewhat short (IMHO)
> of justifying Navigator's behaviour as correct. What happens is that
> http downloads result in text/plain being massaged to suit the
> browser's platform,

I'm willing to admit that I may have mis-described NN4's behaviour in
some detail, but really I have to say that my observations just
carried out now did seem to confirm my previous understanding of the
situation, whereas my observations don't seem to be compatible with
the problems that you are describing.

> whereas an ftp download of the same file will
> leave it in its original state.

That depends on whether the transfer is done in image (binary) or
so-called "ASCII" (text) mode (as I'm sure you'd agree).
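The difference between the two FTP modes is visible in Python's standard `ftplib`, which offers both: `retrbinary()` passes the bytes through untouched (image mode), while `retrlines()` switches to ASCII mode and hands back each line with its CRLF terminator stripped, leaving the client to decide which line ending to write locally. A sketch, assuming an already-connected `ftplib.FTP` object; the helper names and the `eol` parameter are hypothetical:

```python
import ftplib  # standard-library FTP client


def fetch_image(ftp: ftplib.FTP, filename: str) -> bytes:
    """Image (binary) mode: the bytes arrive exactly as stored on the server."""
    chunks = []
    ftp.retrbinary(f"RETR {filename}", chunks.append)  # sends TYPE I first
    return b"".join(chunks)


def fetch_ascii(ftp: ftplib.FTP, filename: str, eol: str = "\n",
                encoding: str = "utf-8") -> bytes:
    """ASCII mode: retrlines() strips each line's terminator, so the
    caller chooses the local line ending via `eol`."""
    lines = []
    ftp.retrlines(f"RETR {filename}", lines.append)  # sends TYPE A first
    return "".join(line + eol for line in lines).encode(encoding)
```

Connecting first would look like `ftp = ftplib.FTP(host)` followed by `ftp.login()` for an anonymous server. Only the image-mode result is guaranteed to match the server's copy byte for byte.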

> Technical pedantry says the former is
> appropriate. Common sense says it should

> 1) be documented!

fair comment.

> 2) be performed only on-the-fly, when the file is being rendered.

I don't agree, sorry.  The HTML spec recommends that browsers accept the
various cross-platform newline specifications (LF alone, CR+LF or CR
alone), and so browsers shouldn't need to do any conversion for the
purposes of rendering.  But other text utilities on the platform may
very well expect only the platform's native text format (and I'm not
talking only about Windows Notepad...).

When saving _text_ locally, the prima facie assumption IMHO should be
that the data is wanted in the local text format.
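A renderer that accepts all three conventions can split lines at display time without rewriting the stored data. A minimal Python sketch of that idea (the function name is made up); Python's own `str.splitlines()` does much the same, though it also breaks on a few other characters such as form feed:

```python
import re
from typing import List

# CRLF is listed first so that a CR+LF pair counts as one line break, not two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def display_lines(text: str) -> List[str]:
    """Split text into lines whichever newline convention it uses."""
    lines = _NEWLINE.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start an extra, empty line
    return lines
```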

> 3) be offered as an option when the file is being saved, rather than
> being assumed.

Fair comment.

> As things stand it's a gotcha, and that's what I was warning about in
> my original post.

It's curious that Win NN4 doesn't seem to be doing that for me.  (Nor
in Mozilla 1.0, by the way).  I'm puzzled to know where the difference
lies.

all the best

Post by Dubya the ecovandal » Sat, 15 Jun 2002 13:51:46



> [] OK, I've just done some tests with NN4.79 on
> Win/NT4.  I did a shift/click on a link which pointed to a unix-format
> text file, and filed the result in a suitable place.  I then
> investigated the saved file and I can assure you that no
> carriage-returns had been added to the linefeeds in the saved file,
> which was what I understood to be what you were wanting (but weren't
> getting).

-ish ;-)   What I wanted, but wasn't getting, is the other way around:
I wanted Unix textfiles to be left alone, so I could download them
from Windows, but use them in Debian without first having to undo NN's
behind-the-scenes massaging. IOW I wanted NN's download client merely
to perform a download, not to insert CRs before LFs in some cases.

FWIW I use NN4.74 (IIRC) but the behaviour goes back a long way before
that and hasn't altered subsequently AFAIK. My guess is your test
download didn't insert extra CRs because either 1) the URL you
shift/clicked on began with ftp:// not http:// or 2) it was an http://
download but the file's MIME type wasn't marked as "text/plain". In
either of those cases NN will do as you describe: it will make an
exact copy of the file, without modifying it on the assumption that
the download should be Microsoft-ised. (Whether the server's address
is ftp.xxxx.yyy isn't at issue here, of course. Servers can be called
anything within reason; NN's behaviour depends on whether the transfer
protocol that's used for the download is ftp or http.)
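The guess above amounts to a simple rule. Here it is as a hypothetical Python model, purely to make the conditions explicit; it is an inference from this thread, not Netscape's actual logic, and Alan's later test.txt/test.dsc experiment suggests the filename extension matters too:

```python
from urllib.parse import urlsplit


def nn_would_convert(url: str, content_type: str, on_windows: bool = True) -> bool:
    """Hypothetical model of the guessed behaviour: Navigator on Windows
    inserts CRs only for http:// downloads served as text/plain."""
    scheme = urlsplit(url).scheme.lower()
    # Ignore parameters such as "; charset=iso-8859-1" when comparing types.
    mime = content_type.split(";")[0].strip().lower()
    return on_windows and scheme == "http" and mime == "text/plain"
```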

Try these test downloads, they give me two different versions of the
"same" file. Visit http://packages.debian.org/stable/base/bash.html
and download from the '[dsc]' link near the bottom of the page. This
performs an http download from a Unix server, of a Unix textfile with
MIME type "text/plain". On Windows, NN will Microsoft-ise the file; it
won't be usable under Debian without undoing NN's shenanigans. Next,
save the link location (instead of downloading from it), paste it into
NN's Location bar, replace the 'http://' with 'ftp://' and knock the
filename (bash_2.03-6.dsc) off the end. Visit that URL -
ftp.debian.org/debian/dists/potato/main/source/base - which is the
same one that targeted the http download but this time will invoke
NN's ftp client thanks to the different prefix. You'll see a jazzified
ftp directory-listing, and the same file as before (bash_2.03-6.dsc)
will appear in it. Download that file and you'll find that NN hasn't
altered it - it'll be smaller than the massaged one NN gave you before
because it won't have CRs inserted throughout, and it'll be usable
directly in Debian just as intended.
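To confirm that the two copies differ only by inserted CRs, and not by anything else, a comparison like this Python sketch would do. It assumes the ftp copy is clean Unix text containing no CRs of its own; the function name is hypothetical:

```python
def classify(clean: bytes, munged: bytes) -> str:
    """Describe how the http copy (`munged`) relates to the ftp copy (`clean`)."""
    if munged == clean:
        return "identical"
    if munged == clean.replace(b"\n", b"\r\n"):
        # Each LF gained exactly one CR, so the size grows by one byte per line.
        return f"CRs inserted: {len(munged) - len(clean)} extra bytes"
    return "differ in some other way"
```

With the two downloads saved into separate (hypothetical) directories, it could be called as `classify(Path("ftp/bash_2.03-6.dsc").read_bytes(), Path("http/bash_2.03-6.dsc").read_bytes())` after `from pathlib import Path`.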

The gotcha is that - rightly or wrongly - when most people download a
file _purely to save it, not intending to display or otherwise use it
on their browser's platform_ - the file alone is what they focus on.
The transfer protocol that's used for fetching it is incidental: they
don't expect to have to take it and their download platform into
account, in order to decide whether they're about to be given the file
they want or some invisibly-massaged version of it. That seems a lot
more sensible than NN's behaviour (however strictly correct NN may
be): when you order a pizza you don't expect to have to ask, "Will it
be delivered by moped or by car?" in order to know whether you'll get
a pizza or a Chinese takeaway ;-)

It can be argued that NN is doing something sensible - but IMHO it's
only doing it half-way. Because platform-neutral (Y, r ;-) markup is
what's typically expected via http, and ftp is available for binary
transfers, it's kind of reasonable to dictate that an http download
will be massaged to suit the browser's platform. (After all, we want
Unix-format textfiles to be viewable directly on Windows browsers and
vice versa; we don't want to have to alter all those files on their
servers, nor do we want them to be rendered as-is as long lines filled
with little black rectangles ;-)  So, the reasoning goes: if it's an
http download, massage the result according to its MIME type so it
suits the browser's platform. If you want Unix-clean downloads either
use an ftp server (for which NN will correctly invoke its ftp client
provided the URL is ftp://) or use http but set the file's MIME type
to something other than "text/plain".

Well that's nice reasoning as far as it goes but, as I said, it
doesn't always work. Not every file is offered both for browsing via
http and for download via ftp. Where http alone is offered, from
archive servers whose principal role is to supply downloads not
viewable content, many files are inappropriately marked "text/plain"
which will result in NN massaging them. Finally, people just think
differently: a file is a file is a file - you shouldn't get a Chinese
when you ordered a pizza.

The solution seems obvious to me. Imagine I'm a browser. First, I
should massage (only) http downloads, according to their MIME type and
the platform I'm running on. But _by default_ I should only do that
on-the-fly, when I'm called upon to browse the file; then _by default_
I chuck the changes away without applying them to any saved version of
the file. That way: by default any saved version of the file stays
clean; as expected, a download simply provides an exact copy of the
original regardless of the transfer protocol or the browser's platform
- there are no invisible surprises and I haven't made any brazen
assumptions about the future; yet my on-the-fly changes still allow,
say, Unix textfiles to be browsed correctly on Windows. Similarly,
when downloading but not immediately displaying a file, by default I
keep my mitts off it; I've no idea how or on what platform it will be
used in future, so I shouldn't be altering it according to baseless
assumptions. Finally, I allow the user to override the defaults in a
controlled manner: any time they choose to save a file I offer the
standard "Save as type..." option; that way, if they want me to make
my massaging permanent locally - or even to undo massaging I performed
in the past - I can do so but only with their knowledge and within
their control.
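The proposal above separates two things Navigator lumps together: converting for display and converting for storage. A hypothetical Python sketch of that separation (the class and method names are invented, and Latin-1 is used only because it round-trips arbitrary bytes losslessly):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Download:
    """Keeps the downloaded bytes untouched; converts newlines only on request."""
    raw: bytes  # exactly what the server sent

    def for_display(self) -> str:
        # On-the-fly massaging: normalise CRLF and lone CR to LF in a
        # throwaway copy. self.raw itself is never modified.
        text = self.raw.decode("latin-1")
        return text.replace("\r\n", "\n").replace("\r", "\n")

    def save(self, path: str, eol: Optional[str] = None) -> None:
        # Default: an exact copy. A non-None eol corresponds to the user
        # explicitly choosing a conversion in a "Save as type..." dialog.
        data = self.raw
        if eol is not None:
            data = self.for_display().replace("\n", eol).encode("latin-1")
        with open(path, "wb") as f:
            f.write(data)
```

Here `save(path)` writes the server's bytes unchanged, while `save(path, eol="\r\n")` makes the DOS-format conversion permanent only when asked.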

> > whereas an ftp download of the same file will
> > leave it in its original state.
> That depends on whether the transfer is done in image (binary) or
> so-called "ASCII" (text) mode (as I'm sure you'd agree).

Quite :-)  Perhaps fortunately, this isn't something we have control
over with NN's built-in ftp client. And, of course, it doesn't affect
the problem outlined above.

> When saving _text_ locally, the prima facie assumption IMHO should be
> that the data is wanted in the local text format.

I agree, it's convenient default behaviour. But I don't consider that
it should always happen, least of all that it should happen invisibly:
the "Save as type..." option should be offered so users 1) can see
what's about to be done, and 2) can opt out and choose a clean image
instead, so they don't have to un-munge all their time-consuming
downloads (or, worse, get caught as I did with software that choked
because NN had altered a text-format control file).

Thanks for explaining that stuff about http servers offering
"text/plain" as the default MIME type and it being (incorrectly) left
in place in many cases. I've never set up a Web server but may need to
soon - definitely something I'll have to watch out for.

Cheers, I'm off to drill for oil in Alaska ;->

Dubya the Ecovandal

Post by Alan J. Flavell » Sun, 16 Jun 2002 03:04:33


On Jun 13, Dubya the ecovandal inscribed on the eternal scroll:

> FWIW I use NN4.74 (IIRC) but the behaviour goes back a long way before
> that and hasn't altered subsequently AFAIK. My guess is your test
> download didn't insert extra CRs because either 1) the URL you
> shift/clicked on began with ftp:// not http:// or 2) it was an http://
> download but the file's MIME type wasn't marked as "text/plain".

Curiouser and curiouser.  As it happens, my test URL was the one shown
here:

lynx -dump -head http://ppewww.ph.gla.ac.uk/~flavell/tests/test.txt
HTTP/1.1 200 OK
Date: Fri, 14 Jun 2002 16:31:05 GMT
Server: Apache/1.3.20 (Unix) PHP/4.1.2
Last-Modified: Thu, 13 Jun 2002 17:27:10 GMT
ETag: "55846b-f6-3d08d5ee"
Accept-Ranges: bytes
Content-Length: 246
[...]

and it's definitely in Unix-newlines format.
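The header most relevant to this thread is Content-Type, which is among the lines trimmed as [...] in the lynx output above. A Python sketch for checking it with a HEAD request, using only the standard `urllib.request`; the helper names are hypothetical, and some servers may not answer HEAD requests properly:

```python
import urllib.request


def head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request: ask for the headers only, not the body."""
    return urllib.request.Request(url, method="HEAD")


def content_type_of(url: str, timeout: float = 10.0) -> str:
    """Return the Content-Type header the server sends for `url` ("" if none)."""
    with urllib.request.urlopen(head_request(url), timeout=timeout) as resp:
        return resp.headers.get("Content-Type", "")
```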

> Try these test downloads, they give me two different versions of the
> "same" file. Visit http://packages.debian.org/stable/base/bash.html
> and download from the '[dsc]' link near the bottom of the page. This
> performs an http download from a Unix server, of a Unix textfile with
> MIME type "text/plain". On Windows, NN will Microsoft-ise the file;

So it does.  How strange.

If I copy my test file from test.txt to test.dsc and send _that_ out
with text/plain, then I _do_ get the effect which you're complaining
about.  It grows from 246 to 260 bytes and - as you won't be surprised
to hear - the file contains 14 lines.  A picture seems to be emerging.
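The arithmetic checks out: inserting one CR before each of the 14 line feeds adds 14 bytes, and 246 + 14 = 260. A small Python demonstration on a made-up 246-byte, 14-line sample (not Alan's actual test file, whose contents aren't shown):

```python
def crlf_size(data: bytes) -> int:
    """Size of `data` after a CR is inserted before every LF.

    Assumes Unix text with no CRs already present.
    """
    return len(data) + data.count(b"\n")


# Made-up sample: 8 lines of 17 characters plus 6 lines of 16 characters,
# each ending in LF: 8*18 + 6*17 = 144 + 102 = 246 bytes in 14 lines.
sample = b"".join([b"x" * 17 + b"\n"] * 8 + [b"y" * 16 + b"\n"] * 6)
converted = sample.replace(b"\n", b"\r\n")

print(len(sample), sample.count(b"\n"), crlf_size(sample), len(converted))
# prints: 246 14 260 260
```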

I apologise for being unaware of this curiosity before.

[detailed argument snipped]

> Well that's nice reasoning as far as it goes but, as I said, it
> doesn't always work.

http was designed on the assumption that the person making the file
available for use would know what kind of file it was, and advertise
it accordingly.  We've already discussed the issue of cross-platform
adjustment of text files, and I don't think we're going to get much
further on that topic.

FTP on the other hand seems to have been designed on the assumption
that the server side could not have any idea what kind of stuff was
being made available from it, and thus the client (or their software)
must guess.

Neither assumption is ideal in practice.

Your complaint was, fundamentally, about an application that thinks it
can handle text - but really expects what in some respects needs to be
a binary image, so IMHO the application is at fault (lack of
application portability).

OTOH _my_ beef would be with sites that make binary stuff available,
but allow it to be served out as the server's default of text/plain.

If I was *yminded, I would set the servers that I manage to
default to application/octet-stream (or, with IE in mind, to something
like application/x-forbid-even-IE-to-guess).  But then users would
complain that they had tried to read some file, let's say
README.rightnow, and got told they had to download it because it
wasn't a type of file that could be viewed.  Defaulting to text/plain
at least avoids that kind of irritation.
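The trade-off comes down to what the server does with an extension it has no entry for. A hypothetical Python sketch of that lookup (the table is invented; in Apache 1.3, the server shown in the headers above, the fallback is set with the `DefaultType` directive):

```python
import posixpath

# Hypothetical extension table of the kind a web server is configured with.
KNOWN_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
}


def served_type(filename: str, default: str = "text/plain") -> str:
    """Content-Type a server would send for `filename`: the table entry if
    the extension is known, otherwise the configured default."""
    ext = posixpath.splitext(filename)[1].lower()
    return KNOWN_TYPES.get(ext, default)
```

With the `text/plain` default, both `README.rightnow` and `bash_2.03-6.dsc` go out as viewable text (and so are liable to newline massaging on download); with `application/octet-stream` both arrive untouched but can no longer be viewed inline.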

> Not every file is offered both for browsing via
> http and for download via ftp. Where http alone is offered, from
> archive servers whose principal role is to supply downloads not
> viewable content, many files are inappropriately marked "text/plain"
> which will result in NN massaging them.

Please don't think that I haven't understood the point that you're
making.  I just draw somewhat different conclusions from it, though.

> Finally, people just think
> differently: a file is a file is a file

Except that it isn't.  Data formats always have been, and still are,
platform-dependent.  While admittedly there is no general solution of
converting files to another platform's format without knowing rather a
lot about what the data formats are on the respective platforms, there
_is_ a degree of agreement about the types of file for which a MIME
type of text/something is apt. (Don't get me started on the old IBM
mainframe postscript format, which mixed ASCII and EBCDIC in the same
file).

> The solution seems obvious to me. Imagine I'm a browser. First, I
> should massage (only) http downloads, according to their MIME type and
> the platform I'm running on. But _by default_ I should only do that
> on-the-fly, when I'm called upon to browse the file;

You said that before, but it's still not right.  A web browser can and
should be designed to browse files which come with any of the popular
newline formats.  But other (i.e. non-web) applications on each platform
may well expect the platform-native text format - thus it is
appropriate for any web browser, when saving text data to file, to
normalise it into the platform's native text format.

A requirement to save a text file in a platform-foreign format can be
useful, indeed, but IMHO that _should_ be the exception rather than
the rule.

> That way: by default any saved version of the file stays clean;

A text file which uses platform-foreign newline conventions cannot be
described as "clean" in my book, I must say.

> Cheers, I'm off to drill for oil in Alaska ;->

Good luck, beware of brass monkeys.  :-}

bye

Post by Dubya the ecovandal » Sun, 16 Jun 2002 17:36:17


"Alan J. Flavell" <flav...@mail.cern.ch> wrote in message <news:Pine.LNX.4.40.0206141831290.1865-100000@lxplus033.cern.ch>...

> On Jun 13, Dubya the ecovandal inscribed on the eternal scroll:

> > FWIW I use NN4.74 (IIRC) but the behaviour goes back a long way before
> > that and hasn't altered subsequently AFAIK. My guess is your test
> > download didn't insert extra CRs because either 1) the URL you
> > shift/clicked on began with ftp:// not http:// or 2) it was an http://
> > download but the file's MIME type wasn't marked as "text/plain".

> Curiouser and curiouser.  As it happens, my test URL was the one shown
> here:

> lynx -dump -head http://ppewww.ph.gla.ac.uk/~flavell/tests/test.txt
> HTTP/1.1 200 OK
> Date: Fri, 14 Jun 2002 16:31:05 GMT
> Server: Apache/1.3.20 (Unix) PHP/4.1.2
> Last-Modified: Thu, 13 Jun 2002 17:27:10 GMT
> ETag: "55846b-f6-3d08d5ee"
> Accept-Ranges: bytes
> Content-Length: 246
> [...]

> and it's definitely in Unix-newlines format.

You're right, of course - that's a doozy! Shows there's no difference
in this behaviour between our versions of NN or Windows, at least.

> If I copy my test file from test.txt to test.dsc and send _that_ out
> with text/plain, then I _do_ get the effect which you're complaining
> about.  It grows from 246 to 260 bytes and - as you won't be surprised
> to hear - the file contains 14 lines.  A picture seems to be emerging.

Yes, it looks as though NN on Windows is also taking the
file-extension into account. (Inserts plug for Debian, and for
GNU/Linux in general: if we had the source it'd be easy to tell for
sure!) What's the bet it does the same as with .dsc if you use, say,
.jnk or .ajf, or anything else that isn't so likely to be recognised
as identifying the internal format?

> I apologise for being unaware of this curiosity before.

What's to apologise for? Pat yourself on the back. This thread's
taught us both something, and wildly exceeded my original hopes:
anyone puzzled by NN's behaviour in future need only do a Usenet
search and they'll find here a detailed discussion of what to beware
of and how to avoid it.

> FTP on the other hand seems to have been designed on the assumption
> that the server side could not have any idea what kind of stuff was
> being made available from it, and thus the client (or their software)
> must guess.

Pretty much. It was designed so much earlier that MIME types couldn't be
incorporated, so all it offered was ASCII or BINARY for download
modes. Fair enough for its time, and still useful: you have to control
what you get, but the advantage is you're _always_ in control of what
you get.

> Your complaint was, fundamentally, about an application that thinks it
> can handle text - but really expects what in some respects needs to be
> a binary image, so IMHO the application is at fault (lack of
> application portability).

Well ... not really. The application (dpkg-source) isn't designed to
handle "text" generically, only Unix-format text. On one hand it
wouldn't kill the designers to make dpkg-source less fussy about
stripping white-space and, while they were at it, to treat extra CRs
as white space even though they basically never appear in Unix
textfiles. OTOH the programmer in me would side with their inevitable
protest: the app undertakes and fulfils a clear contract - to accept
files in a well-defined format - and it has every right to puke if
it's force-fed a different one, however closely related that format
might be. It certainly isn't a portable app, I agree, but then it's
for low-level handling of _D_ebian _P_ac_k_a_g_es so it makes sense
for it to work only with Debian-format files. One thing's certain: no
appeal that dpkg-source should be fixed to accommodate a quirk of
Windows-based NN would get anywhere ;-)
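Why a strict parser chokes is easy to see with exact line matching. This is not dpkg-source's actual code, and the real failure may have arisen differently; it is a hypothetical Python illustration of how a CR left at the end of an OpenPGP armour line defeats an exact comparison, and how stripping trailing whitespace would tolerate it:

```python
SIG_START = "-----BEGIN PGP SIGNATURE-----"  # OpenPGP armour header line


def has_signature_strict(text: str) -> bool:
    # Exact match on LF-split lines, as for a Unix-format file.
    # A line ending in CR ("...-----\r") does not compare equal.
    return SIG_START in text.split("\n")


def has_signature_tolerant(text: str) -> bool:
    # Treat trailing whitespace, including a stray CR, as insignificant.
    return any(line.rstrip() == SIG_START for line in text.split("\n"))
```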

> OTOH _my_ beef would be with sites that make binary stuff available,
> but allow it to be served out as the server's default of text/plain.

Hear, hear, mine too. I'm a bit ticked off that Debian's site does
exactly that, but I have to concede it's not a straightforward choice.
Using text/plain for .dsc files allows them to be viewed before
they're downloaded, on any browser and platform. That's a pretty
strong case for text/plain. OTOH NN (and no doubt other browsers too)
would allow itself or a helper app to be invoked for browsing files
with .dsc extensions anyway, even if application/octet-stream were
used (as is strongly hinted at by RFC2046 for non-"standard" text
types). Against that is that app/octet is such a ridiculous,
last-resort way to handle something as prevalent as Unix text format.
The relevant RFCs seem astonishingly silent in this respect - my
strong suspicion is it's there somewhere but I've overlooked it. It's
as if M$ drove the entire process and no-one had ever heard of Unix
text. I wonder if there's a text/x-lf-newlines or somesuch - that'd be
one approach. In the end I'm almost as bugged that the Debian files in
question are tagged text/plain (a clear violation of RFC2046) as I am
about NN's jackboot behaviour, in particular because those files are
principally made available for download not for viewing.

> > Not every file is offered both for browsing via
> > http and for download via ftp. Where http alone is offered, from
> > archive servers whose principal role is to supply downloads not
> > viewable content, many files are inappropriately marked "text/plain"
> > which will result in NN massaging them.
> Please don't think that I haven't understood the point that you're
> making.

No, not at all :-)  I started this thread for the benefit of other
readers, hence my long-winded and detailed posts.

> > Finally, people just think
> > differently: a file is a file is a file
> Except that it isn't.  Data formats always have been, and still are,
> platform-dependent.  While admittedly there is no general solution of
> converting files to another platform's format without knowing rather a
> lot about what the data formats are on the respective platforms, there
> _is_ a degree of agreement about the types of file for which a MIME
> type of text/something is apt.

Absolutely. My comment, "a file is a file is a file" wasn't a
justification for departing from standards - not at all. It was an
appeal for designers to stick to standards but, on top, to layer
allowances for the way people think and use software. The two aren't
mutually exclusive in this case at all - NN could accommodate both, as
I outlined in my "Imagine I'm a browser" suggestions.

> (Don't get me started on the old IBM
> mainframe postscript format, which mixed ASCII and EBCDIC in the same
> file).

Eeek! I wrote S/370 assembler for a while - EBCDIC was nightmarish,
mixing it with ASCII would have been even worse.

> > [] I
> > should massage (only) http downloads, according to their MIME type and
> > the platform I'm running on. But _by default_ I should only do that
> > on-the-fly, when I'm called upon to browse the file;
> You said that before, but it's still not right.  A web browser can and
> should be designed to browse files which come with any of the popular
> newline formats.

Yes, you're right - NN does that, and can be configured to do it for
other filetypes and with other helper apps. But it doesn't have to
alter the actual downloaded file in order to make it viewable - that
can be done on-the-fly, either with a temporary file or in memory,
then, if the file isn't subsequently saved, the massaging can be
discarded without affecting what was downloaded.

> But other (i.e. non-web) applications on each platform
> may well expect the platform-native text format - thus it is
> appropriate for any web browser, when saving text data to file, to
> normalise it into the platform's native text format.

It may be appropriate, but it's not necessarily so. It's certainly
appropriate enough that it should be default behaviour. But it should
happen by default when the file is _saved_, not automatically when
it's merely _viewed_. And making this "massage before saving" a
default behaviour but not the only available behaviour - i.e. by
offering a "Save file as type" option - means any inconvenience can be
avoided at the user's option. Regardless, it's never appropriate for a
download client to decide what other applications might benefit from -
it might offer translations as an option, but making sledgehammer
translations is bad engineering IM(H ;-)O.

> A requirement to save a text file in a platform-foreign format can be
> useful, indeed, but IMHO that _should_ be the exception rather than
> the rule.

I agree completely.

> > That way: by default any saved version of the file stays clean;
> A text file which uses platform-foreign newline conventions cannot be
> described as "clean" in my book, I must say.

Sorry, should have made my meaning clearer. By "clean" I didn't mean
"better" in any sense, I only meant "unaltered". There's no doubt
saving a text file in Windows format should be default behaviour on
Windows platforms, I just have a problem with it being the only
behaviour and being invisible.

> > Cheers, I'm off to drill for oil in Alaska ;->
> Good luck, beware of brass monkeys.  :-}

Cold enough here tonight. Fire's lit, snow over much of the South
Island. Fine day tomorrow, might go tramping. Thanks again for your
input Alan - really helpful.

Dubya

 
 
 

Netscape newlines confuse e.g. Debian dpkg-source

Post by Alan J. Flavel » Mon, 17 Jun 2002 00:53:51


On Jun 15, Dubya the ecovandal inscribed on the eternal scroll:

> Against that is that app/octet is such a ridiculous,
> last-resort way to handle something as prevalent as Unix text format.
> The relevant RFCs seem astonishingly silent in this respect - my
> strong suspicion is it's there somewhere but I've overlooked it. It's
> as if M$ drove the entire process and no-one had ever heard of Unix
> text. I wonder if there's a text/x-lf-newlines or somesuch

Even if there's nothing suitable registered at IANA, there's nothing to
stop one from defining private MIME types.  There has been some
discussion as to whether private/experimental types should or should
not be prefixed with x- when used for HTTP: my own inclination is to
say that they should, at any rate unless an IANA registration is
planned.

So you could have application/x-debian-source or even
x-debian/x-package-source (ok, that latter one is probably not a
good use of the MIME major content type field! - how about
something like application/vnd.debian-source ?)

When the URL is opened, the server then sends this characteristic MIME
type; on first encounter, the browser prompts the user with the usual
download-to-file or open with application? dialog; and the user can
then even tell the browser what they'd like to happen for the next
time - you know the drill: [x] open this type with that application,
and [x] always prompt me (whatever the exact wording is).

This is how the WWW was meant to happen IMHO: you aren't limited to
just text/plain and application/octet-stream ;-)

On Windows I would then define PFE32 to be my "viewer" for this
content-type, and I reckon we'd both be happy.  I just ran that
scenario with http://ppewww.ph.gla.ac.uk/~flavell/tests/test.dsc after
changing the .htaccess-defined mime type, and it works just fine.

Sure, you don't get to view the file in the browser any more, but you
can use any appropriate (i.e. not Notepad!) application as your viewer.
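I did this with an Apache .htaccess file; for anyone without Apache to hand, here's a rough equivalent using Python's standard `http.server`. The type `application/x-debian-source` is one of the names floated above and is used purely as an illustrative private type, not a registered one:

```python
from http.server import SimpleHTTPRequestHandler


class DebianSourceHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .dsc files with a private MIME type.

    Everything else keeps the stock extension-to-type mapping.
    """
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".dsc": "application/x-debian-source",  # illustrative private type
    }
```

To try it, something like `HTTPServer(("", 8000), DebianSourceHandler).serve_forever()` (port chosen arbitrarily) serves the current directory, and a browser fetching a .dsc should then get the download-or-open-with dialog instead of rendering it as text/plain.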

(And I'm tired of servers telling me that RedHat RPMs are some kind of
multimedia format, grumble).

best regards

 
 
 

Netscape newlines confuse e.g. Debian dpkg-source

Post by Dubya the ecovand » Mon, 17 Jun 2002 09:45:51



> On Jun 15, Dubya the ecovandal inscribed on the eternal scroll:

> > Against that is that app/octet is such a ridiculous,
> > last-resort way to handle something as prevalent as Unix text format.
> > The relevant RFCs seem astonishingly silent in this respect - my
> > strong suspicion is it's there somewhere but I've overlooked it. It's
> > as if M$ drove the entire process and no-one had ever heard of Unix
> > text. I wonder if there's a text/x-lf-newlines or somesuch

> Even if there's nothing suitable registered at IANA, there's nothing to
> stop one from defining private MIME types.  There has been some
> discussion as to whether private/experimental types should or should
> not be prefixed with x- when used for HTTP: my own inclination is to
> say that they should, at any rate unless an IANA registration is
> planned.

I agree. There's quite a pile of text/x-**** 'defined' already for
Unix, and no doubt if I searched among them I'd find one that would
work for dpkg-source input files. But I don't know that I'd solve much
that way. It'd require all the server operators to change their MIME
types, for a start, and whilst I can imagine something like
text/x-unix (or, to make it less platform-specific since Unix doesn't
have a monopoly on LF-only newlines, text/x-lf-newlines say) becoming
widely accepted, I can't imagine people bothering to apply something as
specific as text/x-dpkg-source in droves. Also, if I were to define
x-dpkg-source, by rights I should define it _properly_ - not merely "a
file in Unix text format" but "a file adhering to the following
Backus-Naur grammar". Get real! ;-)   Besides, that's the kind of
thing that should be addressed by XML, not with endless variants on
the MIME text/ type. One for the wish-list I guess.

> When the URL is opened, the server then sends this characteristic MIME
> type; on first encounter, the browser prompts the user with the usual
> download-to-file or open with application? dialog; and the user can
> then even tell the browser what they'd like to happen for the next
> time - you know the drill: [x] open this type with that application,
> and [x] always prompt me (whatever the exact wording is).

Yeah, that's not far from what happens now, eh? It works OK.

> On Windows I would then define PFE32 to be my "viewer" for this
> content-type, and I reckon we'd both be happy.  I just ran that
> scenario with http://ppewww.ph.gla.ac.uk/~flavell/tests/test.dsc after
> changing the .htaccess-defined mime type, and it works just fine.
> Sure, you don't get to view the file in the browser any more, but you
> can use any appropriate (i.e. not Notepad!) application as your viewer.

That's the first thing I did with NN for Windows as soon as I
understood how it would otherwise stomp on text/plain Unix textfiles.
(The second thing was to start this thread to warn other people.)
Aside from the minor inconvenience of having to install PFE32 or
another Unix-file viewer on Windows, do the config in NN, and wait a few
seconds for PFE32 to start each time, it works a treat. (No offence
but I can't quite discern from your post whether you know the
following already, so I'll offer it as a tip. You can do the above -
assign e.g. PFE32 as a helper app - even if the MIME type is
text/plain. The only thing you have to do is also specify the file
extension ".dsc", so NN gives special "Use PFE32" treatment to files
of that particular type. The limitation, of course, is that there are
lots of Unix textfiles out there that either don't have filename
extensions or whose extensions are other than .dsc, so it'd be much
more convenient merely to assign them a MIME type of their own. Oh
well ... :-)
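The lookup I'm describing (MIME type first, then fall back on the filename extension) can be sketched like this. The tables and the `pfe32.exe` name are hypothetical stand-ins for the browser's helper-app preferences, and I'm not claiming NN checks them in exactly this order:

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical helper-app tables, standing in for a browser's preferences.
HELPERS_BY_TYPE = {
    "application/x-debian-source": "pfe32.exe",
}
HELPERS_BY_EXTENSION = {
    ".dsc": "pfe32.exe",  # catches .dsc files even when served as text/plain
}


def pick_helper(url: str, content_type: str) -> Optional[str]:
    """Return the helper app for a download, or None to handle it in-browser."""
    # Drop parameters such as "; charset=us-ascii" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in HELPERS_BY_TYPE:
        return HELPERS_BY_TYPE[media_type]
    # Fall back on the extension of the URL's path component.
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return HELPERS_BY_EXTENSION.get(ext)
```

Which shows the limitation nicely: a Unix textfile called `README` served as text/plain falls straight through both tables.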

> (And I'm tired of servers telling me that RedHat RPMs are some kind of
> multimedia format, grumble).

;-)   I had the same problem when I ran Red Hat for a while. The
solution would be XML in an ideal world, I guess. Let's see if it ever
gets that far before M$ hijacks what would otherwise be the
open-standard DTD mechanism. It's to get around that sort of nonsense
that I'm finally flagging away my Windows partition and swinging over
to Debian. And it's in the course of _that_ nonsense that I had to
download the dpkg-source .dsc files, and discovered this NN behaviour,
in the first place. Feels like playing the Kevin Bacon game with
software packages ;-)

Thanks again Alan, cheers,

Dubya

 
 
 

Netscape newlines confuse e.g. Debian dpkg-source

Post by Lee Sau Da » Tue, 25 Jun 2002 16:27:14


    Dubya> I agree. There's quite a pile of text/x-**** 'defined'
    Dubya> already for Unix, and no doubt if I searched among them I'd
    Dubya> find one that would work for dpkg-source input files.
...
    Dubya> becoming widely accepted I can't imagine people
    Dubya> bothering to apply something as specific as
    Dubya> text/x-dpkg-source in droves.

No!  That shouldn't be text/*.  If you read the MIME RFC, you should
know that text/* is only for text or files that are _meaningful_ when
displayed as text.  The text/* types may invite some browsers or mail
agents to display them on the screen directly.  So, while text/x-tex is
not a bad idea, PostScript shouldn't be text/*.  Rather, PostScript is
application/postscript, even though (most) PostScript files are
viewable and editable with text editors or 'more' or 'less'.
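The rule boils down to looking at the top-level type only. A minimal sketch, assuming the agent ignores parameters and case; the function name is mine, not from any real browser or mailer:

```python
def may_display_inline(content_type: str) -> bool:
    """Only text/* is meant to be shown directly as text; anything
    else goes to a helper application or to disk."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    top_level = media_type.split("/", 1)[0]
    return top_level == "text"


for ct in ("text/plain; charset=us-ascii", "text/x-tex",
           "application/postscript", "application/x-debian-source"):
    print(f"{ct:32} inline={may_display_inline(ct)}")
```

So PostScript, although it is "just text" in an editor, is kept out of the inline path by its application/ label.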

--


Home page: http://www.informatik.uni-freiburg.de/~danlee