> > I'm not an expert in this area, but I believe that you will need to
> > URL encode any spaces in the URL references in your html pages. Bare
> > spaces are not allowed in HTTP requests, and it seems that some (all?)
> > browsers fail to properly encode them for you.
> From what I can tell the browser is supposed to do any encoding for you.
In what context?
The procedures for submitting a FORM are carefully spelled out in
the _HTML_ specification.
The procedures for handling a URL are no business of HTML as such:
how to write a URL correctly is a subject for RFCs (1738, 2396 etc.).
The only point of interest as far as HTML is concerned is how to
represent a properly-coded URL in an HTML attribute value such as
HREF, and this is quite simple, since the only character of interest is
the ampersand (all other risky characters will have already been
converted to %xx-escapes according to the URL rules). Where
a URL contains a query part that's using ampersand as a separator,
then the HTML rules call for the & to be coded in the HREF etc. as
&amp; (or the &#number; equivalent); whereas an ampersand that is meant
to be part of the data would already have been coded in %xx format
because of the URL rules.
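A minimal Python sketch of the two steps just described, using the standard library's urllib.parse.quote for the URL rules and html.escape for the HTML attribute rules. The host, path, and parameter names are made up for illustration.

```python
from html import escape
from urllib.parse import quote

# Step 1 (URL rules): build the query from its components, %xx-escaping
# any "&" that belongs to the data. safe="" means "&" is escaped too.
author = "Smith & Jones"  # hypothetical data value containing "&"
title = "Notes"
url = ("http://example.com/search?author=" + quote(author, safe="")
       + "&title=" + quote(title, safe=""))
print(url)   # http://example.com/search?author=Smith%20%26%20Jones&title=Notes

# Step 2 (HTML rules): the only "&" left is the query separator,
# which must be written as &amp; inside the attribute value.
href = escape(url, quote=True)
print('<a href="' + href + '">search</a>')
```

The data ampersand ends up as %26 and never reaches the HTML step as a bare "&"; only the separator is turned into &amp;.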
Then again, there's the question of how a browser should handle URLs
that are supplied by user dialogues such as the "Open URL" widget. This
is not clearly defined by the interworking specifications, as far as I
can see. But browser dialogs should clearly _not_ screw up a
properly-formed URL according to the URL specifications. They might
allow some additional latitude, however (e.g. accepting a space and coding
it to %20 for you).
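A hypothetical sketch of the kind of latitude a dialog might allow; no specification defines this function or its behaviour. It only strips surrounding whitespace and codes embedded spaces as %20, and it deliberately leaves "%" alone so that an already properly-formed URL passes through unchanged.

```python
def fixup_dialog_url(typed: str) -> str:
    """Hypothetical lenient handling of text typed into an "Open URL" box.

    Surrounding whitespace is dropped and embedded spaces are coded as %20.
    Everything else, including existing %xx escapes, is left untouched,
    so a properly formed URL is never altered.
    """
    return typed.strip().replace(" ", "%20")

print(fixup_dialog_url("  http://example.com/my file.html "))
print(fixup_dialog_url("http://example.com/my%20file.html"))
```

Both calls print http://example.com/my%20file.html: the first is fixed up, the second is already correct and comes back as is.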
> Leading and trailing white space is stripped, but embedded spaces shouldn't
> be.
Properly formed URLs don't have embedded spaces.
> The standard seems to allow you to use character references. I didn't see
> an entity reference for a space, but you can use the numeric reference:
> &#32;
You really should keep this clearly organised in the mind. The rules
for URLs are one thing: the rules for coding a URL into an HREF etc.
value are another thing.
The URL rules don't have any use for &#number; notation. By the time
you've got a properly-formed URL, it doesn't have spaces in it (they
have been turned into %20), so there's no reason for you to be wanting
to write &#32; at all in your HTML.
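The separation of layers can be seen with Python's standard library (used here purely as an illustration): percent-decoding, which belongs to the URL layer, has no notion of "&#32;", while HTML character-reference decoding does.

```python
from html import unescape
from urllib.parse import unquote

# At the URL level, "&#32;" is just five ordinary characters:
# percent-decoding leaves it alone.
print(unquote("my&#32;file.html"))   # my&#32;file.html

# Only the HTML layer turns it into a space, and a bare space is
# something a properly formed URL may not contain.
print(unescape("my&#32;file.html"))  # my file.html
```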
> I tried this stuff in Opera and using space, &#32;, and %20 all worked.
The properly-formed URL contains %20; anything else can be regarded as
error fixup (but note the form-submission convention of representing
space by "+" and, of course, representing "+" by its %xx representation).
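A short illustration of the form-submission convention, assuming Python's urllib.parse.urlencode, which applies the application/x-www-form-urlencoded rules by default (space as "+", and therefore "+" as %2B):

```python
from urllib.parse import parse_qs, urlencode

# Form-style encoding: a space becomes "+", so a literal "+" in the
# data has to become %2B to stay distinguishable from a space.
query = urlencode({"q": "a b+c"})
print(query)            # q=a+b%2Bc
print(parse_qs(query))  # {'q': ['a b+c']}
```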
I'll risk repeating this. First, form your correct URL according to
the URL RFCs. Then, and only then, consider how to code that into an
HTML HREF. Any other procedure leads to danger.
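To show why the order matters, here is a sketch with a made-up path and data value containing an ampersand, coded once in the right order and once the wrong way round (Python standard library, illustration only):

```python
from html import escape
from urllib.parse import quote

value = "R&D"  # hypothetical data value containing "&"

# Right order: URL rules first (data "&" -> %26), then HTML rules.
right = escape("/dept?name=" + quote(value, safe=""))

# Wrong order: HTML-escaping the data first and then URL-escaping the
# result leaks HTML notation into the URL data itself.
wrong = "/dept?name=" + quote(escape(value), safe="")

print(right)  # /dept?name=R%26D
print(wrong)  # /dept?name=R%26amp%3BD  (the server would see "R&amp;D")
```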
> I think the latter is broken and instead of sending the URL as is, it should
> have been converted to %3720 in the URL sent to the server.
I say you are mistaken. The last one is the only correct HREF; the
other two are in error and the browser is fixing them up as best it can.
%20 is the correct representation of a space in a URL. A URL containing
a bare space is illegal (according to the RFC). A URL (as opposed to
an HREF) containing &#32; is meaningless, and an HREF containing &#32;
is functionally identical to one containing a bare space, i.e. the URL
which it represents is illegal.
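The three HREF variants can be checked with a rough sketch: first decode the HTML character references to recover the URL the attribute represents, then apply the RFC's rule that a space may not appear. The helper names are hypothetical, and the space test is only that one rule, not a full URL validator.

```python
from html import unescape

def url_from_href(href_attr: str) -> str:
    """Decode HTML character references to get the URL an HREF represents."""
    return unescape(href_attr)

def has_no_space(url: str) -> bool:
    # RFC 2396 excludes the space character from URI syntax.
    # Only this single rule is checked; it is not a full validator.
    return " " not in url

for href in ["my file.html", "my&#32;file.html", "my%20file.html"]:
    url = url_from_href(href)
    print(repr(href), "->", repr(url), "legal:", has_no_space(url))
```

Only the %20 form survives; the first two both decode to a URL with a bare space in it.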
RFC2396 says:
2.4. Escape Sequences
Data must be escaped if it does not have a representation using an
unreserved character;
2.4.2. When to Escape and Unescape
A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics. Normally, the only time
escape encodings can safely be made is when the URI is being created
from its component parts;
2.4.3. Excluded US-ASCII Characters
Although they are disallowed within the URI syntax, we include here a
description of those US-ASCII characters that have been excluded and
the reasons for their exclusion.
[...]
The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.
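The warning in 2.4.2 can be demonstrated in Python (an illustration only, not part of the RFC): escaping a completed URI re-escapes its "%" signs, and unescaping one can turn a data "&" into a separator.

```python
from urllib.parse import parse_qs, quote, unquote

completed = "http://example.com/my%20file.html"

# Escaping an already-escaped URI escapes its "%" signs again; the path
# now names a resource containing the three characters "%20", not a space.
double = quote(completed, safe=":/")
print(double)  # http://example.com/my%2520file.html

# Unescaping a completed query turns the data "&" into a separator.
q = "a=R%26D&b=x"
print(parse_qs(q))           # {'a': ['R&D'], 'b': ['x']}
print(parse_qs(unquote(q)))  # {'a': ['R'], 'b': ['x']}
```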
In other words, if your URL had been intended to contain a percent
character, as you have hypothesized, then that percent character would
have been represented in the URL in its %xx form already. It was not,
and therefore it can only be understood as part of the %20 combination
that represents a space. By the way, %37 is the digit "7": you meant to
say %25 I presume.
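A quick check of the hex arithmetic, using Python's chr and urllib.parse.unquote:

```python
from urllib.parse import unquote

print(chr(0x37), chr(0x25))  # 7 %
print(unquote("%3720"))      # 720  (a "7" followed by "20")
print(unquote("%2520"))      # %20  (a literal percent sign, then "20")
```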