HTML-WG Archives

Subject:

Re: Revised language on: ISO/IEC 10646 as Document Character Set

From:

Albert Lunde <[log in to unmask]>

Date:

Tue, 9 May 95 13:23:25 EDT



>% >Does the HTTP charset have to be a subset of 10646?
>% Yes.
>% >Why not just remove the restriction that the HTTP charset has to be a
>% >subset of 10646? I.e. remove the word "subset" somehow.
>% Because we cannot have ISO 10646 as the document character set then,
>% and we move back to square one without passing GO.
>I am lost. I believed that HTTP charset should be an 8 bit one.
>Couldn't we say that the HTTP charset may be either ISO 8859-1
>(or maybe each of the ISO 8859-x) or a suitable *encoding* of 10646 (there
>should be such a beast)? And that HTML browsers are requested to recognise
>all of these and to be able to display just a limited set of charsets?

I think you are confused.

There is nothing in the requirement that ISO 10646 be the document
character set that precludes using an eight-bit encoding like ISO Latin-1.

Let me try a "layman's" explanation, as I understand it.

The SGML document character set specifies the characters that will be
recognized by the SGML parsers. You can think of this as the internal
character representation used by SGML (though an implementation may do this
differently). This has nothing to do with the way characters go over the
wire.

Except for the question of resolving numeric references, what is
significant to SGML is mainly which characters are allowed in this set and
whether they are markup or data.

The MIME/HTTP charset parameter specifies the name of a character encoding,
that is, an actual mapping of octets (or groups of octets, etc.) going over
the wire into character names or glyphs.

The first requirement we are making is that all the characters in the range
of this function must correspond to characters in ISO 10646; this is a very
liberal requirement that seems to be true of nearly all real character
encodings in use. (Even if it's not true we can't do much better here!)
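
To make the "mapping" idea concrete, here is a minimal Python sketch. It
assumes Python's built-in codecs as stand-ins for HTTP charsets and treats
Unicode code points as equivalent to ISO 10646 positions; the function name
decode_octets is hypothetical and used only for illustration.

```python
# Illustration only: a charset viewed as a function from octets on the wire
# to characters. Assumption: Python's Unicode code points are used here as
# stand-ins for ISO 10646 positions.

def decode_octets(octets: bytes, charset: str) -> list:
    """Decode octets under the named charset and return the code points."""
    return [ord(ch) for ch in octets.decode(charset)]


wire = b"\xe9"  # one and the same octet on the wire

# Under ISO 8859-1 this octet is LATIN SMALL LETTER E WITH ACUTE (U+00E9).
latin1_points = decode_octets(wire, "iso-8859-1")

# Under KOI8-R the same octet is a Cyrillic letter instead.
koi8_points = decode_octets(wire, "koi8-r")

# Different charsets give different characters for the same octet, but every
# result is a position in the one large character set.
print(latin1_points, koi8_points)
```

The same octet yields different characters under different charsets, yet
both results fall inside the larger set, which is all the first requirement
asks for.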

We are also requiring that numeric references in HTML be interpreted
according to the corresponding positions in ISO 10646, NOT the position in
the current HTTP character encoding or some other misc. character set.

This makes the SGML nicer and ensures that numeric references will be
consistent across all encodings. This is the real significance of talking
about using an SGML document character set that is a subset of ISO 10646;
it has little to do with the HTTP encoding.
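
The rule for numeric references can be sketched the same way. This is a
rough Python illustration under stated assumptions, not a real HTML parser:
it handles only decimal references of the form &#NNN;, uses Python codecs in
place of HTTP charsets, and treats Unicode code points as ISO 10646
positions. The name resolve_numeric_refs is hypothetical.

```python
import re

# Matches decimal numeric character references such as "&#233;".
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(octets: bytes, charset: str) -> str:
    """Decode the octets with the transfer charset, then resolve numeric
    references against ISO 10646 positions (here: Unicode code points),
    never against positions in the transfer charset."""
    text = octets.decode(charset)
    return _NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


# The same reference sent in two different encodings.
latin1_doc = "caf&#233;".encode("iso-8859-1")
koi8_doc = "\u0418 &#233;".encode("koi8-r")  # a Cyrillic letter, then the reference

latin1_text = resolve_numeric_refs(latin1_doc, "iso-8859-1")
koi8_text = resolve_numeric_refs(koi8_doc, "koi8-r")
```

In both cases &#233; comes out as U+00E9, even though octet 233 in the
KOI8-R stream itself stands for a Cyrillic letter. The reference does not
change meaning when the encoding does.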

We are not addressing all the questions of what it means to support a
subset of ISO 10646 here, just saying "if you want to go beyond Latin-1
play by these rules...".

---
    Albert Lunde [log in to unmask]




