HTML-WG Archives

Subject: Revised language on: ISO/IEC 10646 as Document Character Set
From: Dan Connolly
Date: Fri, 5 May 95 21:45:02 EDT
Content-Type: text/plain




Martin J. Duerst writes:
 >
 > >> http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_2.html#SEC8
 > >> |HTML Lexical Syntax
 > >> |
 > >> | ... A minimally conforming HTML user agent must support the SGML
 > >> | declaration in section SGML Declaration for HTML, which specifies ISO
 > >> | Latin 1 (@@full name) as the document character set; it may support
 > >> | other SGML declarations, in particular, SGML declarations with other
 > >> | document character sets.
 >
 > Why not write it like this (another compromise):
 > "in particular, SGML declarations with ISO10646 as the document
 > character set."

Right. Try the latest version on for size:

Blech. Lemme try again.... OK. That's better.

I moved this discussion into the conformance section (it took me a
while to find it where it used to be: under "Lexical syntax"). That
way, the "Character Content" and "Document representation" parts don't
have to change if/when we revise the whole thing or excerpt parts for
other documents.

I actually make ISO10646 a binding constraint without putting it
in the public text (the SGML declaration). See what you think:


http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_1.html#SEC4
|Documents
|
|A document is a conforming HTML document only if:
|[...]
|Its document character set includes ISO-8859-1 and agrees with
|ISO10646; that is, each code position listed in section The ISO-8859-1
|Coded Character Set is included, and each code position in the
|document character set is mapped to the same character as ISO10646
|designates for that code position. (1)
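
To make the two halves of that condition concrete, here is a rough Python
sketch. It treats a single-byte codec from Python's standard library as a
stand-in for a document character set, which is a simplification. The
function names are made up for illustration, and the ranges used for "each
code position listed" (32-126 and 160-255) are my assumption about what the
ISO-8859-1 section lists.

```python
# Hypothetical helpers; a single-byte Python codec stands in for a
# document character set, which is only an approximation.

# Assumed set of ISO-8859-1 code positions that must be included.
LATIN1_POSITIONS = list(range(32, 127)) + list(range(160, 256))


def includes_latin1(codec: str) -> bool:
    """True if every assumed ISO-8859-1 position is present and unchanged."""
    for position in LATIN1_POSITIONS:
        try:
            char = bytes([position]).decode(codec)
        except UnicodeDecodeError:
            return False  # position missing from this character set
        if ord(char) != position:
            return False
    return True


def agrees_with_iso10646(codec: str) -> bool:
    """True if every defined position maps to the same ISO 10646 character."""
    for position in range(256):
        try:
            char = bytes([position]).decode(codec)
        except UnicodeDecodeError:
            continue  # unassigned position, nothing to disagree about
        if ord(char) != position:
            return False
    return True


def is_conforming_charset(codec: str) -> bool:
    """Both conditions from the draft text must hold."""
    return includes_latin1(codec) and agrees_with_iso10646(codec)
```

Under these assumptions ISO-8859-1 passes both checks, plain ASCII agrees
with ISO10646 but does not include Latin 1, and Windows code page 1252
includes Latin 1 but disagrees at position 128.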

http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_foot.html#FOOT1
|(1)
|
|The document character set is somewhat independent of the character
|encoding scheme used to represent a document. For example, the
|ISO-2022-JP character encoding scheme can be used for HTML documents,
|since its repertoire is a subset of the ISO10646 repertoire. The
|critical distinction is that numeric character references agree with
|ISO10646 regardless of how the document is encoded.
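
A small Python sketch of what the footnote is getting at: the same text,
once carried as raw ISO-2022-JP bytes and once as numeric character
references in an ISO-8859-1 document, should come out as the same
characters. It leans on the standard library's html.unescape and the
iso2022_jp codec; the helper names are hypothetical, and html.unescape
follows current HTML rules rather than this draft, so take it as an
approximation.

```python
import html


def to_numeric_references(text: str) -> str:
    """Spell every non-ASCII character as a decimal numeric character
    reference, using its ISO 10646 code position."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


def resolve(raw: bytes, encoding: str) -> str:
    """Decode the bytes with the given encoding scheme, then expand
    numeric character references against ISO 10646."""
    return html.unescape(raw.decode(encoding))


sample = "日本語"

# Same text, two different character encoding schemes.
as_jis = sample.encode("iso2022_jp")
as_refs = to_numeric_references(sample).encode("iso-8859-1")

# The byte streams differ, but the resolved characters are identical.
same_text = resolve(as_jis, "iso2022_jp") == resolve(as_refs, "iso-8859-1")
```

The point is that a reference to a given code position names the same
character whichever encoding scheme the bytes arrived in.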


http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_1.html#SEC5
|User Agents
|
|An HTML user agent conforms to this specification if:
|[...]
|It supports the ISO-8859-1 character encoding scheme, and processes
|each character in the ISO Latin Alphabet Nr. 1 as specified in section
|The ISO Latin 1 Character Repertoire. (3)

http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_foot.html#FOOT3
|(3)
|
|To support non-western writing systems, HTML user agents should
|support the Unicode-1-1-UTF-8 and Unicode-1-1-UCS-2 encodings and as
|much of the character repertoire of ISO10646 as is possible as well.
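
For a feel of what the two encodings in that footnote look like on the
wire, here is a short Python sketch. Python has no codecs named
Unicode-1-1-UTF-8 or Unicode-1-1-UCS-2, so the modern "utf-8" and
"utf-16-be" codecs stand in for them; that substitution and the helper
names are my assumptions. UCS-2 is approximated by refusing anything that
does not fit in 16 bits.

```python
def encode_utf8(text: str) -> bytes:
    """Variable-width encoding: 1 byte for ASCII, more for other characters."""
    return text.encode("utf-8")


def encode_ucs2(text: str) -> bytes:
    """Fixed-width, two bytes per character, big-endian.

    Approximated with utf-16-be, which gives the same bytes for
    characters whose code position fits in 16 bits.
    """
    for char in text:
        if ord(char) > 0xFFFF:
            raise ValueError(f"U+{ord(char):04X} does not fit in UCS-2")
    return text.encode("utf-16-be")


latin = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, position 233
kana = "\u3042"    # HIRAGANA LETTER A, position 12354

utf8_sizes = (len(encode_utf8(latin)), len(encode_utf8(kana)))
ucs2_sizes = (len(encode_ucs2(latin)), len(encode_ucs2(kana)))
```

Both encodings carry the same ISO10646 code positions; only the byte
layout differs.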


How's that for a compromise?

(note that the text and postscript versions are a bit out of date
right now...)

Dan


