HTML-WG Archives

Subject: Re: Revised language on: ISO/IEC 10646 as Document Character Set
From: Glenn Adams <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Wed, 10 May 95 09:34:04 EDT




    Date: Wed, 10 May 95 00:33:50 EDT
    From: [log in to unmask] (Erik van der Poel)

    SGML parsers parse documents in the "document character set",

Not necessarily.  SGML parsers parse documents using a system character
set.  That system character set must not be inconsistent with the
document character set.  However, it doesn't mean it has to be identical
to the document character set.
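
As a loose modern analogy (not an SGML parser), the sketch below decodes the
same text from two different storage encodings into one internal
representation, and only then resolves numeric character references as
ISO/IEC 10646 code points.  The function name `parse` and the sample strings
are assumptions made for illustration; Python's `str` merely stands in for a
system character set that differs from what is on disk yet stays consistent
with the document character set.

```python
import html

# Illustrative sample: one literal e-acute plus one numeric reference to it.
SAMPLE = "caf&#233; \u00e9"

# The same document stored in two different external encodings.
latin1_bytes = SAMPLE.encode("iso-8859-1")
utf8_bytes = SAMPLE.encode("utf-8")


def parse(raw: bytes, encoding: str) -> str:
    """Decode into the internal representation, then resolve references."""
    text = raw.decode(encoding)
    # html.unescape interprets &#233; as code point 233, independent of
    # the encoding the bytes arrived in.
    return html.unescape(text)


same = parse(latin1_bytes, "iso-8859-1") == parse(utf8_bytes, "utf-8")
```

Both calls yield the same string even though the byte sequences differ, which
is the sense in which the two character sets need only be consistent.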

    My only excuse is that I haven't looked at those tables for a while, and
    in the past they *did* appear to have the problem I mentioned.  Sorry.

All the zenkaku roman have been in Unicode since version 1.0.  Early versions
of the mapping tables, however, only specified Kanji mappings, while the
mappings of the non-Han characters were being determined.  Whether the tables
had those mappings or not had no bearing on whether Unicode was a superset
from its first release -- which it was.

    As far as I can recall, someone even asked about
    these new Taiwanese character sets on the net, and Glenn (or some other
    Unicoder) answered that Unicode/10646's repertoire was frozen before
    those Taiwanese character sets hit the streets.

It is correct that the latest version of CNS 11643 (1994) contains
characters which are not yet in 10646/Unicode.  10646/Unicode is not
frozen for all time, though.  New characters will be added.  This
addition process takes time and depends on submissions from National
Standards Bodies.  So far, Taiwan, which is not an ISO P-Member (although
TCA is a class C liaison ISO member), has not submitted the full
collection of new characters in CNS 11643 to the Ideographic Rapporteur
Group (SC2/WG2/IRG) whose current job includes specifying an extension
to the Han repertoire in 10646 (and which will eventually go into Unicode).
Furthermore, neither TCA nor Taiwan has registered this new version with ISO
for use with ISO 2022.  Given these facts, you should not infer either
that 10646 is inadequate now or that it will remain inadequate.  It remains to
be seen whether any user community will even develop in Taiwan around
this new standard (BIG5 being the current widespread standard).

Even if 10646 is the document character set, it remains possible to
represent and process all of the characters in this new version of 11643
(or any other character repertoire which includes characters not present
in 10646); namely, use SDATA general entities.
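
A minimal sketch of the idea, assuming a DTD declared an entity along the
lines of `<!ENTITY cns.p3.2121 SDATA "[cns.p3.2121]">`.  The entity name, its
replacement data, and the helper `tokenize` are all hypothetical and chosen
only for illustration.  The point shown is that the reference expands to
system-specific data handed to the application, so the character never needs
a code position in the document character set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SData:
    """System-specific data produced by an SDATA entity reference."""
    name: str
    data: str


# Hypothetical entity table; a real one would come from the DTD.
ENTITIES = {"cns.p3.2121": SData("cns.p3.2121", "[cns.p3.2121]")}

# Entity reference of the form &name; (letters, digits, '.', '-').
REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def tokenize(text, entities=ENTITIES):
    """Split text into plain strings and SData tokens."""
    tokens = []
    pos = 0
    for m in REF.finditer(text):
        ent = entities.get(m.group(1))
        if ent is None:
            continue  # unknown reference: leave it as literal text
        if m.start() > pos:
            tokens.append(text[pos:m.start()])
        tokens.append(ent)
        pos = m.end()
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens
```

An application receiving an `SData` token would decide for itself how to
render or process it, for example by looking the name up in a local font or
mapping table.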

    So, how about defining the "document character set" to be the union
    of the "charset" and 10646?

We can't redefine "document character set".  It already has a fixed,
known definition.  Furthermore, as I indicated above, it isn't even
necessary.

Glenn


