HTML-WG Archives


HTML-WG@LISTSERV.HEANET.IE


HTML-WG  May 1995

Subject:

Re: Revised language on: ISO/IEC 10646 as Document Character Set

From:

Glenn Adams

Date:

Wed, 10 May 95 09:34:04 EDT

Content-Type:

text/plain



    Date: Wed, 10 May 95 00:33:50 EDT
    From: Erik van der Poel

    SGML parsers parse documents in the "document character set",

Not necessarily.  SGML parsers parse documents using a system character
set.  That system character set must not be inconsistent with the
document character set.  However, it doesn't mean it has to be identical
to the document character set.

    My only excuse is that I haven't looked at those tables for a while, and
    in the past they *did* appear to have the problem I mentioned.  Sorry.

All the zenkaku roman have been in Unicode since version 1.0.  Early versions
of the mapping tables, however, only specified Kanji mappings, while the
mappings of the non-Han characters were being determined.  Whether the tables
had those mappings or not had no bearing on whether Unicode was a superset
from its first release -- which it was.
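
As a rough illustration of the two points above, the sketch below treats a
legacy byte encoding as the system-side representation and Unicode code
points as a stand-in for the 10646 document character set. The choice of
Python, its codec names, and the helper `code_points_via` are assumptions
made for this example, not anything from the original discussion. It only
shows that a zenkaku roman letter stored in several legacy encodings
resolves to the same code point today; it says nothing about which Unicode
version first included the character.

```python
# Illustration only: byte encodings stand in for "system" representations,
# and Unicode code points stand in for the 10646 document character set.
ZENKAKU_A = "\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A

# Python codec names; picking these three is an assumption for the example.
SYSTEM_ENCODINGS = ["shift_jis", "euc_jp", "big5"]


def code_points_via(encoding: str, text: str) -> list[int]:
    """Encode text into a legacy encoding, decode it back, return code points."""
    stored = text.encode(encoding)    # bytes as a system might store them
    decoded = stored.decode(encoding)  # characters as the parser sees them
    return [ord(ch) for ch in decoded]


results = {enc: code_points_via(enc, ZENKAKU_A) for enc in SYSTEM_ENCODINGS}

for enc, cps in results.items():
    stored_hex = ZENKAKU_A.encode(enc).hex()
    print(enc, stored_hex, [f"U+{cp:04X}" for cp in cps])
```

The stored bytes are specific to each encoding, but the code point that
comes back is the same in every case, which is the sense in which a system
representation can differ from the document character set while staying
consistent with it.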

    As far as I can recall, someone even asked about
    these new Taiwanese character sets on the net, and Glenn (or some other
    Unicoder) answered that Unicode/10646's repertoire was frozen before
    those Taiwanese character sets hit the streets.

It is correct that the latest version of CNS 11643 (1994) contains
characters which are not yet in 10646/Unicode.  10646/Unicode is not
frozen for all time, though.  New characters will be added.  This
addition process takes time and depends on submissions from National
Standards Bodies.  So far, Taiwan (which is not an ISO P-Member, although
TCA is a class C liaison member of ISO) has not submitted the full
collection of new characters in CNS 11643 to the Ideographic Rapporteur
Group (SC2/WG2/IRG), whose current job includes specifying an extension
to the Han repertoire in 10646 (which will eventually go into Unicode).
Furthermore, neither TCA nor Taiwan has registered this new version with
ISO for use with ISO 2022.  Given these facts, you should not infer either
that 10646 is inadequate now or that it will remain inadequate.  It remains to
be seen whether any user community will even develop in Taiwan around
this new standard (BIG5 being the current widespread standard).

Even if 10646 is the document character set, it remains possible to
represent and process all of the characters in this new version of 11643
(or any other character repertoire which includes characters not present
in 10646): namely, by using SDATA general entities.

    So, how about defining the "document character set" to be the union
    of the "charset" and 10646?

We can't redefine "document character set".  It already has a fixed,
known definition.  Furthermore, as I indicated above, it isn't even
necessary.

Glenn

