HTML-WG Archives

Subject: Re: ISO/IEC 10646 as Document Character Set
From: Gavin Nicol <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Mon, 1 May 95 22:07:11 EDT

>The replacement text of the entity would be expressed by reference to
>the document character set. Such a reference may or may not have a
>representation in the system character set.  In the above case, neither
>ASCII nor EBCDIC have a suitable representation, and, if this is what
>you mean by "result", then, yes, it is system dependent as to how to
>interpret such a reference.
This is precisely what I am trying to say, though I should note that
the mapping from document character set to representation in the
system character set (to use your terminology) is not defined at all
by SGML. It is assumed that LATIN CAPITAL LETTER A will be mapped to 
LATIN CAPITAL LETTER A, but this is not *required* behaviour, though
it is obviously *desirable* behaviour.
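
A minimal sketch of the two steps being distinguished here, assuming ISO 10646 as the document character set and using ASCII and EBCDIC code page 037 as example system character sets. The helper names (`resolve_charref`, `represent`, `SYSTEM_CHARSETS`) are hypothetical, and CIRCLED DIGIT ONE (decimal 9312) is only an illustrative stand-in for a character with no ASCII or EBCDIC representation.

```python
from typing import Optional

# Example system character sets (display name -> Python codec name).
# These are illustrative assumptions; SGML does not define this mapping.
SYSTEM_CHARSETS = {"ASCII": "ascii", "EBCDIC (cp037)": "cp037"}


def resolve_charref(number: int) -> str:
    """Well-defined step: the number names a character of the
    document character set (assumed here to be ISO 10646)."""
    return chr(number)


def represent(char: str, codec: str) -> Optional[bytes]:
    """System-dependent step: encode the character in a system character
    set, or return None if that set has no representation for it."""
    try:
        return char.encode(codec)
    except UnicodeEncodeError:
        return None


# 65 = LATIN CAPITAL LETTER A; 9312 = CIRCLED DIGIT ONE (illustrative).
for ref in (65, 9312):
    char = resolve_charref(ref)
    for name, codec in SYSTEM_CHARSETS.items():
        print(f"&#{ref}; -> {name}: {represent(char, codec)!r}")
```

LATIN CAPITAL LETTER A is represented by different codes on the two systems (0x41 in ASCII, 0xC1 in EBCDIC), yet it remains the same character; CIRCLED DIGIT ONE has no representation in either, so what happens next is up to the system.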

>It seems that you aren't distinguishing properly between
>(1) the well-definedness of a numeric character reference as such; and
>(2) the interpretation of the character which the reference specifies
>    in terms of the system character set used in the parsing process.
Well, perhaps I do suffer from a lack of precision. I'll pull out the
old excuse about not using English enough ;-) I've been talking about
case 2, because 1 is well defined (even if it's only an abstract

>For example, if I translate this entity to a document
>character set of KS C 5601:1987 (the primary Korean character set),
>then I would have to translate the numeric charref to &#10343; which
>is the equivalent character.  If I translated it to an entity whose
>document charset was ASCII, then, I could do one of the following:
>(1) indicate an error due to inability to translate; (2) translate it
>to &#26; ('SUB'), the ASCII substitution control character; (3) translate
>it to &#49; ('1') as an approximate mapping, etc.
Exactly. It is system dependent.
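
The translation options in the quoted paragraph can be sketched as follows, assuming the source entity uses ISO 10646 as its document character set and that the referenced character is CIRCLED DIGIT ONE (decimal 9312), which is consistent with the quoted KS C 5601 value 10343 (0x2867) and the '1' approximation. Deriving the KS C 5601 code from Python's `euc_kr` codec and approximating via NFKC compatibility normalization are illustrative choices, not anything the thread prescribes; `translate_charref` and its `policy` names are hypothetical.

```python
import unicodedata


def to_ksc5601(char: str) -> int:
    """KS C 5601 code for char: EUC-KR bytes with the high bit cleared."""
    b = char.encode("euc_kr")  # raises UnicodeEncodeError if unmappable
    if len(b) != 2:
        raise ValueError(f"{char!r} is not a KS C 5601 double-byte character")
    return ((b[0] & 0x7F) << 8) | (b[1] & 0x7F)


def to_ascii(char: str, policy: str) -> int:
    """ASCII code for char, handling unmappable characters per policy."""
    code = ord(char)
    if code < 128:
        return code
    if policy == "error":        # (1) refuse to translate
        raise ValueError(f"U+{code:04X} has no ASCII equivalent")
    if policy == "substitute":   # (2) ASCII SUB control character
        return 0x1A
    if policy == "approximate":  # (3) compatibility decomposition, if ASCII
        approx = unicodedata.normalize("NFKC", char)
        if len(approx) == 1 and ord(approx) < 128:
            return ord(approx)
        raise ValueError(f"no ASCII approximation for U+{code:04X}")
    raise ValueError(f"unknown policy {policy!r}")


def translate_charref(number: int, target: str, policy: str = "error") -> str:
    """Rewrite &#number; (ISO 10646) for an entity in another charset."""
    char = chr(number)
    if target == "KS C 5601":
        return f"&#{to_ksc5601(char)};"
    if target == "ASCII":
        return f"&#{to_ascii(char, policy)};"
    raise ValueError(f"unsupported target {target!r}")


print(translate_charref(9312, "KS C 5601"))             # &#10343;
print(translate_charref(9312, "ASCII", "substitute"))   # &#26;
print(translate_charref(9312, "ASCII", "approximate"))  # &#49;
```

Which policy applies is exactly the system-dependent choice: each yields a legal reference in the target entity, but only the KS C 5601 translation preserves the character.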

>What we can't do with numeric charrefs is to say they are interpreted
>according to the system character set (in general). We can only say this
>as a side-effect of the parsing process (or other processes) where we
>need to represent the referenced character according to the system character
>set(s) at hand.
Yes, this is entirely correct.
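
A small illustration of the point, assuming a document whose document character set is ISO 10646 being parsed on an EBCDIC (code page 037) system; the reference `&#193;` is a hypothetical example. Reading the number as a code in the system character set yields the wrong character; the system character set only enters when the resolved character has to be represented.

```python
# Hypothetical example: &#193; in an ISO 10646 document, parsed on a
# system whose character set is EBCDIC code page 037.
REF = 193

# Wrong: interpret the number as a code in the system character set.
as_system_code = bytes([REF]).decode("cp037")  # 'A' (0xC1 in EBCDIC)

# Right: the number names a character of the document character set.
as_document_char = chr(REF)  # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE

# The system character set matters only as a side-effect, when that
# character must be represented locally (e.g. for output).
system_bytes = as_document_char.encode("cp037")

print(repr(as_system_code), repr(as_document_char), system_bytes)
```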

>I think you understand all of the above.  Perhaps we are just in violent
>agreement but are using different terminology?

I think so. As Dan has pointed out, clarity is not my strong point... 

Anyway, this is all really academic. Obviously, the most desirable
behaviour is to map and represent the characters in accordance with
ISO 10646. All I have been trying to show is that current browsers all
exhibit legal (SGML-conforming) behaviour, including things like Mosaic L10N. I'm sure
this conversation is quite soporific to most readers...
