XML-L Archives

Subject: Re: character encodings
From: "John E. Simpson" <[log in to unmask]>
Reply-To: General discussion of Extensible Markup Language <[log in to unmask]>
Date: Mon, 5 Apr 1999 09:16:12 -0400
Content-Type: text/plain

At 08:48 AM 4/5/1999 +0200, Lars Marius Garshol wrote:
> [various gentle corrections of my interpretation of the
> differences among and "meanings" of UTF-8, -16, et al.]

Thanks for clarifying and correcting all that, Lars. Especially gently. :)

If I read what you were saying correctly, there's fundamentally no
difference between what UTF-8 and UTF-16 are capable of representing, only
between *how* they represent it. Is that right?
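
To make the question concrete, here is a minimal Python sketch. The sample string is an assumption chosen for illustration and does not come from the thread. It encodes the same text both ways and checks that the decoded text matches while the bytes do not.

```python
# Sample text mixing Latin, Greek, Hebrew, and Korean (illustrative only).
sample = "café αβγ שלום 한국어"

utf8_bytes = sample.encode("utf-8")
utf16_bytes = sample.encode("utf-16")  # Python's "utf-16" codec prepends a byte order mark

# Both byte sequences decode back to the identical string...
same_text = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16") == sample

# ...even though the byte sequences themselves differ.
same_bytes = utf8_bytes == utf16_bytes

print(same_text, same_bytes)
```

If the reading above is right, the first value should be true and the second false: same characters, different bytes.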

Suppose, as Richard Lander (I think) did in his original post, that one has
to represent multiple character sets, such as Latin1, Hebrew, Greek, and
Korean, in a given document. Is there some encoding declaration that he
should *not* use (aside from US-ASCII, I mean)? Or, conversely, is there
some encoding that will cover him, regardless (or nearly so) *what* charset
might appear in the document?
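
One rough way to probe this is to try encoding a mixed-script string with several candidate encodings and see which ones fail. The sketch below assumes Python's standard codecs; the sample string, the candidate list, and the helper name `can_encode` are all hypothetical choices for illustration.

```python
# Sample text mixing Latin, Greek, Hebrew, and Korean (illustrative only).
sample = "café αβγ שלום 한국어"


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Candidate encodings picked for illustration; the list is not exhaustive.
candidates = ["us-ascii", "iso-8859-1", "iso-8859-7", "iso-8859-8",
              "euc-kr", "utf-8", "utf-16"]

results = {name: can_encode(sample, name) for name in candidates}

for name, ok in results.items():
    print(f"{name:12} {'ok' if ok else 'cannot represent the sample'}")
```

In this sketch only the two Unicode encodings cover every script at once; each single-script charset fails on at least one of the others.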

It's interesting that document instances can declare their own encodings,
in the XML declaration, and that encodings cannot be constrained with a
DTD. (And for that matter, that DTDs have no way of identifying their own
encodings, which must therefore always be presumed to be... what?)
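
As an illustration of a document instance declaring its own encoding, the following sketch builds the same small document twice, once declared and stored as UTF-8 and once as UTF-16, and parses both. It assumes Python's standard `xml.etree.ElementTree` parser; the element name `greeting` and the sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document body; element name and text are illustrative only.
body = "<greeting>café αβγ שלום 한국어</greeting>"

# Each document instance names its own encoding in the XML declaration,
# and the bytes are stored in that same encoding.
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
utf16_doc = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

# The parser reads the declaration (and byte order mark, for UTF-16)
# and should hand back the same character data either way.
from_utf8 = ET.fromstring(utf8_doc).text
from_utf16 = ET.fromstring(utf16_doc).text

print(from_utf8 == from_utf16)
```

The two byte streams differ, but a conforming parser is expected to recover identical character data from both.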

[For those interested in complicating all this even further, there's a --
to me -- somewhat arcane thread running concurrently on XML-DEV, subject
"IE5.0 does not conform to RFC2376." It's a bit involved (current nestings
of cross-references to other posts are four and five deep), but the general
drift has to do with the relationship among encodings, MIME types, and
charsets.]
=============================================================
John E. Simpson          | It's no disgrace t'be poor,
[log in to unmask]      | but it might as well be.
                         |            -- "Kin" Hubbard
