David Goldsmith recently commented:
>This is probably the section I have the most problem with. Unicode
>specifically was designed with the idea that attributes such as
>language, fonts, etc. would be encoded out-of-band, via high level tags
>or even out of the character stream entirely.
I wholeheartedly agree with David's point. I have great
reservations about the idea of using the Unicode private use area for
encoding language hints. I have basically four reasons:
1) As David mentioned, Unicode was explicitly designed not to address
attributes such as language within the character stream.
2) How do you generalise this idea to encodings where there are no
bytes left for language hinting? I can write French, Dutch, English
and German, for instance, using ISO-8859-1: do I have to use Unicode
even in a purely European setting so that I can tag texts? What about
the fact that most of the text available today is in ISO Latin-1?
3) It is easy to add, in an upward-compatible fashion, a tag called, for
example, <lang=...>. Browsers that do not understand the tag will
simply ignore it.
4) I have the impression that this may not be the proper forum
(html-wg, http-wg) to discuss changes to the interpretation of Unicode
characters or codes. I am not convinced that these changes will easily
be accepted by the Unicode Consortium. It might be much easier to
create an HTML tag for this purpose.
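
To illustrate point 3, here is a minimal sketch in Python of how a tag-based
language hint stays upward compatible. The markup <lang code="fr">...</lang>
is only an assumed spelling of the <lang=...> idea, and the two "browser"
classes and the render_* helpers are hypothetical names made up for this
example. The first parser knows nothing about the tag and still shows the
text; the second one uses the tag to label each run of text.

```python
from html.parser import HTMLParser

# Hypothetical markup: "lang" and its "code" attribute are assumptions
# for illustration, not an existing HTML element.
SAMPLE = 'Hello, <lang code="fr">bonjour</lang>!'


class OldBrowser(HTMLParser):
    """Collects text only; every tag, known or not, is skipped."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


class LangAwareBrowser(HTMLParser):
    """Records (language, text) pairs using the hypothetical lang tag."""

    def __init__(self):
        super().__init__()
        self.lang_stack = [None]  # None means "no language hint"
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag == "lang":
            self.lang_stack.append(dict(attrs).get("code"))

    def handle_endtag(self, tag):
        if tag == "lang" and len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        self.runs.append((self.lang_stack[-1], data))


def render_old(markup):
    browser = OldBrowser()
    browser.feed(markup)
    browser.close()
    return "".join(browser.text)


def render_lang_aware(markup):
    browser = LangAwareBrowser()
    browser.feed(markup)
    browser.close()
    return browser.runs


if __name__ == "__main__":
    print(render_old(SAMPLE))
    print(render_lang_aware(SAMPLE))
```

The character data itself is untouched in both cases, so the same approach
works whether the text is in ISO-8859-1 or Unicode.
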
Alis Technologies Inc.
1+514+738 91 71