HTML-WG Archives

Re: Directional characters, international dates, etc.


Martin J Duerst




Mon, 21 Aug 95 09:05:15 EDT






>I agree with James Clark on this:
>While in a rigorous (and UNTAGGED) Unicode world, these
>characters can be used to control formatting, in the real
>world, is anyone using them?

I would assume yes. Mostly, they will not be directly
visible, or directly input via a keyboard, though.
They may be inserted during a cut-and-paste operation,
or when explicit reordering for a certain stretch is applied
with a menu command.
What is definitely needed is the functionality they provide.
The draft is already trying to provide this as markup, and
I guess with the help of James Clark, we have made some
more progress on providing them ONLY as markup, without
major consequences for conversion and such.

>Normal BIDI text should be automaticly displayable with
>minimal intervention. Once the READING ORDER is also known,
>bidirectional textual display is correct. In Unicode, this
>is established by the first "hard"-directional character.
>(The RLM or LRM could be used as such.) Numerical strings
>within Hebrew or Arabic text (TAGGED by <LANG>!!!) would
>be formatted correctly without special overrides.

The idea of using the LANG attribute, or the language tag,
for default values of bidirectionality, is certainly worth
considering and should probably be adopted.
The question is whether this can address all cases.

>HTML is fully tagged, so--do we really need or want character
>entities that are little-understood?
It's no longer so much a problem of characters. But the
functionality is needed, whether it is little understood (by many
non-experts) or better understood (by some specialists).

>If one needs to override the display, would not something like
> ... Hebrew text ... <LTR>(Hebrew, but left to right display)</LTR> ...
>be more clear and more likely to be correctly used than
> &rle; .... &lro; ..... &pdf; .... &pdf;
> Note that these must be NESTED correctly!

Of course. But <LTR> is not suited as a tag (or attribute) for
overriding. It should be <FORCELTR> or <LRO>.

>I would also say that reading order is more clearly a
>paragraph level attribute than demanding a special
>character code start a paragraph. (Such character or attribute
>--both could certainly be accepted--is only required if the
>first "hard"-directional character of the paragraph is not
>appropriate, i.e. I start a Hebrew paragraph with English

I agree. But LRM/RLM, which you are referring to in this
case, is mainly needed for other cases, such as (neutral)
punctuation between stretches of different strong directionality.
These are really better handled as character entities, and
should not be used to indicate global directionality.

LRE/RLE and RLO/LRO on the other hand are used for
nesting inside a paragraph, and some respective
markup should be available.
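
To make the nesting requirement concrete, here is a minimal Python sketch (Python is only an illustration language here; the function name `embeddings_balanced` is hypothetical). It lists the Unicode directional formatting characters named in this thread and checks that every LRE/RLE/LRO/RLO is closed by a later PDF. This is a lint-style check only, not the Unicode bidirectional algorithm, which is more forgiving of stray PDFs.

```python
import unicodedata

# Unicode directional formatting characters discussed in this thread.
LRM, RLM = "\u200e", "\u200f"   # marks: invisible characters with strong direction
LRE, RLE = "\u202a", "\u202b"   # embeddings
PDF = "\u202c"                  # pop directional formatting (closes one level)
LRO, RLO = "\u202d", "\u202e"   # overrides

OPENERS = {LRE, RLE, LRO, RLO}


def embeddings_balanced(text: str) -> bool:
    """Return True if every LRE/RLE/LRO/RLO is closed by a later PDF.

    Stricter than the bidi algorithm itself: a PDF with nothing open
    is reported as an error here.
    """
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with no open embedding or override
            depth -= 1
    return depth == 0


if __name__ == "__main__":
    for ch in (LRM, RLM, LRE, RLE, PDF, LRO, RLO):
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # Mirrors the quoted example: &rle; ... &lro; ... &pdf; ... &pdf;
    print(embeddings_balanced(RLE + "abc" + LRO + "def" + PDF + "ghi" + PDF))
```

Paired markup gets the same guarantee from the parser, which is one argument for offering embeddings and overrides as elements rather than as bare characters.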

>Note-1: <Q RTL>.. </Q>, et al <Q> (short quote) is an
>"implied formatting". Do we want to proliferate attributes
>to most all tags? "LANG=" goes with nearly every HTML3 TAG!
>Yes, <Q><RTL> ... </RTL><Q> is longer, but so is
><Q><BIG>...</BIG><Q>. I vote for simplicity.

<Q><BIG> is mixing two different kinds of markup that
are usually kept separate, and that are also of different levels
(<Q> is more on a logical level, whereas <BIG> is
quite explicit formatting). I think the HTML 2.0 spec
(mostly for backward compatibility) doesn't even guarantee
that such text is actually rendered the way you would
reasonably expect. In this very specific
case, you may end up with the quotation marks being smaller
than the quoted text, which you may be able to remedy by
changing to <BIG><Q>, if displaying an inline quotation
bigger than its surrounding text makes any sense at all.

In <Q RTL> or maybe <Q LANG="ar">, we want to say that
the quotation has RTL directionality, or that it is in Arabic.
It's most often a quotation, a paragraph, or the whole body,
that is in a given language, or directionality. For <P> and smaller,
the solution of <P><LANG LANG="ar"> may work, but larger
changes to the DTD are needed to allow <BODY><LANG LANG="ar">.

>Note-2: Defining these codes as &#8237; assumes
>1: Such 4-place numerical entities are made legal
They are already a legal alternative in HTML 2.0, and
they are fully legal according to our draft.
We may explicitly disallow RLE/RLO/LRE/LRO/PDF
to avoid conflicts between them and corresponding markup.

>2: Most folks would know them. Unicode tables usually
>define them as two hexes, U+212A, or such.
That's why, certainly in the case of RLM/LRM,
corresponding character entities will be supplied
(e.g. &rlm;).
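
As a small illustration of how the decimal reference, the hex notation found in Unicode tables, and a named entity relate, here is a Python sketch using the standard `html` module. Note the assumption: that module resolves names from the HTML5 entity table, not from the draft under discussion here, so it only shows the mapping, not what any particular DTD allows.

```python
import html

# Decimal numeric character reference from the quoted note: 8237 decimal
# is 202D hexadecimal, i.e. LEFT-TO-RIGHT OVERRIDE (LRO).
lro = html.unescape("&#8237;")

# Named entity for RIGHT-TO-LEFT MARK, resolved via Python's HTML5 table.
rlm = html.unescape("&rlm;")

print(f"&#8237; -> U+{ord(lro):04X}")
print(f"&rlm;   -> U+{ord(rlm):04X}")
```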

>3: SHORTREFS work.
I guess James Clark has shown that they are not really
necessary, and I can agree with him. I hope the other
authors of the draft will agree, too.

><DATE ... >
><TIME .... >
>Formatting alternatives might be:
>FMT=READER -- language localization at browzer site
>FMT=AUTHOR -- language of text on the page at this place
>I think that for the second two, most authors will simply
>prefer to simply type in their dates and times. But:
>is a possibility. Defaults would be reader, gregorian,
>and today. Of course, today changes every day. Therefore
><DATE> and <TIME> elements would be useful.
>Some of what I saw on the "digest 123" was a "bit too much." Do we also support
>conversion from calendar to calendar? When do things get a
>bit "out of hand?"

Conversion from calendar to calendar is up to the browser.
Also, I do not exactly understand what you intend with the FMT attribute.
If this is the format of the final display, then I guess this should
really be left to the browser and the reader preferences. If some
author absolutely doesn't want anything to be changed, plaintext is
always there. The aim of our proposals is that they enable the
browser to display something that the reader understands.

><CURRENCY>, etc. Now we get into international exchange
>rates at time of writing? reading? ?? ?. I think most
>people would type-in what they mean.

We have discussed exchange rates when writing the proposal.
There was even the idea of adding an EXCHRATE attribute.
Closer investigation showed that this had some problems:
a) What should be the "reference currency"?
b) Since exchange rates change over time, an EXCHRATE attribute
alone would not make it possible to display arbitrary
conversions reasonably, even with a server holding an
exchange-rate database (such a thing already exists).
Our conclusion was that if conversion was indeed to be possible,
the only consistent way of solving the problem was to add a
DATE/TIME attribute of when this amount was valid. The browser
could then do conversion to other currencies at the rates that
applied at that time, and also might convert the amount to
current value, to give the reader a better impression.
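
The reasoning above can be sketched as follows, assuming an amount carries its currency and the date on which it was valid. Everything here is hypothetical: the `Amount` type, the `convert` function, the `RATES` table and the rate value in it are made up for illustration and are not part of the proposal.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical historical rate table: (date, from, to) -> rate.
# The rate below is invented for illustration, not real data.
RATES = {
    (date(1995, 8, 21), "USD", "CHF"): Decimal("1.20"),
}


@dataclass(frozen=True)
class Amount:
    value: Decimal
    currency: str
    valid_on: date  # the date on which the author's amount was valid


def convert(amount: Amount, target: str, rates: dict) -> Amount:
    """Convert at the rate that applied on amount.valid_on.

    Raises KeyError when no rate is known for that date and currency
    pair; a browser would then fall back to showing the original amount.
    """
    if amount.currency == target:
        return amount
    rate = rates[(amount.valid_on, amount.currency, target)]
    return Amount(amount.value * rate, target, amount.valid_on)


if __name__ == "__main__":
    price = Amount(Decimal("10"), "USD", date(1995, 8, 21))
    print(convert(price, "CHF", RATES))
```

Keying the lookup on the date avoids both problems listed above: no single reference currency has to be chosen, and the conversion does not drift as rates change.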

Regards, Martin.
