HTML-WG Archives

Subject: The Reference Concrete Syntax is not Current Practice (Was Re: Standards, Work Groups, and Reality Checks: A Radical Proposal.)
From: Arjun Ray <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Mon, 25 Sep 95 01:27:30 EDT
Content-Type: text/plain

On Sat, 23 Sep 1995, Glenn Adams wrote:

>     Date: Sat, 23 Sep 95 14:50:36 EDT
>     From: [log in to unmask]
> 
>     > So all I have to do to *totally* derail the standards process is put a
>     > new tag (or change an old one) in a popular browser and fail to file a
>     > DTD on it?
> 
>     Well, that's what the evidence so far seems to indicate, yes.
> 
> I would not agree with the previously quoted statement.  The handling
> of unknown tags is clearly specified by the current draft -- ignore them.
> Of course, this is a bit easier said than done in the context of using
> a real SGML parser -- it is possible though without any great difficulty.

Can the Working Group make a definitive statement about the practicality
of a "real SGML parser" contending with the *current practice* of HTML?

> On the other hand, the draft is weak on what to do with known tags that
> appear in contexts other than where they are permitted.  It is this latter
> problem that is much more insidious.  

I have trawled untold megabytes of the mail archives and have yet to find
a good discussion of the *really* insidious problem. I'm nearly convinced
that the SGML gestalt actively hinders an appreciation of it: by conflating
parsing with validation as a practical prescription, it leaves *categories*
of error undistinguished. 

Two prototypical cases that are off the point: 

> Take for instance the CENTER tag
> employed by Netscape.  They failed utterly to specify the content model
> or the context where this element is to be used, and, consequently, many
> documents use it willy-nilly and depend on the quirky parsing of Netscape
> to essentially treat format related tags as a toggle on a global formatting
> state.  [...]
>
> What troubles me further, is the fact that Netscape glibly accepts things
> like:
> 
> plain <B> bold <I> bold italic </B> bold?!? </I> plain
> 
> This is madness!
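The difference between the "toggle" reading and properly nested elements can be sketched in a few lines of Python. This is a hypothetical illustration, not Netscape's actual code: `toggle_render` treats each start tag as switching a global flag on and each end tag as switching it off, while `check_nesting` applies the stack discipline that SGML element structure requires. Only bare tags like `<B>` and `</I>` are recognized, a simplifying assumption.

```python
import re

# Bare start/end tags only, e.g. <B> or </I>; attributes are not handled.
TAG = re.compile(r"<(/?)([A-Za-z]+)>")


def toggle_render(html):
    """Return (text, active_tags) runs, treating tags as global on/off switches."""
    state = set()
    runs = []
    pos = 0
    for m in TAG.finditer(html):
        text = html[pos:m.start()].strip()
        if text:
            runs.append((text, tuple(sorted(state))))
        name = m.group(2).upper()
        if m.group(1):               # end tag: switch the flag off
            state.discard(name)
        else:                        # start tag: switch the flag on
            state.add(name)
        pos = m.end()
    tail = html[pos:].strip()
    if tail:
        runs.append((tail, tuple(sorted(state))))
    return runs


def check_nesting(html):
    """Return None if tags nest properly, else a description of the first error."""
    stack = []
    for m in TAG.finditer(html):
        name = m.group(2).upper()
        if not m.group(1):
            stack.append(name)
        elif not stack or stack[-1] != name:
            return f"</{name}> found while {stack[-1] if stack else 'nothing'} is open"
        else:
            stack.pop()
    return None if not stack else f"unclosed: {stack}"


sample = "plain <B> bold <I> bold italic </B> bold?!? </I> plain"
for text, tags in toggle_render(sample):
    print(repr(text), tags)
print(check_nesting(sample))
```

Under the toggle model the run "bold?!?" comes out italic only, with nothing signalling a problem; the stack check instead stops at `</B>`, because the innermost open element is still `I`.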

And one much closer:

> If people are so concerned over the randomness of netscape security key seeds,
> then shouldn't they be concerned with the fact that the following document,
> due to someone forgetting to type a single '>' character, may end up
> killing someone!
> 
> <title>Instructions for Patient Jane Doe</title>
> <p>
> <i>Warnings</i>
> <p>
> <b Do not inject this patient with penicillin.  She will die!</b>
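If a tag is taken to run from `<` to the next `>`, the missing `>` turns the entire warning into the inside of a start-tag. A minimal sketch, assuming exactly that reading (quotes and SGML attribute syntax are ignored; `split_tags` and `visible_text` are hypothetical helpers):

```python
def split_tags(html):
    """Split html into ("tag", ...) and ("text", ...) tokens.

    Simplifying assumption: a tag runs from "<" to the first ">" after it.
    """
    tokens = []
    pos = 0
    while pos < len(html):
        lt = html.find("<", pos)
        if lt == -1:                      # no more tags: the rest is text
            tokens.append(("text", html[pos:]))
            break
        if lt > pos:
            tokens.append(("text", html[pos:lt]))
        gt = html.find(">", lt)
        if gt == -1:                      # "<" never closed: keep it as text
            tokens.append(("text", html[lt:]))
            break
        tokens.append(("tag", html[lt:gt + 1]))
        pos = gt + 1
    return tokens


def visible_text(html):
    """Concatenate only the text tokens, i.e. what would be rendered."""
    return "".join(t for kind, t in split_tags(html) if kind == "text")


broken = "<p>\n<b Do not inject this patient with penicillin.  She will die!</b>"
fixed = "<p>\n<b> Do not inject this patient with penicillin.  She will die!</b>"
print(repr(visible_text(broken)))
print(repr(visible_text(fixed)))
```

In the broken version the whole sentence, up to the `>` of `</b>`, is consumed as one malformed tag, so nothing of the warning is rendered.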

Consider a different "broken" version of this:

  <title>Instructions for Patient Jane Doe</title>
  <p>
  <!-- Warnings here ----> 
  <i>Warnings</i>
  <p>
  <b> Do not inject this patient with penicillin.  She will die!</b>
  <!---- End Warnings -->
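How the two readings of this example diverge can be sketched in Python. `sgml_strip_comments` is a simplified sketch of comment-declaration recognition under the Reference Concrete Syntax (MDO `<!`, COM `--`, MDC `>`) with no error recovery; `lenient_strip_comments` is a hypothetical stand-in for the lenient "a comment runs from `<!--` to the next `-->`" rule, an assumption rather than code taken from Mosaic or Netscape.

```python
def sgml_strip_comments(text):
    """Remove comment declarations per the Reference Concrete Syntax.

    After MDO ("<!") come zero or more comments, each opened and closed by
    COM ("--") and separated only by white space; MDC (">") ends it all.
    """
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:start])
        i = start + 2                      # just past MDO
        while text.startswith("--", i):    # a comment opens here
            end = text.find("--", i + 2)   # its closing COM
            if end == -1:
                return "".join(out)        # unterminated: the rest is comment
            i = end + 2
            while i < len(text) and text[i] in " \t\r\n":
                i += 1                     # white space between comments
        if i < len(text) and text[i] == ">":
            i += 1                         # MDC closes the declaration
        pos = i


def lenient_strip_comments(text):
    """Hypothetical lenient rule: a comment runs from "<!--" to the next "-->"."""
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:start])
        end = text.find("-->", start + 4)
        if end == -1:
            return "".join(out)
        pos = end + 3


doc = (
    "<title>Instructions for Patient Jane Doe</title>\n"
    "<p>\n"
    "<!-- Warnings here ---->\n"
    "<i>Warnings</i>\n"
    "<p>\n"
    "<b> Do not inject this patient with penicillin.  She will die!</b>\n"
    "<!---- End Warnings -->\n"
)
print(sgml_strip_comments(doc))
print(lenient_strip_comments(doc))
```

Under the SGML reading, the second `--` in `---->` opens a new comment that only closes at the first `--` of `<!----`, so the warning is discarded as comment text; the lenient reading keeps it.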

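Consider that Netscape (or Mosaic) achieves the *desired* result (the
warning is displayed), whereas an SGML-compliant browser *could be fatal*:
under the Reference Concrete Syntax every "--" inside a comment declaration
opens or closes a comment, so the second pair of hyphens in "---->" opens a
new comment that runs on until the first pair of hyphens in "<!----", and
the warning disappears inside it. It will take just one such incident to
"convince" a hospital administrator which browser -- and perhaps which
*language* -- is "better", and he'll have a fatality to prove it,
standards notwithstanding. 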

The fact of the matter is that *current practice* deems an arbitrary 
number of -'s in comment declarations perfectly acceptable as a prettifying 
device. Current practice deems that after STAGO, the first occurrence of 
ISO 8859-1 code point #62 (">") is TAGC regardless of context, and therefore 
that it's permissible to omit the closing quote of the last attribute value 
literal in a start-tag. And so on.
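The "first `>` is TAGC" habit can be contrasted with literal-aware scanning in a short sketch. Both functions are hypothetical; for simplicity `tag_end_literal_aware` treats any `"` or `'` inside the tag as opening an attribute value literal (LIT or LITA), whereas a real SGML parser recognizes a literal only where an attribute value is expected.

```python
def tag_end_heuristic(html, stago):
    """Current-practice rule: the first ">" after STAGO is TAGC, even inside quotes."""
    return html.find(">", stago + 1)


def tag_end_literal_aware(html, stago):
    """Return the index of TAGC, treating ">" inside a quoted literal as data.

    Returns -1 if the tag never closes, e.g. because a literal is unterminated.
    """
    quote = None                     # delimiter of the currently open literal
    for i in range(stago + 1, len(html)):
        ch = html[i]
        if quote is not None:
            if ch == quote:          # matching LIT/LITA closes the literal
                quote = None
        elif ch in "\"'":            # LIT (") or LITA (') opens a literal
            quote = ch
        elif ch == ">":              # TAGC outside any literal
            return i
    return -1


good = '<a href="x>y.html">ok</a>'
bad = '<a href="next.html>Next page</a>\n<p>The rest of the document.'
for html in (good, bad):
    print(tag_end_heuristic(html, 0), tag_end_literal_aware(html, 0))
```

With the well-formed tag the heuristic stops too early, inside the attribute value; with the missing quote the heuristic "works", while the literal-aware scan runs off the end of the document. Either way the two disagree about where the tag ends, which is a tokenization difference, not a validation one.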

In all the agonizing over content model violations, sight has been lost of
the far more fundamental fact that current practice has been divorced
from the Reference Concrete Syntax. HTML *as it is being used in practice*
poses a *tokenization problem* for any SGML-compliant implementation. 
It's ad hoc parsing all the way, and to expect implementors today to 
consider SGML compliance *at the lexical level* is to ask competitive 
suicide of them, insofar as HTML is taken to *mean* current practice. 

Does the Working Group have an estimate of the percentage of existing 
documents that can "conform" *only* to the parsing heuristic embodied in

<URL:ftp://ftp.ncsa.uiuc.edu/Mosaic/Unix/source/Mosaic-src/libhtmlw/HTMLparse.c>?

Is the Working Group prepared to make an explicit, definitive statement 
about the compliance of this source code with the Reference Concrete 
Syntax in, say,

<URL:ftp://ftp.ifi.uio.no/pub/SGML/productions>?

Will the Working Group concede that much of current practice conforms, if 
at all, to implementations, and not to specifications, far less a standard?
And from that standard's perspective, how much of current practice is the 
Working Group prepared to declare explicitly as non-conforming?

But putting a significant fraction of the existing document base beyond
the pale isn't half as relevant as the fact that there are implementations
on which this fraction "works", and for people whose concerns are limited 
to that, what a standard actually stipulates counts for much less than 
the ability of the implementors to simply *claim* conformance.

And *that* is the fundamental issue. There are players in the game who 
seek legitimation only. Will it be a meet outcome if all the good work of 
this Working Group has the only substantive effect of legitimizing a 
*name* for the benefit of those who, having secured the legitimacy and 
cachet that comes from a putative association with an Internet Standard, 
propose to ignore the *real* specifications?

The Working Group should seriously consider declaring HTML qua Current 
Practice an unstandardizable hodgepodge. Leave it to Netscape, and 
perhaps Microsoft, to have the wit or patience or discipline to concoct 
specifications the IETF might accept. Delegitimize the name HTML, and 
continue the good work!


Arjun Ray
(I speak for myself only.)


