HTML-WG Archives

Subject: Re: New DTD (final version?)
From: [log in to unmask] (Paul Grosso)
Reply-To: [log in to unmask]
Date: Mon, 30 Jan 95 06:12:25 EST

> From: "Daniel W. Connolly" <[log in to unmask]>
> Would some of the SGML experts out there look over the SGML
> declaration?  I think the capacities/quantities need some
> tweaking. Anybody who really knows about character set declarations is
> invited to look those over too. I'm still not clear on the distinction

I'm afraid I am not a character set expert.  I see they haven't changed
since the IETF draft when I last looked at them, so I doubt I'll have
much more input.  Hopefully some other SGML experts with more expertise
in character sets will check it out (I'll ask around a bit).

As for the rest, my first comment is on the newly introduced line
breaks in the public identifiers.  A public identifier is a minimum
literal (clause 10.1.7 of 8879) whose exact normalized form is crucial
for doing such things as catalog lookup for external entity resolution.
In particular, the existence of space characters (in the normalized
minimum literal) and case of letters is significant.  A minimum literal
is "normalized by ignoring record starts, condensing record end and
space sequences to a single space, and stripping spaces at the start
or end of the minimum literal."  All of this to say that, if one wishes
to introduce a line break in a public identifier, it *must* be done at
an existing space to avoid changing the (normalized) minimum literal.
The latest SGML declaration breaks the public identifiers of all three
character set references just before "//ESC...", which is not a place
where a space appears in the original public identifier.  You are
therefore not, in fact, referencing the registered public texts
you think you're referencing.  You should change your "line breaking
algorithm" so that it breaks public identifiers only at an existing space.
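
The normalization rule quoted above can be sketched in a few lines of
Python.  This is only an illustration, not part of any parser: it assumes
record start and record end are characters 10 and 13 (as in the reference
concrete syntax), models a line break as RE followed by RS, and uses an
illustrative public identifier.  The function name is hypothetical.

```python
import re

# Assumption: RS and RE are characters 10 and 13, as in the reference
# concrete syntax; a line break is modelled as RE followed by RS.
RS = "\n"
RE = "\r"
LINE_BREAK = RE + RS


def normalize_minimum_literal(literal: str) -> str:
    """Normalize a minimum literal following the rule quoted above."""
    # Ignore record starts.
    without_rs = literal.replace(RS, "")
    # Condense each run of record ends and spaces to a single space.
    condensed = re.sub("[" + re.escape(RE) + " ]+", " ", without_rs)
    # Strip spaces at the start and end of the literal.
    return condensed.strip(" ")


# Illustrative public identifier (an assumption for this example).
registered = ("ISO 646:1983//CHARSET International Reference Version (IRV)"
              "//ESC 2/5 4/0")

# Break placed where no space existed: normalization introduces a space.
bad_break = ("ISO 646:1983//CHARSET International Reference Version (IRV)"
             + LINE_BREAK + "    //ESC 2/5 4/0")

# Break placed at an existing space: normalization restores the original.
good_break = ("ISO 646:1983//CHARSET" + LINE_BREAK
              + "    International Reference Version (IRV)//ESC 2/5 4/0")

print(normalize_minimum_literal(bad_break) == registered)   # False
print(normalize_minimum_literal(good_break) == registered)  # True
```

The first comparison fails because the break before "//ESC" normalizes to
a space that the registered identifier does not contain, so a catalog
lookup keyed on the exact string would miss.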

As for the quantities, I've written a long message about that already.
In summary, here are my suggestions, though other values may make as much
sense (read my Jan 4 message appended below):

                  ATTSPLEN 2100
                  LITLEN   1024
                  NAMELEN  72    -- somewhat arbitrary; taken from
                                    internet line length conventions --
                  PILEN    1024
                  TAGLEN   2100

(I've removed the non-RCS values of GRPCNT, GRPGTCNT, and TAGLVL since
no one gave me a good reason why they're necessary when I last raised the
question, but you can put them back in if you think there's good reason.)


Paul Grosso
VP Research                      Chief Technical Officer
ArborText, Inc.                  SGML Open

Email: [log in to unmask]
  or   [log in to unmask]

----- Begin Included Message -----

>From pbg Wed Jan  4 14:39:27 1995
Date: Wed, 4 Jan 95 14:39:25 GMT
From: pbg (Paul Grosso)
To: [log in to unmask]
Subject: Re: HTML 2.0 SGML declaration [was: ATTSPLEN?]

> From: Paul Burchard <[log in to unmask]>
> [log in to unmask] (Paul Grosso) writes:
> > LITLEN [...] ATTSPLEN [...] TAGLEN [...]
> > Note that it doesn't make sense to expect to be able to
> > enter large values for attributes unless one increases
> > all three of the above quantities.
> Thanks for the explanation -- it looks like we have a definite  
> problem in the current SGML declaration for HTML, then.  It sets  
> LITLEN to 1024 in order to provide reasonable room for URLs and FORM  
> values, but then leaves ATTSPLEN and TAGLEN at their default values.

Looking at the HTML 2.0 SGML decl, I realize I don't remember any
discussion on the quantity values before, so I might have missed
something.  But here are my comments on the SGML declaration.

I didn't consider the capacities--I think capacities are more annoyances
than useful, and almost all products I have seen rightly ignore, for
all practical purposes, the capacities (usually after giving a warning).
Besides, appropriate values for capacities are usually only determinable
by trial and error, and I figure Dan's already done that.

I have to admit to lack of expertise in the area of the details of the
character set stuff in SGML declarations.  Not that I haven't tried,
but there are just too many issues for me to know if the ones given
are the best ones for HTML use.  Character sets in HTML are an open
issue, and I have no reason to think that the ones Dan has included
aren't the best ones for now.

The features are quite reasonable and standard, and the syntax is the same
as the Reference Concrete Syntax (RCS) with the exception of the quantities.

What's currently in the HTML 2.0 spec as far as quantities is:

                  NAMELEN  72    -- somewhat arbitrary; taken from
                                    internet line length conventions --
                  TAGLVL   100
                  LITLEN   1024
                  GRPGTCNT 150
                  GRPCNT   64

For reference, the RCS quantities are:

                  LITLEN   240
                  PILEN    240
                  TAGLEN   960

My comments:

1.  I'm not sure why it was felt necessary for GRPCNT, GRPGTCNT, and
    TAGLVL to be raised from their RCS values.  In my experience, I
    have rarely seen the need, and the HTML application is one of the
    smaller ones I've seen.  I don't see anything wrong with the larger
    values; I was just a bit surprised to see them.

2.  A value of 1024 for LITLEN makes sense.  Most people increase PILEN
    when LITLEN is increased.  Basically, if you expect to have large
    literals, you might well have large PIs.  In particular, PIs may be
    used to contain things that are related to things tags contain, so
    I usually recommend a PILEN at least as large as TAGLEN.  (From the
    following paragraph, that would imply a value of 4230 if you follow
    the argument in a strict fashion.)  I would recommend making PILEN
    at least the same as LITLEN--in this case, 1024.

3.  As the earlier exchange discusses, ATTSPLEN and TAGLEN should usually
    be increased when LITLEN is increased.  [This isn't necessarily the
    case--one might want to allow for large literals in parameter literals
    (e.g., for the replacement text of entities), but still not expect
    such long literals for attribute value literals.  For the rest of
    this paragraph, I am assuming that we wish to allow URLs, VALUEs,
    and such to have lengths up to LITLEN.]  A quick glance at the DTD shows
    that the elements A and INPUT have four CDATA attributes plus a few
    others, LINK has three CDATA atts plus others, and IMG and FORM have
    two CDATA atts plus others.  Unless someone has a good argument for
    thinking it isn't necessary to allow for the case that all four of
    A's and INPUT's CDATA attributes have values that approach LITLEN,
    that would indicate a value of ATTSPLEN near 4150.  With a NAMELEN 
    of 72 (even though no element names currently approach that), that
    would suggest a TAGLEN near 4230 in round numbers.  In practice, one
    would rarely expect such extremes, so smaller numbers may be reasonable,
    but I'm just laying out the appropriate logic.  In particular, the
    elements A (with HREF and NAME), IMG (with SRC and ALT), INPUT (with
    SRC and VALUE), and LINK (with HREF and URN) all have at least two
    CDATA attributes that, I would think, could both get long (either by
    virtue of being a URL or URN or by having a long textual string for
    a value), so a value of at least 2100 for ATTSPLEN and TAGLEN seems 
    necessary if we want to be consistent with LITLEN. 
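
The arithmetic behind the figures in item 3 can be laid out as a short
Python sketch.  It is a rough lower-bound estimate only: it assumes each
long CDATA attribute value may reach LITLEN and ignores attribute names
and separators, which is why the figures in the text are rounded upward.
The function names are hypothetical.

```python
# Quantities taken from the message above.
LITLEN = 1024
NAMELEN = 72


def attsplen_floor(long_cdata_attrs: int, litlen: int = LITLEN) -> int:
    """Lower bound on ATTSPLEN when this many values may each reach LITLEN.

    Assumption: attribute names and separators are not counted, so a real
    setting needs some headroom above this floor.
    """
    return long_cdata_attrs * litlen


def taglen_floor(attsplen: int, namelen: int = NAMELEN) -> int:
    """Lower bound on TAGLEN: attribute specifications plus an element name."""
    return attsplen + namelen


# A and INPUT have four CDATA attributes; the text rounds 4096 up to 4150.
print(attsplen_floor(4))    # 4096
# ATTSPLEN 4150 plus NAMELEN 72; the text rounds 4222 up to 4230.
print(taglen_floor(4150))   # 4222
# Two long attributes, e.g. HREF and NAME on A; the text rounds up to 2100.
print(attsplen_floor(2))    # 2048
```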


----- End Included Message -----
