> I would like for HTML documents to have some facility to communicate
> to an indexing tool (a web worm/spider/etc) what the author believes
> is significant. Currently, most webworms simply index the entire text
> of the HTML document, throwing out excessively common words ("the",
> "web", ...) and 'words' which have numbers or special characters in
> them. Some webworms pay special attention to what's inside of <title>
> or <h1> tags, as a way of trying to figure out what the author of the
> document thinks is significant about the document.
> What we really should have is some sort of markup that goes into the
> <head> portion of an HTML document (since it should not be displayed)
> that specifies the author's intention of what keywords should be used
> to index this document. For example, maybe <kl>..</kl> to mark a key
> list, with <ki> to denote each individual key item.
This could be done using META. However, I am experimenting with using
an INDEX attribute on elements in the document BODY together with
DFN for the first instance of a term. The INDEX attribute defines
how the element appears in an index, and supports hierarchical
indexes by using "::" to separate the levels of the hierarchy. I want
to use this to index the HTML3
specification to increase the effectiveness of the Internet Draft.
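
To make the "::" convention concrete, here is a rough sketch of how an
indexing tool might collect INDEX attributes and build a nested index
from them. Everything beyond the attribute name and the "::" separator
is an assumption made for illustration: the sample markup, the entry
names, putting INDEX directly on DFN and heading elements, trimming
whitespace around each level, and the helper names `IndexCollector`,
`build_index` and `render`.

```python
from html.parser import HTMLParser

# Hypothetical sample document; the entries are invented for illustration.
SAMPLE = """
<body>
<h1 index="Forms">Forms</h1>
<p>A <dfn index="Forms::Fields::Text field">text field</dfn> takes one line.
A <dfn index="Forms::Fields::Checkbox">checkbox</dfn> toggles a value.</p>
<h2 index="Tables">Tables</h2>
</body>
"""


class IndexCollector(HTMLParser):
    """Collects INDEX attribute values into a nested dict, one dict per level."""

    def __init__(self):
        super().__init__()
        self.tree = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so INDEX arrives as "index".
        for name, value in attrs:
            if name == "index" and value:
                node = self.tree
                # Assumed rule: "::" separates levels; surrounding spaces are dropped.
                for level in value.split("::"):
                    node = node.setdefault(level.strip(), {})


def build_index(html_text):
    """Parse the markup and return the nested index as a dict of dicts."""
    collector = IndexCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tree


def render(tree, depth=0):
    """Return the index as indented lines, sorted alphabetically at each level."""
    lines = []
    for key in sorted(tree):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_index(SAMPLE))))
```

Each INDEX value names a path, so "Forms::Fields::Checkbox" files the
entry under Fields, which in turn sits under Forms. Elements that share
a prefix end up under the same parent entry.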
-- Dave Raggett  url = http://www.hpl.hp.co.uk/people/dsr
Hewlett Packard Laboratories, Filton Road, | tel: +44 117 922 8046
Bristol BS12 6QZ, United Kingdom | fax: +44 117 922 8924