
> The primary problem with offering XML-enhanced search engines is that
> users won't know which elements to search for. You may have defined your
> author element as 'author', but I may have defined it as 'auth' or
> <PERSON role="author">. We could establish a
> tag library, with standardized element names for search engines, or rely
> on RDF. Comments?
>
I don't think we have to create a tag Esperanto in order to have search
engines that are enhanced because they are content-based, i.e., in XML.

Where we used to have one domain to search (text as such), we now have two:
markup and data (content = data + markup). The markup establishes the
context in which the data is to be found.
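To make the two domains concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes that element names, attribute names, and attribute values count as the markup domain, and that character data (element text and tail text) counts as the data domain. The function name `split_domains` is hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET

def split_domains(xml_text):
    """Return (markup, data): strings from tags/attributes vs. character data.

    Hypothetical helper: treats element names, attribute names and attribute
    values as markup, and element text plus tail text as data.
    """
    root = ET.fromstring(xml_text)
    markup, data = [], []
    for elem in root.iter():          # walks the root and all descendants
        markup.append(elem.tag)
        for name, value in elem.attrib.items():
            markup.extend([name, value])
        if elem.text and elem.text.strip():
            data.append(elem.text.strip())
        # Text after a closing tag belongs to the parent's content, but it is still data.
        if elem.tail and elem.tail.strip():
            data.append(elem.tail.strip())
    return markup, data

markup, data = split_domains('<doc><person role="author">Smith</person> wrote it.</doc>')
print(markup)  # ['doc', 'person', 'role', 'author']
print(data)    # ['Smith', 'wrote it.']
```

An indexer built this way could keep one index per domain, so a query can say which domain each term must come from.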

To use your example, a query that looks for the string "auth" in the markup
domain and "Smith" in the data domain would discover authors named "Smith":

        <person role="AUTHor">SMITH</person>
        <AUTHor>SMITH</AUTHor>

while filtering out sentences like:

        <p>I'm an AUTHor, and any editor named SMITH is no friend of
mine.</p>
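A rough sketch of such a two-domain query, again with Python's `xml.etree.ElementTree`. It assumes case-insensitive substring matching: an element is a hit if the markup term appears in its name or in any attribute value, and the data term appears in its text. The names `markup_matches` and `search` are hypothetical. Since XML element names are case-sensitive, the second sample closes `<AUTHor>` with a matching `</AUTHor>` so that it parses.

```python
import xml.etree.ElementTree as ET

def markup_matches(elem, term):
    """True if term occurs (case-insensitively) in the element name or any attribute value."""
    term = term.lower()
    candidates = [elem.tag] + list(elem.attrib.values())
    return any(term in s.lower() for s in candidates)

def search(xml_text, markup_term, data_term):
    """Return the text of elements matching markup_term in markup and data_term in data."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        text = "".join(elem.itertext())   # character data of the element and its children
        if markup_matches(elem, markup_term) and data_term.lower() in text.lower():
            hits.append(text)
    return hits

docs = [
    '<person role="AUTHor">SMITH</person>',
    '<AUTHor>SMITH</AUTHor>',
    "<p>I'm an AUTHor, and any editor named SMITH is no friend of mine.</p>",
]
for d in docs:
    print(search(d, "auth", "smith"))
# ['SMITH']
# ['SMITH']
# []
```

The sentence in `<p>` is filtered out because "auth" appears only in its data, never in its markup, which is exactly the distinction a text-only engine cannot make.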

So perhaps full-text retrieval across both domains, markup and data, will
offer enough advantages for our current retrieval vendors to start on this
problem right away, without standardizing element names any further than
human language is already standardized. After all, markup, like data, is
meant to be human-readable (Design Goal 6 of XML: "XML documents should be
human-legible and reasonably clear."). This would be yet another advantage
of the fact that XML parsers are easy to build.