If you want to extract the meaning from documents automatically (instead of
doing so manually), you could always try a tool such as Autonomy. Autonomy
scans documents and extrapolates their meaning, producing a metadata review
of the original document. This metadata can then be used to populate a
reporting profile (that you can define), which in turn can be used to create
entries in a database. Amendments to policy that need reflecting in the
way you gather information will then require re-configuring your Autonomy
collection profile (not a complex process).
Getting the supporting document itself, and not just the Autonomy-extracted
meaning, posted to the database is a separate issue. A relatively
simple link should cover that one. Access has a facility whereby you
extract content from an MS Office document and paste this as a hyperlink
from the database to the original document.
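As a rough illustration of the linking idea, here is a minimal sketch that stores each extracted summary alongside a link to its original document. It uses Python's built-in sqlite3 module purely as a stand-in for Access; the table name, column names, helper functions and file path are all made up for the example.

```python
import sqlite3

# In-memory database standing in for the Access database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE policy_entry (
           id INTEGER PRIMARY KEY,
           organization TEXT NOT NULL,
           summary TEXT NOT NULL,
           source_document TEXT NOT NULL  -- path or URL of the original file
       )"""
)


def add_entry(organization, summary, source_document):
    """Store the extracted summary together with a link to the original."""
    cur = conn.execute(
        "INSERT INTO policy_entry (organization, summary, source_document) "
        "VALUES (?, ?, ?)",
        (organization, summary, source_document),
    )
    conn.commit()
    return cur.lastrowid


def source_for(entry_id):
    """Return the link to the supporting document for one entry, or None."""
    row = conn.execute(
        "SELECT source_document FROM policy_entry WHERE id = ?", (entry_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical entry: the organization and path are illustrative only.
entry_id = add_entry(
    "Example Org", "Travel is reimbursed at cost.", "contracts/example_org.doc"
)
```

Keeping the link in its own column means a revised contract only needs the row updated, and the search form can return the supporting document next to the summary.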
The DTD will tell you what can and cannot be used in the database entry,
enforcing certain criteria that you define. Extraction of XML-encoded data
from the database should be relatively straightforward. A standard
request form referencing XML tags and content as search terms is one way of
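To make the tag-based search idea concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The DTD, the element names (policies, policy, organization, topic, text) and the sample data are all invented for illustration. ElementTree does not validate against a DTD, so the DTD appears only as a reference string, and missing_fields is a much weaker hand-written check, not real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD, shown for reference only: ElementTree does not validate.
POLICY_DTD = """
<!ELEMENT policies (policy*)>
<!ELEMENT policy (organization, topic, text)>
<!ELEMENT organization (#PCDATA)>
<!ELEMENT topic (#PCDATA)>
<!ELEMENT text (#PCDATA)>
"""

# Made-up sample data following the structure above.
SAMPLE = """<policies>
  <policy>
    <organization>Example Org</organization>
    <topic>travel</topic>
    <text>Travel is reimbursed at cost.</text>
  </policy>
  <policy>
    <organization>Other Org</organization>
    <topic>leave</topic>
    <text>Staff receive 25 days of leave.</text>
  </policy>
</policies>"""

REQUIRED = ("organization", "topic", "text")


def missing_fields(policy):
    """List required child elements that are absent or empty."""
    return [name for name in REQUIRED
            if not (policy.findtext(name) or "").strip()]


def search(root, tag, term):
    """Return policies whose <tag> content contains term (case-insensitive)."""
    term = term.lower()
    return [p for p in root.findall("policy")
            if term in (p.findtext(tag) or "").lower()]


root = ET.fromstring(SAMPLE)
hits = search(root, "topic", "travel")
```

A Web request form would pass the chosen tag and the search term straight to something like search. Enforcing the DTD itself needs a validating parser, which the standard library does not provide.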
The issue of tagging a document as XML needs careful thought, but using
either Autonomy (or similar) or asking the authors to complete a simple
summary sheet could save time. I do remember there being some tools
that will encode standard text as XML, but I forget when/where I saw them.
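The summary-sheet route could look something like the sketch below. It assumes the authors return a plain-text sheet of "Field: value" lines, which is my own invention (as are the field names), and it is not one of the tools I half-remember. Field names are lower-cased and spaces become underscores. Nothing checks that the result is a legal XML name, so the field names on the sheet would need to be fixed in advance.

```python
import xml.etree.ElementTree as ET


def sheet_to_xml(sheet_text):
    """Turn 'Field: value' lines into a <policy> element, one child per field."""
    policy = ET.Element("policy")
    for line in sheet_text.splitlines():
        if ":" not in line:
            continue  # skip blank or free-form lines
        field, value = line.split(":", 1)
        tag = field.strip().lower().replace(" ", "_")
        if not tag:
            continue  # skip lines with nothing before the colon
        ET.SubElement(policy, tag).text = value.strip()
    return policy


# Hypothetical summary sheet as an author might fill it in.
sheet = """Organization: Example Org
Topic: travel
Text: Travel is reimbursed at cost."""

xml_text = ET.tostring(sheet_to_xml(sheet), encoding="unicode")
```

The non-technical authors never see a tag: they fill in the sheet, and the tagging happens at your end.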
The following comes from Microsoft's Office 2000 promo stuff, and talks
about Office 2000 being integrated with a tool called Keyfile.
"The Office 2000 support for XML-based document property sheets enables
direct indexing by leveraging the familiar Office 2000 property sheet. The
Office 2000 document descriptions based on Document Type Declarations
(DTDs) provide the ability to create templates based on Keyfile-managed
information and to create new Keyfile document categories based on
templates authored by the user."
Apologies for the rambling, hope this helps.
Knowledge and Information Systems
Defence Evaluation & Research Agency
MAL ext. 7450
From: Zeba Khan [SMTP:[log in to unmask]]
Sent: Thursday, August 10, 2000 8:30 AM
To: [log in to unmask]
Subject: database setup
I am extremely new to XML stuff. I was wondering if anyone can answer my
I am trying to set up a system to collect data from contracts from
organizations (about 130) to form a policy database. Each contract is about
30-40 pages. Right now we are gathering information as text and evaluating
it ourselves and putting the data into Access. From Access, we plan to code
with ASP and create a Web search form for answers to queries (which we will
probably pre-determine). End users will also need to have the actual
supporting text from the initial documents retrieved, as well as one other
accompanying document. Most of the data is text and the cells in Access are
Another big issue is maintenance of the database. When an organization
changes its policies, it will send us a new contract which we will
need to input and re-code.
Is there any easy way to get this done using an XML DTD? One problem I can
see is that the organizations sending us the primary data are not very tech
savvy. (Not that I am either!) We're getting the documents as Word. If we
issue a DTD, is there an easy way for non-tech people to tag documents with
XML? And how would people retrieve specific info from an XML-coded
If anyone can offer any insight or starting points for me, I would be
forever grateful. Thanks.