M V wrote:
> While I know of DocBook as one standard, and someone recently
> mentioned TEI to me, I'm also interested in some general discussions
> on what I guess I would call standard practices and their merits and
If people will play any part in the markup task, I'd recommend designing
the structure yourself, so that it fits your material closely and leaves
them few alternative ways to mark up the same thing. Otherwise you run
the risk of inconsistent markup.
> 1) Should the hierarchy be centered on chapters/sub-sections or
> pages? Can it/should it be both?
I would make it both, but page breaks would be nothing more than empty
elements, perhaps with the page number stored in an attribute. Page
breaks are more like flags than containers.
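A minimal sketch of that milestone approach in Python, using the standard library's ElementTree. The element names (`div`, `head`, `p`, and a TEI-style `pb` carrying the page number in an `n` attribute) and the sample text are assumptions for illustration. Because page breaks are flags rather than containers, finding the page on which each division starts is just a walk in document order that remembers the last break seen:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a chapter/section hierarchy with page breaks as
# empty "milestone" elements. The page number lives in an attribute,
# because an empty element has no content to hold it.
SAMPLE = """\
<book>
  <pb n="1"/>
  <div id="ch1">
    <head>Chapter 1</head>
    <p>Opening text.</p>
    <pb n="2"/>
    <div id="ch1.1">
      <head>Section 1.1</head>
      <p>More text.</p>
    </div>
  </div>
  <pb n="3"/>
  <div id="ch2">
    <head>Chapter 2</head>
    <p>Text.</p>
  </div>
</book>
"""

def start_pages(root):
    """Map each div's id to the page number in force where it begins."""
    current = None
    starts = {}
    # iter() walks the tree in document order, so every pb seen so far
    # precedes the current element in the text.
    for elem in root.iter():
        if elem.tag == "pb":
            current = elem.get("n")
        elif elem.tag == "div":
            starts[elem.get("id")] = current
    return starts

print(start_pages(ET.fromstring(SAMPLE)))
# {'ch1': '1', 'ch1.1': '2', 'ch2': '3'}
```

The chapter hierarchy stays the primary structure; the page view is derived from it on demand.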
> Some of the XML I currently have
> is chapter/sub-section based with a nested DIV structure and
> self-closing page tags to signal breaks, and I'm finding it difficult
> to transform.
If you have any SGML tools such as OmniMark, that sort of transformation
would be much lighter sledding. Managing end tags by hand, when an SGML
parser could infer them for you, is a bore.
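OmniMark is the tool suggested here; what follows is a different, hypothetical approach in Python's ElementTree, not OmniMark. It shows why the div-to-page transformation is awkward: pages overlap the div/p hierarchy, so the transform has to carry a "current page" across element boundaries instead of mapping one element to another. The `pb` element, its `n` attribute and the `split_pages` function are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: page breaks are empty <pb/> milestones, and a
# paragraph runs across the break from page 1 to page 2.
SAMPLE = ('<div><head>Ch. 1</head><p>Start of a paragraph <pb n="2"/>'
          'that ends on page two.</p><p>Next paragraph.</p></div>')

def split_pages(root, first_page="1"):
    """Regroup the text of a div-based tree into page-sized chunks."""
    pages = [[first_page, []]]  # list of [page number, text fragments]

    def walk(elem):
        if elem.tag == "pb":
            # A milestone closes the current page and opens a new one,
            # wherever it falls in the div/p hierarchy.
            pages.append([elem.get("n"), []])
        elif elem.text:
            pages[-1][1].append(elem.text)
        for child in elem:
            walk(child)
            # Text after a child's end tag belongs to the enclosing
            # element, but it lands on whatever page is now current.
            if child.tail:
                pages[-1][1].append(child.tail)

    walk(root)
    # Join fragments with spaces and collapse runs of whitespace.
    return [(n, " ".join(" ".join(parts).split())) for n, parts in pages]

for number, text in split_pages(ET.fromstring(SAMPLE)):
    print(number, "|", text)
```

The sketch flattens each page to plain text and joins fragments with spaces, which is crude for inline markup. Emitting well-formed page containers instead would mean closing every open element at each break and reopening it on the next page, which is exactly the end-tag bookkeeping that makes this tedious.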
> 2) For footnote/endnote references, where should the actual note data
> be stored? With the reference? At the end of the
> chapter/sub-section? At the end of the book?
With the reference if it's a footnote, but endnotes are sometimes
referenced from several points within a chapter. You would need to
determine which you have.
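A hypothetical illustration of the two cases in Python: a footnote stored inline with its reference, and endnotes stored at the end of the chapter and referenced by id, so that one note can serve several reference points. The element and attribute names (`footnote`, `noteref`/`target`, `note`/`id`) are made up for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the footnote sits inline at its reference, while
# endnotes are collected at the end of the chapter and pointed to by id,
# so several <noteref/> elements can share one note.
SAMPLE = """\
<chapter>
  <p>Claim one.<footnote>Inline source.</footnote></p>
  <p>Claim two.<noteref target="en1"/></p>
  <p>Claim three.<noteref target="en1"/><noteref target="en2"/></p>
  <endnotes>
    <note id="en1">Shared endnote.</note>
    <note id="en2">Second endnote.</note>
  </endnotes>
</chapter>
"""

def resolve_notes(root):
    """Return (kind, note text) for every reference, in document order."""
    # Index endnotes by id once, so repeated references are a dict lookup.
    endnotes = {n.get("id"): n.text for n in root.iter("note")}
    resolved = []
    for elem in root.iter():
        if elem.tag == "footnote":
            resolved.append(("footnote", elem.text))
        elif elem.tag == "noteref":
            # A dangling target raises KeyError: a cheap validity check.
            resolved.append(("endnote", endnotes[elem.get("target")]))
    return resolved

for kind, text in resolve_notes(ET.fromstring(SAMPLE)):
    print(kind, "|", text)
```

If the structure has a DTD, declaring `id` with type ID and `target` with type IDREF lets a validating parser catch dangling references; the `KeyError` above is only a stand-in for that check.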
> 3) Should I follow a standard when including metadata in the XML
> file? Should it be ONIX, Dublin Core, or something else?
If a standard suits your requirements, use it - otherwise modify a
standard or make up your own structure.
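If Dublin Core suits your requirements, its elements can be embedded in a structure of your own under the Dublin Core namespace. A minimal Python sketch with ElementTree; the `metadata` wrapper element, the `dc_metadata` helper and the field values are hypothetical, while the namespace URI is the one defined for the Dublin Core element set, version 1.1.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)          # serialize with the usual "dc:" prefix

def dc_metadata(fields):
    """Build a <metadata> block of Dublin Core elements from (name, value) pairs."""
    meta = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(meta, f"{{{DC}}}{name}").text = value
    return meta

meta = dc_metadata([
    ("title", "An Example Book"),
    ("creator", "A. N. Author"),
    ("date", "1902"),
    ("language", "en"),
])
print(ET.tostring(meta, encoding="unicode"))
```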
> I know some of these and the many other questions depend on the
> specific purpose of the XML file. In my specific case, two known
> purposes are archival and for online display -- admittedly quite
> different. But that's why I'm trying to get a grasp on basic issues
Are you certain that you need XML at all? For archival purposes, PDF is
arguably a better option, and if online display is secondary, I'd
consider using a toolkit such as PJ (http://www.etymon.com/pjc/) to
extract whatever content interests you, then index it for searching
with Lucene (http://jakarta.apache.org/lucene/docs/index.html). A decent
search over a collection of PDFs can be a much less expensive option
than converting to XML, especially if the nature of the data lends
itself to that sort of browsing.
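Lucene is a Java library that does much more (text analysis, ranked scoring, on-disk indexes), so the following is not its API. It is a toy Python sketch of the inverted index at the heart of such a search engine, assuming the text has already been extracted from each PDF; the file names, contents and helper names are invented.

```python
import re
from collections import defaultdict

# Hypothetical input: text already pulled out of each PDF by an
# extraction tool. File names and contents are made up.
DOCS = {
    "vol1.pdf": "The archive holds letters from the colonial period.",
    "vol2.pdf": "Letters and diaries, with an index of names.",
    "vol3.pdf": "Maps of the colonial coastline.",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: term -> set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index

def search(index, query):
    """Return documents containing every query term (boolean AND), sorted."""
    sets = [index.get(term, set()) for term in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index(DOCS)
print(search(index, "colonial letters"))  # ['vol1.pdf']
```

Building the index is a one-off pass over the extracted text; each query then touches only the postings for its own terms, which is why searching a PDF collection this way can stay cheap.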
Marcus Carr
Allette Systems (Australia) www: http://www.allette.com.au
"Everything should be made as simple as possible, but not simpler."