Avraham Shapiro wrote:
> ** Low Priority **
> Thanks for your input. I believe the validator is having trouble with the single quotes
> because it fails on the xml line with the error about the UTF-8 encoding, if the
> encoding value is 'UTF-8' and it does not fail at all if the value is "UTF-8".
Ah, OK. In that case the parser is definitely broken.
> The reason we use this validator (although we sometimes still use generic XML
> parsers) is that in addition to validating against a DTD, it checks a great deal
> of semantic information in a Z39-86 compliant digital book, information that a
> more general XML parser would not check.
It sounds as if you might want to move the whole thing to a W3C or
RELAX NG schema, where you have more control over semantic information at
the parsing level.
What you might be able to do in the meantime is run osgmlnorm, which is
part of the same package (SP) that onsgmls comes from. This normalises the
document, making all single-quote enclosures into double-quote (except
where they *contain* double quotes, in which case single quotes are used
as the container, and vice versa); and it re-expresses the whole document
in accordance with the DTD.
It was intended for use with SGML, where there was much more
abbreviation and avoidance of quotes than is allowed with XML, and I
haven't actually tried it with XML, but with the -wxml switch it should
work:

  $ osgmlnorm -wxml xml.dec filename.xml >output.xml

Then use your Z39-86 parser on the result.
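
If you want to chain the two steps, something like the following might
do as a starting point. It is only a sketch and just as untried with XML
as the command above: the function name and the VALIDATOR variable are
made up for illustration, and "z3986-validator" is a placeholder, not
the real name of your validator command.

```shell
#!/bin/sh
# Sketch only: normalise with osgmlnorm, then pass the result to the validator.
# VALIDATOR is a placeholder; set it to your actual Z39-86 validator command.
VALIDATOR=${VALIDATOR:-z3986-validator}   # hypothetical command name

normalise_and_validate() {
    in=$1
    out=$2
    # Same osgmlnorm invocation as given earlier in this message.
    osgmlnorm -wxml xml.dec "$in" >"$out" || {
        echo "osgmlnorm failed on $in" >&2
        return 1
    }
    # Reached only if normalisation succeeded; the validator's exit
    # status becomes the function's exit status.
    "$VALIDATOR" "$out"
}
```

You would then call it as: normalise_and_validate filename.xml output.xml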