The HTML Writers Guild

Project Gutenberg

Book DTD's I

This page discusses the role of DTD's in marking up an historic document. It does not go into the specifics of page authoring. All documents that are marked up for Project Gutenberg must be marked up according to a DTD, and then validated against that DTD. The DTD's available at present are listed under "Selecting a DTD" below.

The Importance of DTD's

A DTD or schema is a description of the way a document is marked up. It contains such information as the permitted elements and attributes, the permitted content model of each element, and the attribute types. DTD's are particularly important in marking up historic documents, as they let any future reviewer of the document know what to expect. They are also important in preventing anarchy. Without a DTD or schema, the marker would be free to mark up the document in any way they wanted. Although this may be fun for the marker, it would not be fun for someone who had to review the document at a later date.
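
As a rough illustration of the kinds of information described above, the following sketch shows a small DTD, written as an internal subset so that the example is self-contained, together with a document that conforms to it. The element and attribute names (poem, stanza, line, type, n) are hypothetical and are not taken from any of the guild's DTD's.

```xml
<?xml version="1.0"?>
<!DOCTYPE poem [
  <!-- Hypothetical names, for illustration only. -->

  <!-- Permitted elements and their content models -->
  <!ELEMENT poem   (title?, stanza+)>  <!-- optional title, then one or more stanzas -->
  <!ELEMENT title  (#PCDATA)>          <!-- text only -->
  <!ELEMENT stanza (line+)>            <!-- one or more lines -->
  <!ELEMENT line   (#PCDATA)>          <!-- text only -->

  <!-- Permitted attributes and their types -->
  <!ATTLIST poem
    id    ID                     #IMPLIED
    type  (sonnet|ballad|other)  "other">
  <!ATTLIST line
    n     NMTOKEN                #IMPLIED>
]>
<poem type="ballad">
  <title>[title of the poem]</title>
  <stanza>
    <line n="1">[first line]</line>
    <line n="2">[second line]</line>
  </stanza>
</poem>
```

A validating parser would reject this document if, for example, a line appeared directly inside poem, or if type were given a value outside the three listed.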

DTD's and schemas can be used to impose an order on a document, and this is exactly what we want to do at the top level of the document, where we want to make sure all the markup blurbs and e-text blurbs are included. When we come to the historic document proper we need a much looser state of affairs. The book or poem is already written, so we just want to make sure we have a DTD that describes its content.

We will not teach you about the specifics of DTD's here. There are several good tutorials and books available, and the Guild also has an XML class. There are narrative descriptions of the various DTD's in the next few pages.

DTD Policy

All the DTD's used must be free to use in perpetuity. (Note that this applies to several well-known DTD's, including DocBook.) We will maintain a series of suitable DTD's on this site. These DTD's will evolve, and hopefully improve over time, but all old versions will be maintained. We intend to follow these general principles.

The top-level 'gutdoc' DTD

All 'gut' DTD's should have the same top-level structure. This structure is shown in the following diagram.

[Diagram: The parts of an e-text]

The following "pseudo code" shows the hierarchy and the nature of the content of each section.

<gutdoc>
 <gutblurb>
    [This contains all the meta information about the 
    document that was developed by the original transcriber. 
    It will probably not be displayed by the style sheet.]
 </gutblurb>
 <markupblurb>
   [This section contains the information about the marker 
   of the document, including all details of the revision history. 
   It will probably not be displayed by the style sheet.]
 </markupblurb>
 <gutcredit>
    [This short credit is designed to be displayed 
    at the top of the document, e.g.:]
    This document was marked up by [name], a member of 
    the HTML Writers Guild, as part of Project Gutenberg, [date]. 
    The original transcription was made by [name], [date]. 
    For further information go to view/source.
 </gutcredit>
 <gutbook>
  
   The document proper goes here. The DTD for this will vary.
 </gutbook>
 <endmarkupblurb>
    [Typically this will contain any notes made by the 
    marker pertaining to the document itself, including 
    footnotes. It will probably not be displayed by 
    the style sheet.]
 </endmarkupblurb>
 <endgutblurb>
    Most e-texts have a line or two of additional 
    meta information.
 </endgutblurb>
</gutdoc>
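
The outline above could be expressed in a DTD roughly as follows. This is only a sketch, not the guild's actual gutdoc declarations: it assumes that each of the six sections is required exactly once and in the order shown, it gives the blurb sections text-only content to keep the example short, and it uses ANY for gutbook as a stand-in, since the DTD for the document proper varies.

```xml
<?xml version="1.0"?>
<!DOCTYPE gutdoc [
  <!-- Assumption: all six sections are required, once each, in this order. -->
  <!ELEMENT gutdoc (gutblurb, markupblurb, gutcredit, gutbook,
                    endmarkupblurb, endgutblurb)>

  <!-- Simplified placeholder content models (text only). -->
  <!ELEMENT gutblurb       (#PCDATA)>
  <!ELEMENT markupblurb    (#PCDATA)>
  <!ELEMENT gutcredit      (#PCDATA)>
  <!ELEMENT endmarkupblurb (#PCDATA)>
  <!ELEMENT endgutblurb    (#PCDATA)>

  <!-- Stand-in only: the real content model depends on the book DTD chosen. -->
  <!ELEMENT gutbook        ANY>
]>
<gutdoc>
  <gutblurb>[meta information from the original transcriber]</gutblurb>
  <markupblurb>[information about the marker and the revision history]</markupblurb>
  <gutcredit>[short credit displayed at the top of the document]</gutcredit>
  <gutbook>[the document proper]</gutbook>
  <endmarkupblurb>[the marker's notes and footnotes]</endmarkupblurb>
  <endgutblurb>[additional meta information]</endgutblurb>
</gutdoc>
```

With a declaration of this kind, a validating parser reports an error if any of the blurb sections is missing or out of order, which is the "imposed order" wanted at the top level.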

Each of the non-document top-level elements should have the same content model, namely: (#PCDATA|para|subsect|title)*
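
As a sketch of how that shared content model might be declared, the example below repeats it for each of the five non-document sections and validates a sample markupblurb against it. The content models given for para, subsect, and title are assumptions made for this example; the text above does not say what those elements may contain.

```xml
<?xml version="1.0"?>
<!DOCTYPE markupblurb [
  <!-- The same mixed content model for every non-document section. -->
  <!ELEMENT gutblurb       (#PCDATA|para|subsect|title)*>
  <!ELEMENT markupblurb    (#PCDATA|para|subsect|title)*>
  <!ELEMENT gutcredit      (#PCDATA|para|subsect|title)*>
  <!ELEMENT endmarkupblurb (#PCDATA|para|subsect|title)*>
  <!ELEMENT endgutblurb    (#PCDATA|para|subsect|title)*>

  <!-- Assumed content models for the three child elements. -->
  <!ELEMENT para    (#PCDATA)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT subsect (#PCDATA|para|title)*>
]>
<markupblurb>
  <title>Markup notes</title>
  <para>Marked up by [name].</para>
  <subsect>
    <title>Revision history</title>
    <para>[date]: first version.</para>
  </subsect>
  Loose text is also permitted, because the model includes #PCDATA.
</markupblurb>
```

Because this is a mixed content model, the child elements may appear in any order and any number of times, so the model constrains which elements are used in a blurb but not how they are arranged.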

Selecting a DTD

At present, in addition to the XHTML DTD, there are four DTD's available: gutpoems1.dtd, gutplay1.dtd, and gutbook1.dtd, plus a DTD for books with poetry and plays included. There is also a series of elementary tutorials on the TEI DTD's (teixlite.dtd), starting at teidtds1.html.
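
Once a DTD has been selected, the document names it in a document type declaration so that a validating parser knows what to check against. The following is a minimal sketch for gutbook1.dtd. It assumes that the root element is gutdoc and that the DTD file sits in the same directory as the document; the system identifier should be changed to wherever the DTD is actually kept.

```xml
<?xml version="1.0"?>
<!-- Assumptions: root element gutdoc; gutbook1.dtd in the same directory. -->
<!DOCTYPE gutdoc SYSTEM "gutbook1.dtd">
<gutdoc>
  <!-- the top-level sections described earlier go here -->
</gutdoc>
```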

This page is maintained by frank@hwg.org. Last updated on 16 January 2000.
Copyright © 2000 by the HTML Writers Guild, Inc.