DBLite - Frequently Asked Questions

1. Installation and Getting Started

1.1. The Perl library XML::LibXML has a lot of dependencies like XML::SAX and so on. Should I install them all, or can I skip them?

DBLite tools are written in Perl and have some dependencies not included in this package. Those dependencies are Perl modules, which themselves have dependencies. We recommend you install everything the modules call for. To learn how to install Perl modules, run the command perldoc perlmodinstall. To see which modules are already installed on your system, run the command perldoc perllocal.

1.2. There are a lot of tools here. Is there documentation for them?

Yes. For an overview of the tools, read the file README. For more specific information about each tool, go into the bin directory and type perldoc sometool, where sometool is the name of the tool you want to know more about.

1.3. Is there an example book coded in DocBook I can look at?

Yes. It's in the test/lx subdirectory of the install package.

2. Authoring

2.1. How can I edit XML files with Emacs?

Emacs supports XML as long as you stay within the ISO Latin-1 character set. It does not support UTF-8 or other encodings yet. You will have to use character entity references for special symbols and exotic characters (described below).

There is a special major mode called psgmls designed to make working with XML much easier. It provides an automatic syntax checking mechanism that reads the DTD and then scans your file while you type. It will also automatically indent elements to make your file easier to read. I will include detailed instructions for setting it up soon.

2.2. How can I use a special character from inside a text-only editor?

Text-only editors like Emacs are stuck in the ISO Latin-1 character set. Any special characters you want to use, such as mathematical symbols, accented letters, or Japanese ideographs, will have to be entered using an entity reference. Many entity references are declared in the dblite DTD; you can find them in the docbook41/ent subdirectory. For example, to find the name of the entity for the trademark symbol (from the dblite directory):

$ grep TRADE docbook41/ent/*
docbook41/ent/iso-num.ent:<!ENTITY reg "&#x00AE;"> <!-- REG TRADE MARK SIGN -->
docbook41/ent/iso-num.ent:<!ENTITY trade "&#x2122;"> <!-- TRADE MARK SIGN -->

So, to use the trademark character in your document, you can type either &trade; or &#x2122;. The latter one will be understood by all XML parsers even without a doctype declaration, but has the disadvantage of being hard to remember.
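The difference between the two forms can be checked with any XML parser. Here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree rather than any DBLite tool. The <para> snippets and the helper name parses are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A numeric character reference needs no declaration, so this parses
# even though the snippet has no doctype declaration.
numeric = ET.fromstring("<para>Widget&#x2122;</para>")
print(numeric.text == "Widget\u2122")  # True

# A named entity reference with no declaration in scope is rejected.
print(parses("<para>Widget&trade;</para>"))  # False
```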

2.3. Are there graphical editors that will work with DBLite?

There are many graphical editors for authoring XML. They come with a range of features and prices. Many such editors will read the DTD and check the elements in your document to make sure it's structured correctly. They usually include options to hide markup to make the text easier to read.

Some graphical editors rely on CSS stylesheets to format the text so that it looks nice on screen. I've included a CSS stylesheet in the DBLite install package that an author created for use with XMetaL.

Here are a few editors we have used:

Company    Name        Platform              Cost
Morphon    XML Editor  MacOS, Windows, Unix  $100
SoftQuad   XMetaL      Windows               $200
Arbortext  Adept       Windows, Solaris      $1000

If you have experience with any others, please let us know so we can recommend them to other authors.

2.4. Does dblite include the equivalent of a <br/> element to force a newline in prose (for instance if I want a URL to be on a line by itself)?

No, there is nothing like that in DocBook. However, since we convert the book to FrameMaker in production, we can force a linebreak then. So if you want to let the production editor know your intention to have a linebreak at some point, you can just insert a little comment to that effect. The <remark> element is for putting in such comments.

In general, you don't need to worry about linebreaks or pagebreaks. That will all be done by the production editor. However, in certain cases you need the linebreaks for semantic reasons, such as in a code block or a poem. <programlisting> and <screen> are elements for code blocks and representing screen output, respectively. <literallayout> is for anything else that needs to retain spacing, such as poems.

2.5. How do I insert a comment that the production editor will see? Can I use a regular XML comment?

Production editors don't look at the XML markup, so any comment constructs you create will be ignored. If you want a comment to be seen, use the <remark> element like this:

<remark>Please check the spelling of the word
"farfignoogin" in the next para.</remark>
<para>The Germans call it "farfignoogin"</para>

2.6. Are the production tools smart enough to discard filler whitespace?

The rules for whitespace in XML are kind of tricky, but they generally do what you want.

In block-type elements (e.g. paragraphs, titles), leading and trailing space will be thrown away. All other space will be collapsed into one single space character. So this:

<para>     Hey     there.      </para>

will "normalize" to this:

<para>Hey there.</para>

There are a few exceptions, notably <screen>, <programlisting>, and <literallayout>, which preserve all whitespace. (Technically, in these elements, we throw out any leading blank lines and all trailing space, but otherwise keep all space characters intact.)

In non-block (inline) elements, the leading and trailing space are treated the same as space in the middle of content. So this:

<para>   I am   <emphasis>   very hungry
</emphasis>   today.   </para>

normalizes to this:

<para>I am <emphasis> very hungry </emphasis> today.</para>

This is not what we want, however, because there are effectively two spaces between the words "am" and "very". So we usually do some pre-processing on the XML to squeeze out space inside of inline tags and then do the normalization step. The result is this:

<para>I am <emphasis>very hungry</emphasis> today.</para>
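The two steps can be sketched in a few lines of Python. This is only an illustration of the rules described here, not the code the production tools use. The names INLINE, squeeze_inline, and normalize_block are made up, the sketch only handles tags without attributes, and it assumes that space squeezed out of an inline element is moved outside the tag so that neighboring words stay separated.

```python
import re

INLINE = ("emphasis",)  # hypothetical list of inline element names

def squeeze_inline(xml):
    """Move whitespace just inside inline tags to just outside them."""
    for name in INLINE:
        # "<emphasis>   text" becomes " <emphasis>text"
        xml = re.sub(rf"<{name}>\s+", f" <{name}>", xml)
        # "text   </emphasis>" becomes "text</emphasis> "
        xml = re.sub(rf"\s+</{name}>", f"</{name}> ", xml)
    return xml

def normalize_block(xml, block="para"):
    """Collapse runs of whitespace and trim the edges of a block element."""
    xml = re.sub(r"\s+", " ", xml)
    xml = re.sub(rf"<{block}>\s+", f"<{block}>", xml)
    xml = re.sub(rf"\s+</{block}>", f"</{block}>", xml)
    return xml

source = "<para>   I am   <emphasis>   very hungry\n</emphasis>   today.   </para>"
print(normalize_block(source))
# <para>I am <emphasis> very hungry </emphasis> today.</para>
print(normalize_block(squeeze_inline(source)))
# <para>I am <emphasis>very hungry</emphasis> today.</para>
```

Running the normalization alone reproduces the unwanted double space; squeezing the inline element first gives the cleaned-up result.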

3. Generating HTML

3.1. How can I generate HTML from my XML files?

This package contains a tool for authors to generate HTML easily from their XML files. The command is db2h and it takes an XML filename as its argument. The files it generates include an index called book.html and a file for each chapter or appendix.

3.2. When I run db2h on a file, I get a lot of error messages and no HTML file is generated. What happened?

The most common reason is that the XML file is not well-formed. This simply means that there is a syntax error in the XML and the parser in db2h has given up trying to work with it. You need to find out where the error is located and fix it.

To locate the error, run the command xwf on each file you want to check (not book.xml, but each individual chapter and appendix file). You'll see a series of error messages like this:

$ xwf test.xml
WARNING: 'test.xml' is NOT well-formed.
----------
[1] ON LINE 9 OF 'test.xml', Opening and ending tag mismatch (bar and
baz).

  EXCERPT
  <bar>Hello there</baz>
                       ^

This message tells you that in the file test.xml on line 9, there is an error located where the caret character is pointing. In this case, the problem is a misspelled element end tag. For more information about creating well-formed XML, read chapter 2 of my book Learning XML.
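If xwf is not at hand, any XML parser will report the same kind of error. The following is a rough sketch in Python using the standard library's xml.etree.ElementTree. It is not how xwf is implemented, and the function name find_syntax_error and the sample text are invented for the example.

```python
import xml.etree.ElementTree as ET

def find_syntax_error(xml_text):
    """Return (line, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position
        return line, str(err)
    return None

# The end tag on line 2 does not match its start tag.
broken = "<foo>\n<bar>Hello there</baz>\n</foo>"
print(find_syntax_error(broken))

# A well-formed document produces no error.
print(find_syntax_error("<foo><bar>Hello there</bar></foo>"))  # None
```

Like xwf, the parser stops at the first error it finds, so fix that one and run the check again.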

3.3. I tried running db2h on the file ch03.xml and got an error telling me "Entity 'foo' not defined." It's well-formed XML, so what am I doing wrong?

The file contains entity references but no declarations for the entities. It's like forgetting to initialize a variable. You need to include some information in the document to tell db2h what to do with the entity references. The way to do this is with a doctype declaration. It looks like this:

<!DOCTYPE book SYSTEM "/usr/local/prod/sgml/dblite/dblite.dtd"
[
  <!ENTITY foo "some replacement text here">
]>

If you want to put this in a chapter file, change "book" to "chapter" (or whatever is the type of the root element) above. The path to dblite.dtd may be different, depending on where you installed it.
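To see the effect of the declaration, here is a small sketch in Python using the standard library parser, not db2h. The helper name add_entities is made up, and the sketch leaves out the SYSTEM path to dblite.dtd so that it does not depend on where the DTD is installed. It also assumes the replacement text contains no quotes or markup characters.

```python
import xml.etree.ElementTree as ET

def add_entities(root_name, entities, body):
    """Prefix body with a doctype declaration that declares the given entities."""
    decls = "\n".join(
        f'  <!ENTITY {name} "{text}">' for name, text in entities.items()
    )
    return f"<!DOCTYPE {root_name} [\n{decls}\n]>\n{body}"

chapter = "<chapter><para>&foo;</para></chapter>"
doc = add_entities("chapter", {"foo": "some replacement text here"}, chapter)

# With the declaration in place, the parser expands the reference.
print(ET.fromstring(doc).find("para").text)  # some replacement text here
```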

The downside of putting a doctype declaration in a chapter file is that the chapter becomes an independent document, and you can't validate the whole book at once. If you want to be able to validate the whole book, create a file called book.xml and put the doctype declaration in there. Then add entities to import each of the chapter files like this:

<!DOCTYPE book SYSTEM "/usr/local/prod/sgml/dblite/dblite.dtd"
[
  <!ENTITY preface SYSTEM "ch00.xml">
  <!ENTITY ch01 SYSTEM "ch01.xml">
  <!ENTITY ch02 SYSTEM "ch02.xml">
  <!ENTITY appa SYSTEM "appa.xml">
]>
<book><title>My Excellent Book</title>
&preface;
&ch01;
&ch02;
&appa;
</book>
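For a book with many chapters, a file like this can be generated rather than typed. The following Python sketch is one possible way to do it and is not part of the DBLite package. The function name build_book_xml is hypothetical, each entity is named after its file (so ch00.xml becomes ch00 rather than preface), and the entities are declared with the SYSTEM keyword, which is how XML marks an entity whose content lives in an external file.

```python
from pathlib import Path

def build_book_xml(title, chapter_files, dtd_path):
    """Return the text of a book.xml that pulls in each chapter file as an entity."""
    names = [Path(f).stem for f in chapter_files]  # "ch01.xml" -> "ch01"
    decls = "\n".join(
        f'  <!ENTITY {name} SYSTEM "{f}">' for name, f in zip(names, chapter_files)
    )
    refs = "\n".join(f"&{name};" for name in names)
    return (
        f'<!DOCTYPE book SYSTEM "{dtd_path}"\n[\n{decls}\n]>\n'
        f"<book><title>{title}</title>\n{refs}\n</book>\n"
    )

text = build_book_xml(
    "My Excellent Book",
    ["ch00.xml", "ch01.xml", "ch02.xml", "appa.xml"],
    "/usr/local/prod/sgml/dblite/dblite.dtd",
)
print(text)
```

Because the declarations and the references come from the same list, every entity that is declared is also used, and in the same order.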

3.4. I am running db2h on the file ch03.xml and it saves the result in ch01.html. Why doesn't it save it in ch03.html?

This is a bug that will be fixed soon. If you want to have correct numbering, create a book.xml as described in section 3.3 above, and run db2h book.xml. It will separate all the chapter files with correct numbering automatically.

3.5. The graphics in my book are in the PNG format. Can db2h handle that?

db2h doesn't care about the graphic format. It will simply transmit the file name into the HTML file as-is. The real limitation here is what kind of browser you're using. If it supports PNG format files, then you're fine.

3.6. The character entity references resolve into a form my browser doesn't recognise. For example, &copy; becomes &#x00A9; which doesn't display in Netscape Navigator. It should remain &copy;.

The default character entity values are numeric Unicode character references, which only work in browsers that support Unicode. If you want to fall back on more traditional HTML entity definitions, change the doctype declaration in your document to refer to dblite_htmlents.dtd instead of dblite.dtd. Some entity references, like &copy;, will then be retained as entity references for compatibility with older web browsers.
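The relationship between the two forms is a simple lookup from code point to entity name. The Python sketch below illustrates that lookup as a post-processing step on generated HTML. It is not how dblite_htmlents.dtd works (the DTD handles this through its entity declarations), and the function name numeric_to_named is invented for the example.

```python
import re
from html.entities import codepoint2name

def numeric_to_named(text):
    """Replace hex character references with HTML 4 entity names where one exists."""
    def swap(match):
        name = codepoint2name.get(int(match.group(1), 16))
        # Leave the reference untouched if HTML 4 defines no name for it.
        return f"&{name};" if name else match.group(0)
    return re.sub(r"&#x([0-9A-Fa-f]+);", swap, text)

print(numeric_to_named("&#x00A9; 2001 &#x2122; &#x4E00;"))
# &copy; 2001 &trade; &#x4E00;
```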