
DocBook Publishing


Version 1.02

24. May 2003

Copyright (c) 2003 Stefan Rinke


This article is available online at Stefan Rinke, Articles [http://
www.stefan-rinke.de/articles/]
-------------------------------------------------------------------------------

Table of Contents

*1. Introduction
    +1.1. Requirements
   
*2. Read DocBook Publishing online
*3. Software
    +3.1. Overview
    +3.2. Java
    +3.3. HTMLDOC
    +3.4. W3M
    +3.5. HTML Help Workshop
    +3.6. Python
    +3.7. The Framework itself
    +3.8. OpenOffice-Writer
    +3.9. OOo2sDbk
   
*4. Hello World-Project
*5. Short Tutorial
    +5.1. Book or Article
    +5.2. A simple DocBook book
    +5.3. An artikel
    +5.4. Online Resources
        o5.4.1. DocBook: The Definitive Guide
        o5.4.2. Tutorial by Stephan Wiesner
        o5.4.3. Online Tutorial von Lars Trieloff
       
   
*6. DocBook publishing with OpenOffice
    +6.1. Interaction between OpenOffice and DocBook
    +6.2. More functions
        o6.2.1. Book or Article
        o6.2.2. CalloutList
        o6.2.3. Insert a note
       
   
*7. Download DocBook Publishing

Chapter1.Introduction


DocBook Publishing describes the framework for publishing with DocBook. It
describes how you can easily create a publication or an article in many
different formats. You can either read this article online [toc.html]
[toc.html]or download (see chapter 12 ) it in several formats. This framework
is based on the work of many people who contributed many parts and made it
available to a larger audience as Open Source. Special thanks to Mark Pilgrim
who inspired me to write this article because of his publications especially
Dive Into Python. [http://www.diveintopython.org/] [http://
www.diveintopython.org/]I have done nothing else than making it work under
Windows and in German with all necessary tools. I use it myself: This article
was certainly written with DocBook Publishing .

Again: This article is not about DocBook itself (see links to other sites
below) but about the description of a setup how to run it in practice. You can
also download all necessary tools (also as binaries) from this WebSite; so it
should not be a problem to set up your own system enabling you to successfully
write documents with DocBook.

1.1.Requirements


It is important to understand that this framework is not a WYSIWYG solution or
"Click-clack" application (a Windows application operated by mouse click).
Articles are rather written in soucecode in DocBook ( XML) and then compiled.
This might remind you on the good old TEX times where a document was written in
the same way.

If you are afraid of using command lines, I can only recommend to use the
common Office tools and keep your hands off this approach.

You should be familiar with XML and XSLT basics so that you will not become
confused even by the easiest topics. All other information is described in this
article so that you will be successful in a very short time.

Chapter2.Read DocBook Publishing online


This article is not completed yet and will be extended by new chapters or user
comments. These changes will soon be visible in a Revision history showing the
differences between the different versions.

Chapter3.Software

3.1.Overview


To create all formats, some software packages are required:
   
*The Build environment is based on Ant [http://jakarta.apache.org/ant/]
    [http://jakarta.apache.org/ant/], a popular Build tool from the Apache
    Group [http://www.apache.org/] [http://www.apache.org/]. The XML tools and
    here especially the stylesheet processes rely on Java . Thus a Java runtime
    environment Version 1.3 or higher is required. .
*To create the Word version of an article, Microsoft Word is used. This of
    course only works on Microsoft Windows and is the only tool used here not
    available for freely.
*The Help File format requires a free Microsoft tool which can be download
    from Microsoft : hhc.exe the Help compiler
*The PDF format is generated with the tool HTMLDOC [http://www.easysw.com/
    htmldoc/] [http://www.easysw.com/htmldoc/]. Source code and binary can also
    be downloaded for free.
*The Plain-Text Format is generated via HTML and the tool w3m, a text
    browser similar to Lynx.
*Mark realized some support functions in Python, so Python [http://
    www.python.org/] [http://www.python.org/]should also be installed on the
    system.
*If you want to create your DocBook with OpenOffice [http://
    www.openoffice.org/] [http://www.openoffice.org/]you naturally need to
    install OpenOffice.
*The package ooo2sDbk [http://www.chez.com/ebellot/ooo2sdbk/] [http://
    www.chez.com/ebellot/ooo2sdbk/]converts OpenOffice documents to DocBook.

3.2.Java


It is recommended to download Java directly from Sun. The Runtime 1.3.1_06 [1]
for Windows is available at http://java.sun.com/j2se/1.3/download.html [http://
java.sun.com/j2se/1.3/download.html] [http://java.sun.com/j2se/1.3/
download.html]. Sun supplies a common setup so that it can easily be installed.

Important: To ensure that the main batch file which starts Ant finds Java, you
need to set the environment variable JAVA_HOME. Alternatively it is also
possible to enter the Java installation path directly in the batch file.

To check if all settings are right, carry out the following tests in the
command line:
[C:\ ]$ echo %JAVA_HOME% 
 C:\Programme\JavaSoft\jdk1.3.1_06 
 [C:\ ]$ %JAVA_HOME%\bin\java 
 Usage: java [-options] class [args...]
 (to execute a class)
 or java -jar [-options] jarfile [args...]
 (to execute a jar file)
 ... 

For defining and transforming with XSLT some additional Java libraries (JARs)
are required. As these are the same for all projects, it is recommended to
install them in the Lib directory of ??JAVA_HOME?? (here ?? C:\Programme\
JavaSoft\jdk1.3.1_06\lib ??. Download the libraries here [download/
docbook-jars.zip] [download/docbook-jars.zip].

3.3.HTMLDOC


HTMLDOC is available from http://www.easysw.com/htmldoc/ [http://www.easysw.com
/htmldoc/] [http://www.easysw.com/htmldoc/]as source code and binary, although
the functionality of the binary is supposed to be limited. I did not try this
out but compiled a new binary.

Only required is the htmldoc.exe and a few font files which as a default are
expected by the binary in ?? C:\Program Files\htmldoc . This ZIP file [download
/htmldoc.zip] [download/htmldoc.zip]contains all files (Version 1.8.21). Make
sure that the binary htmldoc.exe is found in the PATH. This is done by copying
it to the Windows directory. If you do not like this, create a new directory
for such tools (I use ?? C:\Programme\bin ??) and adapt the path so that the
directory is included. Test this by entering htmldoc in the prompt. The
following should be displayed:
[C:\ ]$ htmldoc
 ERROR: No HTML files!
HTMLDOC Version 1.8.21 Copyright 1997-2002 Easy Software Products, All Rights Reserved.
This software is governed by the GNU General Public License, Version 2, and is based in part on the work of the Independent JPEG Group.
 
Usage:
 htmldoc [options] filename1.html [ ... filenameN.html ]
 
Options:
 
 --batch filename.book
 ...       

3.4.W3M


The tool w3m for creating plain text is available for download at http://
www.w3m.org/ [http://www.w3m.org/] [http://www.w3m.org/]. The only thing you
need for this framework is the w3m.exe binary and CGYWIN-DLL [2] . You can
download it here [download/w3m.zip] [download/w3m.zip]directly. As described
above, copy the EXE and DLL to a directory included in the path. The command
line should look as follows:
[C:\ ]$ w3m
 version w3m/0.1.9
usage: w3m [options] [URL or filename]
options: 
-t tab set tab width
 ... 

3.5.HTML Help Workshop


The Microsoft Help compiler is directly available form Microsoft clicking this
URL [http://msdn.microsoft.com/library/default.asp?url=/library/en-us/htmlhelp/
html/hwMicrosoftHTMLHelpDownloads.asp] [http://msdn.microsoft.com/library/
default.asp?url=/library/en-us/htmlhelp/html/hwMicrosoftHTMLHelpDownloads.asp].
The file HTMLHELP.EXE Version 1.32 is the required file and started after the
download. The setup installs the HTML Help Workshop usually using the path C:\
Programme\HTML Help Workshop??. Again, if you want to run this directly, the
directory must be included in the path. As an alternative you can also copy the
file ??hhc.exe?? to the tool directory. The command line will look as follows:
[C:\ ]$ hhc
 Usage: hhc <filename>
 where <filename> = an HTML Help project file
 Example: hhc myfile.hhp       

3.6.Python


Python is available from http://www.python.org/ [http://www.python.org/] [http:
//www.python.org/]or download ActivePython [http://www.activestate.com/Products
/ActivePython/] [http://www.activestate.com/Products/ActivePython/]from
ActiveState [http://www.activestate.com/] [http://www.activestate.com/]. As
said before: Python needs to be included in the path after installation. Both
distributions usually do this automatically. Test directly in the prompt:
[C:\ ]$ python
 ActivePython 2.2.1 Build 222 (ActiveState Corp.) based on
Python 2.2.1 (#34, Apr 15 2002, 09:51:39) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>     

To close the interactive mode, press CTRL-Z

Important: To generate the "real" Word document, the win32com extension needs
to be installed- Test in the prompt with:
>>> import win32com
>>> print win32com
 <module 'win32com' from 'C:\Python22\Lib\site-packages\win32com\__init__.pyc'>       

3.7.The Framework itself


Now you nearly have all the tools you need, only missing is the first project.
You can download this file from the section Download [#download%23download] [#
download%23download]and generate all pages and output format with ?? make all
??.

Nevertheless, it is easier to start with the ??Hello World?? project which is
described in the next chapter.

3.8.OpenOffice-Writer


This is an optional component as you can use any editor to create a DocBook
source code file. Especially for the beginner it might easier to start with
OpenOffice. The advantage of the OpenOffice Writer is that you can edit your
text WYSIWYG without seeing the XML code. This advantage might also be a
disadvantage as you will not have complete control over all aspects of DocBook
Publishing. OpenOffice Version 1.0.1 can be downloaded from http://
www.openoffice.org/ [http://www.openoffice.org/] [http://www.openoffice.org/].

3.9.OOo2sDbk


To generate DocBook with OpenOffice a package is required which converts the
OpenOffice format to DocBook. This is for example the package OOo2sDbk [http://
www.chez.com/ebellot/ooo2sdbk/] [http://www.chez.com/ebellot/ooo2sdbk/]dar. See
chapter 6 [#chap.openoffice%23chap.openoffice] [#
chap.openoffice%23chap.openoffice]??DocBook with OpenOffice?? for more
information.

The package can easily be installed. Only extract the zip file ooo2sdbk.zip
[download/ooo2sdbk.zip] [download/ooo2sdbk.zip]to the directory of the Python
Site package. The module is then commonly available. Download the package from
this Website as it contains extended functionality compared to the original.

Test respectively:
>>> import ooo2sdbk
>>> print ooo2sdbk
 <module 'ooo2sdbk' from 'C:\Python22\Lib\site-packages\ooo2sdbk\__init__.pyc'> 

------------

[1] You can also use the Version 1.4.1 alternatively

[2] An environment which easily ports many Unix programs and tools to Windows
(see http://www.cygwin.com/ [http://www.cygwin.com/] )

Chapter4.Hello World-Project


The standard directory structure is as follows:
   
*The Common directory: contains all libraries and tools required for the
    project. The contents is always the same and independent of the project
    contents.
        Note: Note
        This is not fully right as the layout of the documents with XSL style
        sheet is defined here.
   
*The Build directory: contains the batch file for calling Ant: make.bat
    (or their UNIX variant make.sh), the file build.xml, which controls the
    building and the directory with the source documents: xml.

*The XML directory: contains all XML files which make up the actual
    document or book. For ??Hello World?? one file is sufficient.


The build file for ant looks like follows:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE project [
         <!ENTITY common "../common">
         <!ENTITY buildcommon SYSTEM "../common/build_common.xml">
]>
<project name="HelloWorld" default="html" basedir=".">(1) 
         <property name="lang" value="de"/>
         <target name="postprocess"/>
         <property name="project" value="HelloWorld"/>
         <property name="javahome" value="..."/>
         <property name="fileversion" value="1.0"/>(2) 
&buildcommon;
</project>       

This file only determines the project name and which document types will be
generated as a standard.

(1) Defines project name and standard target (here html).
(2) Defines the project properties. Here the project name which is required as
    a variable in the shared control file for Ant build_common.xml.


Only missing is the contents; here the file HelloWorld.xml which is saved in
the XML directory.

For the simplest HelloWorld this is only:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"../../common/docbook/dtd/docbookx.dtd">
<article lang="de">
 <articleinfo>
     <title>Hello World</title>
     <author>
       <firstname>Stefan</firstname>
       <surname>Rinke</surname>
     </author>
 </articleinfo>
 <section>
     <title>Hello World</title>
     <para>That's all.</para>
 </section>
</article>                   
       

Now we have everything we need for the first test. After having installed all
tools, it should not be a problem to structure your fist project. If the
following is returned:
make all
 BUILD SUCCESSFUL
Total time: 11 seconds                       

the dist directory under build should now contain all formats of the Hello
World document created just now.

Chapter5.Short Tutorial

5.1.Book or Article


The following explanations follow mainly the reference book of Norman Walsh and
Leonard Muellner (see here).

Typically a book is a larger bit of work than an article. Its structure
comprises meta information (<bookinfo>), which describe the author, the title
and the copyright information. It might also have a Preface Vorwort and several
chapters which are divided into paragraphs. In addition it is possible to have
appendixes, literatur reference list, a glossary and an index.

An article is much shorter as it does not contain as much parts as a book and
is usually only structured into individual chapters. But it does also contain
meta information (<articleinfo>) and can have appendixes, a literature
reference list, glossary and index.

5.2.A simple DocBook book


The following XML section shows a typical structure of a DocBook book.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"../../common/docbook/dtd/docbookx.dtd">
<book lang="de">
 <title>Mein erstes DocBook Buch</title>
 <bookinfo>(1) 
     <author>
       <honorific>Dipl.-Ing</honorific>
       <firstname>Stefan</firstname>
       <surname>Rinke</surname>
     </author>
 </bookinfo>
 <preface>(2) 
     <title>Vorwort</title>
     <para>In einem Vorwort stehen Dinge ber das Buch an sich.</para>
 </preface>
 <chapter>(3) 
     <title>Einfhrung</title>
     <para>Dies ist ein einfacher Artikel
       <indexterm>(4) 
         <primary>Artikel</primary>
         <secondary>DocBook-Artikel</secondary>
       </indexterm>.
     Das ist nur ein Beispielabsatz.</para>
 </chapter>
 <appendix>(5) 
     <title>Anhang 1</title>
     <para>Das ist ein Anhang.</para>
 </appendix>
 <glossary>(6) 
     <glossdiv>
       <title>Ein Glossar</title>
       <glossentry>
         <glossterm id="DocBook">DocBook</glossterm>
         <glosssee>DocBook ist ein Standard zur Erstellung...</glosssee>
       </glossentry>
     </glossdiv>
 </glossary>
 <index/>(7) 
</book>

(1) The Bookinfo section is optional although you might want to be named as the
    author.
(2) The Preface is also optional. But if the book becomes very large, the
    preface will have about this strucutre.
(3) Here follow one or several chapters which start with a title <title> and
    are even more structured by paragraphs <para>.
(4) With this an index entry is created. The secondary entry is optional.
(5) Following tihs you will have one or more appendixes.
(6) If you want to, you can also add a glossary.
(7) This tag marks the position where the completed index is to be inserted.


To see how such a book might look in HTML[3], click hier. [testbookde.html]

5.3.An artikel


A simple article looks accordingly:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"../../common/docbook/dtd/docbookx.dtd">
<article lang="de">
 <title>Mein erster Artikel</title>
 <articleinfo>
     <author>
       <honorific>Dipl.-Ing</honorific>
       <firstname>Stefan</firstname>
       <surname>Rinke</surname>
     </author>
 </articleinfo>
 <para>Dies ist ein einfacher Artikel. Dieser Ausdruck
     <indexterm>
       <primary>Ausdruck</primary>
     </indexterm>
         soll im Index erscheinen.</para>
 <appendix>
     <title>Anhang 1</title>
     <para>Das ist ein Anhang.</para>
 </appendix>
 <glossary>
     <glossdiv>
       <title>Ein Glossar</title>
       <glossentry>
         <glossterm id="DocBook">DocBook</glossterm>
         <glosssee>DocBook ist ein Standard zur Erstellung...</glosssee>
       </glossentry>
     </glossdiv>
 </glossary>
 <index/>
</article>         

To see the result online, click here [testarticlede.html].

5.4.Online Resources


More deeper knowledge on DocBook can be found in the following online
resources.

5.4.1.DocBook: The Definitive Guide


The Standard: DocBook: The Definitive Guide [http://www.oreilly.com/catalog/
docbook/chapter/book/docbook.html] [http://www.oreilly.com/catalog/docbook/
chapter/book/docbook.html]which can be read online and offers information from
scratch to reference material.



Figure5.1. DocBook: The definitive Guide
[img001]    

5.4.2.Tutorial by Stephan Wiesner


Stephan Wiesner has written an excellent tutorial. It is also available online
[http://rzserv2.fhnon.de/~lg002556/docbuch/].

5.4.3.Online Tutorial von Lars Trieloff


An excellent tutorial by Lars Trieloff can be found under http://trieloff.net/
doctutorial/build/index.html [http://trieloff.net/doctutorial/build/index.html]
.

------------

[3] this depends on the defined style sheets

Chapter6.DocBook publishing with OpenOffice


A very easy possibility to work with docbook is to use OpenOffice as an editor.
OpenOffice uses xml as natural file format, therefore it is quite obvious to
think about a conversion from OpenOffice writer files to docbook xml. Indeed
this is relatively easy. The following steps are necessary:
   
*Unpack the OpenOffice file
*Join the individual XML files to a large, flat XML file
*Swap out embedded pictures
*Transform the large XML file to DocBook using a xslt stylesheet


To make it clearer, what exactly is going on, the procedure is to be described
briefly.

OpenOffice stores its documents in files with the extension swx. These files
are actually ZIP-archives[4], containing serveral XML-files. One can extract
and rearrange the individual files very easily.

In order to be able to transform via XSLT at the end, one single large XML file
is merged, whereby the different part remain separate by XSLT namespaces.

Pictures are embedded by OpenOffice directly into the XML file and stored as
base64 in the XML file. Unfortunately the usual DocBook stylesheet are not able
to use these embedded pictures, but rather expect them as external files.

All these steps are settled by a small Python module of Eric Bellot. It can
downloaded here [http://www.chez.com/ebellot/ooo2sdbk/] and is part of the
whole framework either.

The following transformation is accomplished with a special stylesheet using an
arbitrary XSLT-processor (here saxon). The stylesheet of Eric Bellot was
improved for this framework and there are serveral variants, depending upon
whether one would like to write a ??book?? or an ??article??.

6.1.Interaction between OpenOffice and DocBook


OpenOffice uses paragraph formats, character formats and frame formats for
formatting documents. The are named and linked to certain layout rules. The
connection between DocBook and OpenOffice is made exactly over these predefined
formats. If for instance a sentence contains a filename, then it is formatted
in OpenOffice with the character format ??filename?? and converted later into <
filename>...</filename> .

It is the same with paragraph format for enumeration or numbered lists: one can
use them in OpenOffice as usual and they will be converted in the corrosponding
DocBook tags automatically. To make this always work right, one should use a
document template or an empty standard document. Such a document is included in
the framework.

The most corrosponding formats / tags are intuitive, a more detailed
description is on Eric Bellot's website [http://www.chez.com/ebellot/ooo2sdbk/]
.

6.2.More functions

6.2.1.Book or Article


As already mentioned there are two different stylesheets or a controllable one,
to generate alternatively a ??book?? or an ??article??. That is defined by the
build-property ??doc-type?? in the control file ??build.xml??.

6.2.2.CalloutList


The be able to produce so-called ??CalloutLists??, an additional enumerating
format ??CalloutList?? was taken up .

References are converted to cross references. It is important, that the names
of the bookmarks begins with ??co.??. If these conventions are kept, the
stylesheet automatically generates the right <co id=??co.test??> and <callout
arearefs=??co.test??>. How that functions exactly, one can see best in this
document.

6.2.3.Insert a note


If one inserts a note with OpenOffice, this is transferred simply literally to
the DocBook document. Thereby one can generate arbitrary DocBook tags. However
the exact control of the position of the tag is not possibly, so that structure
tags, that are bound to a certain content model sometimes cannot be inserted in
this way. What is possible without any problems are e.g. Index entries.

------------

[4] One can open it actually e.g. simply with WinZip

Chapter7.Download DocBook Publishing

*PDF [download/publish-pdfen-1.02.zip]
*HTML [download/publish-htmlen-1.02.zip]
*Microsoft Word [download/publish-worden-1.02.zip]
*Windows Help [download/publish-helpen-1.02.zip]
*Plaintext [download/publish-texten-1.02.zip]
*HTML (eine Datei) [download/publish-html-flaten-1.02.zip]
*XML (DocBook) [download/publish-xmlen-1.02.zip]
*OpenOffice [download/publish-OpenOfficeen-1.02.zip]
*Build Framework [download/publish-common-1.02.zip] (for users of the
    framework)



