Note: I haven't yet written a real version of this document. What follows is a cursory description to serve until I do.

Description

This library provides procedures for converting multi-font text created with the jrichtext.tcl library (or with compatible tags) in a Tk text widget to a variety of other formats. It also includes a generic `Save As...' panel you can use to prompt your users for a filename and file type.

The library contains a lot of procedures; currently, only the most important public procedures are documented. If all you want to do is let your users save the contents of a rich-text widget in various formats, the only procedure you'll need is j:tc:saveas.

Currently, the following output formats are supported:

* Tcl-format richtext
* TeX
* HTML
* PostScript

Except when converting to HTML, only font information is converted; underlining, colours, and other tags are not (yet :-) converted.

However, when converting to HTML, jdoc hypertext links to other documents (i.e., not within the same document) are preserved after a fashion. (These can be links to other jdoc documents, which will be converted to references to HTML documents, or standard Web URLs, which will be preserved without change.) The HTML links generated for links to other jdoc documents (as opposed to standard URLs) may need hand-editing, however, since relative links in HTML documents and jdoc don't follow the same rules.

Also, when converting to HTML, one level of unordered list is supported. (The lists generated may be very strange if you haven't been careful when typing the list; all text within the list must be tagged as an unordered list, and list item markers must appear within the list in appropriate places.)

Thanks

Thanks to Miguel Santana <santana@imag.fr> for permission to use the /reencodeISO procedure from his a2ps program when converting rich-text to PostScript.

Tags

This library considers the following tags:
richtext:font:roman
richtext:font:italic
richtext:font:bold
richtext:font:bolditalic
richtext:font:typewriter
richtext:font:heading0
richtext:font:heading1
richtext:font:heading2
richtext:font:heading3
richtext:font:heading4
richtext:font:heading5
jdoc:link:link
(where link is a URL or the name of a jdoc document)
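
As a rough illustration of how these tag names break down, here is a minimal sketch in plain Tcl. The procedure name classify_tag is hypothetical and is not part of this library; it only shows that the part after the `richtext:font:' or `jdoc:link:' prefix carries the font name or the link target, and that a link target may itself contain colons.

```tcl
# Hypothetical helper, not part of the library: split a tag name into
# a kind (font, link, or other) and the value that follows the prefix.
proc classify_tag {tag} {
    foreach {kind prefix} {font richtext:font: link jdoc:link:} {
        # Only a match at the very start of the tag counts.
        if {[string first $prefix $tag] == 0} {
            set value [string range $tag [string length $prefix] end]
            return [list $kind $value]
        }
    }
    # Anything else (for instance the selection tag) is left alone.
    return [list other $tag]
}
```

For example, classify_tag richtext:font:bold returns the list `font bold', and classify_tag jdoc:link:http://www.example.org/ returns `link http://www.example.org/'. In a real script the tag names would come from the text widget's own `tag names' subcommand.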

See Also

jtexttags.tcl
jrichtext.tcl

j:tc:saveas

Usage

j:tc:saveas t

Argument

t is the text widget whose content is to be converted

Description

This procedure brings up a File Selection panel with an option button that lets the user choose among the supported file formats. When the user chooses a format and a name and clicks OK or presses Return, the contents of the text widget t are saved in the chosen format in the specified file.

Warning

The File Selection panel seems to cause Tk scripts to crash under at least some beta versions of Tk 4.0.

Conversion Procedures

All of the following take as their sole argument a text widget whose contents are to be converted, and return the contents of that text widget converted to the given format as their value. (Note that this can mean that you're schlepping around some fairly large strings.)

j:tc:tclrt:convert_text t - convert to Tcl-format richtext; see jrichtext.tcl
j:tc:tex:convert_text t - convert to TeX source
j:tc:html:convert_text t - convert to HTML (without links, currently)
j:tc:ps:convert_text t - convert to PostScript
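
Since each of these procedures simply returns a string, writing the result to a file yourself is straightforward. The following is a minimal sketch; save_converted is a hypothetical helper, not something this library provides, and it assumes only what is stated above (one text-widget argument, converted text as the return value).

```tcl
# Hypothetical helper, not part of the library: call a conversion
# procedure on a text widget and write the returned string to a file.
proc save_converted {converter t filename} {
    # The conversion procedures return the whole document as one string.
    set data [$converter $t]
    set f [open $filename w]
    puts -nonewline $f $data
    close $f
    # Report how many characters were written.
    return [string length $data]
}

# Example call, assuming the library is loaded and .t is a text widget:
#   save_converted j:tc:html:convert_text .t notes.html
```

Passing the converter by name keeps the helper independent of any one format, so the same few lines serve for TeX, HTML, or PostScript output.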

Comments on Formats

Tcl-Format Richtext

Because it's designed to write into a text widget, this is the most faithful format.

The distinction between j:rt:par and two successive j:rt:cr's is lost when converting text that was generated with the jrichtext.tcl library, but it's not actually reflected in the text widget in the first place.

TeX

The TeX generated by j:tc:tex:convert_text works, but it's really weird and unnecessarily verbose. It makes lots of characters active and changes some standard parameters, so if you try to embed it in TeX documents of your own you should enclose it in braces. If you don't use any non-ASCII characters, you can trim off most of the preamble, which provides support for the ISO 8859-1 character set.

Tabs are converted to a fixed amount of whitespace, and spaces at the beginning of a line are lost. Multiple blank lines are also lost.

HTML

When converting to HTML, tabs are lost, as is any spacing at the beginnings of lines.

The distinction between paragraphs and line breaks is lost; all sequences of line breaks are translated as a single <P> code.
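
The effect of these rules on leading whitespace and line breaks can be sketched in plain Tcl as follows. This is only an illustration of the behaviour described above, not the library's own code; the procedure name sketch_html_breaks is made up for the example, and it assumes a Tcl recent enough to have the -line option of regsub.

```tcl
# Hypothetical illustration, not the library's actual code.
proc sketch_html_breaks {text} {
    # Spacing (spaces or tabs) at the beginning of each line is lost.
    regsub -all -line {^[ \t]+} $text "" text
    # Every run of one or more line breaks becomes a single <P>.
    regsub -all "\n+" $text "<P>" text
    return $text
}
```

Given the string "one\n\n  two\nthree", this returns `one<P>two<P>three': the blank line and the single line break both end up as one <P>, which is exactly why the paragraph/line-break distinction cannot be recovered from the HTML.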

PostScript

The line-breaking algorithm is hideous, and long words are likely to be wrapped across lines.

Tabs are rendered as a fixed amount of space. Spaces occasionally appear at the beginnings of lines when they shouldn't (similarly to the way they do in the Tk text widget).

What is generated is actually a PostScript program that generates the formatting, rather than a set of simple page descriptions, so it makes a lot of demands on your PostScript interpreter, and may print more slowly than you expect. Also, it doesn't conform to the PostScript comment conventions (it can't), so tools that need to work with PostScript files page-by-page will fail.

The ISO 8859-1 character set is supported only if you have a Level 2 PostScript interpreter (or at least an interpreter that knows ISOLatin1Encoding).

Bugs and Misfeatures

* The code needs to be reorganised. Code is shared between different formats that shouldn't be, and code isn't shared that should be.

* Whitespace is often lost or garbled in many of the formats.

* Much of the code is pretty inefficient.

* Tags other than font tags should be handled; for instance, colour and underlining should be supported (where possible).

Future Directions

* In addition to improving the existing conversions (and they really need it!), I'd like to provide modes for plain-text (with lines broken sensibly, and maybe capitalisation for headers) and formatted text (like nroff(1) output). LaTeX, RTF, and troff are other possibilities.

* It would be nice to support WYSIWYG writing of manual pages, or generation of them from jdoc documents. This would probably require a little additional information beyond what's in the text widget (for example, name and description, or section of the manual).

* When jdoc documents are converted to HTML, I'd like to translate hypertext links and anchors as well as fonts. (The capabilities of jdoc are closely modelled after those expressible in HTML.)

* The exact fonts used when generating PostScript and TeX should be user preferences.

* The TeX conversion does a lot of work to support ISO 8859-1. This should only be done if there are actually non-ASCII characters in the text (or perhaps it should be a user preference). The PostScript conversion should support ISO 8859-1 even on Level 1 interpreters (it's easy enough).