markupparser 2.0
-----------------------------------------------------------------
markupparser parses Notebook markup into an intermediary form,
a flat list of tag/value pairs. The parsing proceeds by first
breaking the input into chunks, where each chunk consists
of one or more full lines of text. There are (at present)
four kinds of chunk:
Directives
A directive is a line beginning with a hash "#" character;
the specific directive is determined by the text that follows
the hash character. Unless otherwise indicated, any
subsequent text on the line is ignored. Unrecognized
directives are permitted. The existing directives are as
follows:
#---
Rendered as a horizontal line.
#pre
Begins a chunk of preformatted text.
#unpre
Ends a chunk of preformatted text.
#Tcl
Begins a chunk of Tcl Code.
#unTcl
Ends a chunk of Tcl Code.
#meta
Defines a named value associated with the page. It will be
possible to query a page for its metadata.
#data
Begins a data chunk. A data chunk is just like a Tcl Code or
preformatted text chunk in terms of how it's handled; however,
it will allow other code to easily extract data of a specific
type from a page. Eventually, it might be possible to add
rendering plugins for specific data types.
#undata
Ends a data chunk.
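As a rough illustration of how a directive line might be recognized,
here is a minimal Python sketch. It is not the parser's actual code.
The function name parse_directive is hypothetical, and the sketch
assumes that the directive name is the first whitespace-delimited
word after the hash and that the rest of the line is an optional
argument (as in "#data thru").

```python
def parse_directive(line):
    """Return (name, rest) for a directive line, or None for any other line.

    Assumption: the directive name is the first word after the hash;
    anything after it is returned as 'rest' and may be ignored.
    """
    if not line.startswith("#"):
        return None
    body = line[1:].rstrip("\n")
    parts = body.split(None, 1)
    name = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    return name, rest
```

Because unrecognized directives are permitted, the sketch returns
whatever name it finds and leaves it to the caller to decide
whether that name means anything.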
Preformatted
A preformatted chunk is to be rendered verbatim in a
fixed-width font. Preformatted chunks are found in two
ways: either a paragraph whose first line begins with
a whitespace character, or a group of lines beginning with
a "#pre" directive and ending with a "#unpre" directive
(or the end of the input). (The directives are not
part of the chunk.)
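The two ways of finding a preformatted chunk can be sketched in
Python as follows. This is only an illustration under stated
assumptions: the input is taken to be a list of lines already
grouped as needed, and the names is_implicit_pre and collect_pre
are hypothetical.

```python
def is_implicit_pre(paragraph_lines):
    """True if the paragraph's first line begins with a space or tab."""
    if not paragraph_lines:
        return False
    return paragraph_lines[0][:1] in (" ", "\t")


def collect_pre(lines, start):
    """Collect the lines following the '#pre' line at lines[start].

    Stops at the first '#unpre' line or at the end of the input.
    Returns (chunk_lines, index), where index points at the '#unpre'
    line, or equals len(lines) if the input ended first. The
    directive lines themselves are not part of the chunk.
    """
    chunk = []
    i = start + 1
    while i < len(lines) and not lines[i].startswith("#unpre"):
        chunk.append(lines[i])
        i += 1
    return chunk, i
```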
Tcl Code
A Tcl code chunk is usually rendered Preformatted; it is
a group of lines bracketed by "#Tcl" and "#unTcl".
Wrapped Text
A wrapped text paragraph consists of one or more lines of
possibly styled text. Wrapped text paragraphs can be
indented; if indented they can have a bullet. Wrapped
text can also contain links, magic buttons, and embedded
macros.
Note: embedded macros are usually expanded *before* parsing
the markup, so usually there won't be any in the input. If
there are any, they could be anywhere, in any kind of text.
However, the parser will only recognize them in wrapped text.
Pass-through Text
Sometimes it's desirable to include text in a page that will be
passed through the parser unchanged, e.g., when using macros
to generate HTML in a page that will be exported as HTML.
There are two ways to specify pass-through text.
#data thru
Text to be passed through.
#undata
Text wrapped by #data thru/#undata will go through the parser
unchanged, and will be handled by output processors as a single
paragraph.
This text has some HTML markup.
Within a paragraph of wrapped text, a pair of pass-through tags
can be used to quote a section of text to pass through
unchanged. If the example above were exported as HTML without
the pass-through tags, the angle brackets on its HTML tags
would be escaped as &lt; and &gt;.
Intermediate Form
-----------------------------------------------------------------
The parser parses the input and produces the intermediate form,
a flat list of tag/value pairs. The tags and their values are
defined as follows:
META
The value of META is a dict of the #meta values defined in
this page. If the same keyword appears multiple times, the
final value is retained. If no #meta directives appear in
this page, META appears with an empty list as its value.
META is always the first tag in the list.
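The META rules above (last value wins, empty value when there are
no #meta directives, META always first) can be sketched in Python.
This is an illustration only. It assumes a "#meta name value" line
layout, which this document does not spell out, and it models the
intermediate form as a Python list of (tag, value) tuples. The
function names are hypothetical.

```python
def collect_meta(lines):
    """Build the META value from the #meta lines of a page.

    Assumption: each line has the form '#meta name value'.
    A repeated name keeps its final value.
    """
    meta = {}
    for line in lines:
        fields = line.rstrip("\n").split(None, 2)
        if len(fields) >= 2 and fields[0] == "#meta":
            meta[fields[1]] = fields[2] if len(fields) == 3 else ""
    return meta


def with_meta_first(lines, pairs):
    """Return the tag/value list with the META pair placed first."""
    return [("META", collect_meta(lines))] + list(pairs)
```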
HASH
One of these pairs is produced for each directive; the value is
the complete line of text, including the newline at the end.
PRE
This is usually Preformatted text, to be displayed verbatim. The
value is the exact set of lines from the input. If the chunk was
delimited by "#pre" and "#unpre" directives, then this tag/value
will be preceded and followed by the appropriate HASH tag/values.
TCL
This is a chunk of Tcl code. This tag/value will be preceded
and followed by the "#Tcl" and "#unTcl" HASH tag/values.
DATA {}
The value of DATA is a list of two elements; the data name from
the #data line and the chunk of data that appeared between the
#data and #undata.
When the data name is "thru", the data is to be passed through
the parser unchanged.
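Since the point of DATA is to let other code pull data of a
specific type out of a page, a consumer might look something like
the following Python sketch. It assumes the intermediate form is
held as a list of (tag, value) tuples and that each DATA value is
a (name, data) pair. The function name extract_data is
hypothetical.

```python
def extract_data(pairs, name):
    """Return the data chunks of every DATA pair with the given name."""
    return [value[1]
            for tag, value in pairs
            if tag == "DATA" and value[0] == name]
```

For example, extract_data(pairs, "thru") would return every
pass-through chunk on the page, in order of appearance.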
P {:|* }
Begins a paragraph of wrapped text. The value is a list of three
items: the paragraph type (":" or "*"), the indent level (0 or
higher) and (for indented and bulleted paragraphs) the leading
string, which is the whitespace between the initial "*" or ":"
and the paragraph text. (It's used to reconstruct a parsed page
just exactly as it was.)
A normal, non-indented non-bulleted paragraph will begin with
P {: 0 {}}
A paragraph with an indent level of 1 will begin with this
(where the length of the leading string depends on the actual
input):
P {: 1 { }}
A bulleted paragraph with an indent level of 1 will begin with
this:
P {* 1 { }}
For bulleted paragraphs, the indent level must be 1 or more.
Note that a normal paragraph is simply an indented paragraph
with an indent level of zero.
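The constraints on the P value can be captured in a small
constructor. The Python sketch below is illustrative only. The
name make_p is hypothetical, and the value is modeled as a
(type, indent, leading) tuple rather than a Tcl list.

```python
def make_p(kind, indent, leading):
    """Build a P tag/value pair, checking the rules stated above."""
    if kind not in (":", "*"):
        raise ValueError("paragraph type must be ':' or '*'")
    if indent < 0:
        raise ValueError("indent level must be 0 or higher")
    if kind == "*" and indent < 1:
        raise ValueError("bulleted paragraphs need an indent level of 1 or more")
    return ("P", (kind, indent, leading))
```

Under these assumptions a normal paragraph is make_p(":", 0, ""),
and make_p("*", 0, " ") is rejected.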
/P
Terminates a paragraph of wrapped text, of whatever kind.
The following tags represent components of a paragraph of wrapped
text; they will always appear between P and /P.
TXT
The value is plain text to be wrapped and rendered.
THRU
The value is text in the current output format; it
was passed through the parser unchanged.
STY {}
Notebook markup defines a number of HTML-like style codes,
e.g., <b>...</b>. When one is found in a
wrapped paragraph, this tag is generated. The tag's value is
a list of three elements: the raw style code (e.g., <b>), the
style letter (b), and a flag: 1 when the style turns on and
0 when the style turns off. Note that all styles are
cancelled at the end of the paragraph.
A renderer is free to render the styles as it prefers.
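One way a renderer might track which styles are in effect is
sketched below in Python. It is an illustration, not the actual
renderer. It assumes the intermediate form is a list of
(tag, value) tuples, that each STY value is a (code, letter, flag)
triple, and it uses the hypothetical name styled_runs.

```python
def styled_runs(pairs):
    """Return (text, active_styles) for each TXT pair, in order.

    Styles are switched on and off by STY pairs and are all
    cancelled at the end of each paragraph.
    """
    active = set()
    runs = []
    for tag, value in pairs:
        if tag == "STY":
            _code, letter, on = value
            if on:
                active.add(letter)
            else:
                active.discard(letter)
        elif tag == "TXT":
            runs.append((value, frozenset(active)))
        elif tag in ("P", "/P"):
            # No style carries across a paragraph boundary.
            active.clear()
    return runs
```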
LINK
The value is a page link, not including the [ and ].
BTN