Description

A programmable filter

Author

Dorai Sitaram

Version

Usage

(require-extension mistie)

Download

mistie.egg

Documentation

Mistie

Mistie is a programmable filter. Its primary aim is to let the user define a document's markup using Scheme.

By itself, Mistie does not require any style of markup or format of either its input or its output. It simply copies its standard input to standard output as is. E.g.,

csi -R mistie -e '(mistie-main)' < input.doc > output.doc

produces an output.doc that is indistinguishable from input.doc. mistie-main can be given a file's name as argument, in which case it reads that file instead of standard input. Thus, the above command is equivalent to

csi -R mistie -e '(mistie-main "input.doc")' > output.doc

To make Mistie do something more interesting than copying input verbatim to output, the user must supply a format file. A format file is a Scheme file that describes the markup of the input document in terms of the desired output format. Format files are normal Scheme files and can be loaded with mistie-load. E.g.,

csi -R mistie myformat.mistie -e '(mistie-main)' < input.doc

produces a formatted version of input.doc, the formatting being dictated by the format file myformat.mistie. The formatted version may either go to standard output or to some file depending on myformat.mistie. We will use the .mistie extension for Scheme files used as format files, but this is just a convention.

In general, a format file will use the Mistie infrastructure to define a particular markup, deciding both what the input document should look like and what kind of output to emit. Format authors are not limited to a specialized sublanguage -- they can use full Scheme, including all the nonstandard features of the particular Scheme dialect they have at their disposal.

Writing a format file requires some Scheme programming skill. If you're already a Scheme programmer, you are all set. If not, you can rely on format files written by people whose taste you trust. If it helps, Mistie is somewhat like TeX in its mode of operation (though not in its domain), with the "macro" language being Scheme. The analogy is not perfect though: there are no predefined primitives (everything must be supplied via a format file), and the output style is CFD (completely format dependent) rather than DVI (device independent). (Hope that wasn't too mistie-rious.)

The distribution includes several sample format files, described in section 2. Format files may be combined on the command line, e.g.,

csi -R mistie plain.mistie footnote.mistie -e '(mistie-main "file.doc")' > file.html
csi -R mistie plain.mistie multipage.mistie -e '(mistie-main "file.doc")'

Alternatively, a new combination format file can be written that loads other format files. E.g., the following format file basic.mistie combines within itself the effects of plain.mistie, scmhilit.mistie, and multipage.mistie:

; File: basic.mistie

(mistie-load "plain.mistie") ;or use `load' with full pathnames
(mistie-load "scmhilit.mistie")
(mistie-load "multipage.mistie")

It is invoked in the usual manner:

csi -R mistie basic.mistie -e '(mistie-main "file.doc")'

Note that the format file multipage.mistie creates a set of .html files whose names are based on the name of the input document. Therefore, when using this format file, whether explicitly or implicitly, redirection of standard input or standard output is inappropriate.

The name Mistie stands for Markup In Scheme That Is Extensible. Possible pronunciations are miss-tea and miss-tie.

1. Writing Mistie formats

A typical intent of a format file is to cause certain characters in the input document to trigger non-trivial changes in the output document. E.g., if the output is to be HTML, we'd like the characters <, >, &, and " in the input to come out as &lt;, &gt;, &amp;, and &quot;, respectively. The Mistie procedure mistie-def-char can be used for this:

(mistie-def-char #\< 
  (lambda ()
    (display "&lt;")))

(mistie-def-char #\> 
  (lambda ()
    (display "&gt;")))

(mistie-def-char #\& 
  (lambda ()
    (display "&amp;")))

(mistie-def-char #\" 
  (lambda ()
    (display "&quot;")))

mistie-def-char takes two arguments: The first is the character that is defined, and the second is the procedure associated with it. Here, the procedure writes the HTML encoded version of the character.
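
The four definitions above differ only in the character and its replacement, so a format file could install them from a table. The following is a minimal sketch, not something the Mistie distribution is known to provide: h-html-encodings and h-install-char-encodings are hypothetical names, and the definer is passed in as an argument so that the helper stays ordinary Scheme.

```scheme
;; Hypothetical table: each entry pairs a character with its HTML encoding.
(define h-html-encodings
  (list (cons #\< "&lt;")
        (cons #\> "&gt;")
        (cons #\& "&amp;")
        (cons #\" "&quot;")))

;; Hypothetical helper: call def-char once per table entry.
;; def-char is assumed to behave like mistie-def-char, i.e. to take
;; a character and a procedure of no arguments.
(define (h-install-char-encodings def-char table)
  (for-each
    (lambda (entry)
      (let ((c (car entry))
            (encoding (cdr entry)))
        (def-char c (lambda () (display encoding)))))
    table))
```

In a format file this would be invoked as (h-install-char-encodings mistie-def-char h-html-encodings).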

Suppose we want a contiguous sequence of blank lines to come out as the paragraph separator, <p>. We could mistie-def-char the newline character as follows:

(mistie-def-char #\newline
  (lambda ()
    (newline)
    (let* ((s (h-read-whitespace))
           (n (h-number-of-newlines s)))
      (if (> n 0)
          (begin (display "<p>")
            (newline) (newline))
          (display s)))))

This will cause newline to read up all the following whitespace, and then check to see how many further newlines it picked up. If there was at least one, it outputs the paragraph separator, viz., <p> followed by two newlines (added for human readability). Otherwise, it merely prints the picked-up whitespace as is. The helper procedures h-read-whitespace and h-number-of-newlines are ordinary Scheme procedures:

(define h-read-whitespace
  (lambda ()
    (let loop ((r '()))
      (let ((c (peek-char)))
        (if (or (eof-object? c) (not (char-whitespace? c)))
            (list->string (reverse r))
            (loop (cons (read-char) r)))))))

(define h-number-of-newlines
  (lambda (ws)
    (let ((n (string-length ws)))
      (let loop ((i 0) (k 0))
        (if (>= i n) k
            (loop (+ i 1)
              (if (char=? (string-ref ws i) #\newline)
                  (+ k 1) k)))))))
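
The decision the newline handler makes can be isolated as a pure function, which is easier to try out at a REPL than a handler that reads from standard input. This is only a sketch: count-newlines is a hypothetical stand-in for h-number-of-newlines, and paragraph-decision is a hypothetical name for the branch inside the handler.

```scheme
;; Hypothetical stand-in for h-number-of-newlines, written as a list walk.
(define (count-newlines ws)
  (let loop ((chars (string->list ws)) (k 0))
    (cond ((null? chars) k)
          ((char=? (car chars) #\newline) (loop (cdr chars) (+ k 1)))
          (else (loop (cdr chars) k)))))

;; Returns the text the newline handler would print after its initial
;; (newline), given the whitespace ws that it picked up.
(define (paragraph-decision ws)
  (if (> (count-newlines ws) 0)
      (string-append "<p>" (string #\newline #\newline))
      ws))
```

Given three spaces, paragraph-decision returns them unchanged; given any whitespace string containing at least one newline, it returns <p> followed by two newlines.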

1.1 Control sequences

The Mistie procedure mistie-def-ctl-seq defines control sequences. A control sequence is a sequence of letters (alphabetic characters), and is invoked in the input document by prefixing the sequence with an escape character. (The case of the letters is insignificant.) mistie-def-ctl-seq associates a procedure with a control sequence -- when the control sequence occurs in the input document, it causes the procedure to be applied. The following defines the control sequence br, which emits the HTML tag <br>:

(mistie-def-ctl-seq 'br
  (lambda ()
    (display "<br>")))

Before a control sequence can be used, we must fix the escape character. The following sets it to backslash:

(set! mistie-escape-char #\\)

We can now invoke the br control sequence as \br.
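
A control sequence often needs an argument. Since handlers run with the input document as the current input port (as h-read-whitespace above assumes), a handler can read its own argument. The following is a sketch of one way to do so; h-read-braced-arg is a hypothetical helper, not a Mistie procedure.

```scheme
;; Hypothetical helper, not part of Mistie: read a brace-delimited
;; argument such as {some text} and return its contents as a string.
;; Nested braces are kept in the result. Reads from the current input
;; port unless a port is supplied.
(define (h-read-braced-arg . maybe-port)
  (let ((port (if (null? maybe-port)
                  (current-input-port)
                  (car maybe-port))))
    (if (not (eqv? (peek-char port) #\{))
        ""                                ; no argument present
        (begin
          (read-char port)                ; discard the opening brace
          (let loop ((depth 1) (r '()))
            (let ((c (read-char port)))
              (cond ((eof-object? c) (list->string (reverse r)))
                    ((char=? c #\{) (loop (+ depth 1) (cons c r)))
                    ((char=? c #\})
                     (if (= depth 1)
                         (list->string (reverse r))
                         (loop (- depth 1) (cons c r))))
                    (else (loop depth (cons c r))))))))))
```

A handler could then use it, e.g. (mistie-def-ctl-seq 'em (lambda () (display "<em>") (display (h-read-braced-arg)) (display "</em>"))), where em is an illustrative name. Note that text read this way is displayed raw, so it bypasses any mistie-def-char definitions.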

1.2 Frames

However, we can do better and get automatic line breaks with a more powerful control sequence. Let's say text between \obeylines and \endobeylines should have automatic line breaks. We define the control sequences obeylines and endobeylines as follows:

(mistie-def-ctl-seq 'obeylines
  (lambda ()
    (mistie-push-frame)
    (mistie-def-char #\newline
      (lambda ()
        (display "<br>")
        (newline)))
    (mistie-def-ctl-seq 'endobeylines
      (lambda ()
        (mistie-pop-frame)))))

The obeylines control sequence first pushes a new frame onto the Mistie environment, using the Mistie procedure mistie-push-frame. What this means is that any definitions made after the push (whether by mistie-def-char or mistie-def-ctl-seq) will shadow existing definitions. The Mistie procedure mistie-pop-frame exits the frame, causing the older definitions to take effect again.

In this case, we create a shadowing mistie-def-char for newline, so that it will emit <br> instead of performing its default action (which, as we described above, was to look for paragraph separation). We also define a control sequence endobeylines which will pop the frame pushed by obeylines. With this definition in place, any text sandwiched between \obeylines and \endobeylines (assuming \ is the escape character) will be output with a <br> at the end of each of its lines.
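
The shadowing behaviour can be pictured with a toy model. This is not Mistie's implementation, only a sketch of the semantics described above; all of the toy- names are hypothetical.

```scheme
;; Toy model only, not Mistie's implementation: the environment is a
;; stack of association lists, with the innermost frame first.
(define toy-frames (list '()))

(define (toy-push-frame!)
  (set! toy-frames (cons '() toy-frames)))

(define (toy-pop-frame!)
  (set! toy-frames (cdr toy-frames)))

;; Add a definition to the innermost frame.
(define (toy-def! key handler)
  (set! toy-frames
        (cons (cons (cons key handler) (car toy-frames))
              (cdr toy-frames))))

;; Search frames from innermost to outermost; #f if key is undefined.
(define (toy-lookup key)
  (let loop ((fs toy-frames))
    (cond ((null? fs) #f)
          ((assv key (car fs)) => cdr)
          (else (loop (cdr fs))))))
```

In this model, defining #\newline, pushing a frame, and defining #\newline again makes toy-lookup return the second definition; after toy-pop-frame! it returns the first one again.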

1.3 Calling Scheme from within the document

We can define a control sequence eval that will allow the input document to explicitly evaluate Scheme expressions, without having to put them all in a format file.

(mistie-def-ctl-seq 'eval
  (lambda ()
    (eval (read))))

This will cause \eval followed by a Scheme expression to evaluate that Scheme expression. E.g.,

\eval (display (+ 21 21))

will cause 42 to be printed at the point where the \eval statement is placed. Of course, once you have arbitrary access to Scheme within your document, the amount of kooky intertextual stuff you can do is limited only by your imagination. A mundane use for \eval is to reset the escape character at arbitrary locations in the document, should the existing character be needed (temporarily or permanently) for something else.

1.4 Utilities

To load a Mistie format file, use the (mistie-load FILENAME ...) procedure, which searches the working directory and the directory given in the parameter mistie-path (which defaults to the value of (repository-path), the path where chicken-setup installs the initially provided .mistie files).

(mistie-main [FILENAME]) will invoke the filtering process.
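
The two procedures can also be combined in a small driver script instead of a csi command line. This is a sketch that uses only the procedures described above; the script name driver.scm and the input file.doc are placeholders.

```scheme
;; Sketch of a driver script; file.doc is a placeholder name.
(require-extension mistie)

;; mistie-load accepts one or more file names.
(mistie-load "plain.mistie" "footnote.mistie")

;; Filter file.doc to standard output using the loaded formats.
(mistie-main "file.doc")
```

Assuming the script is saved as driver.scm, it could be run as csi -s driver.scm > file.html.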

2. Sample format files

Several example format files are included in the Mistie distribution. They are:

2.1 plain.mistie

plain.mistie is a basic format file. It specifies a minimal markup that produces HTML. Example usage:

csi -R mistie plain.mistie -e '(mistie-main "input.doc")' > input.html

plain converts the characters <, >, &, and " to their HTML encodings. One or more blank lines are treated as paragraph separation.

plain provides a small set of control sequences geared for manual writing. The default escape character is \ (backslash). Typically, arguments of plain's control sequences are specified within braces ({...}), as in TeX or LaTeX.

\i typesets its argument in italic. E.g., \i{italic} produces italic. Other control sequences in this vein are \b for bold and \small for small print.

\p puts its argument in monospace fixed font and is used for program code. If it is not convenient to enclose \p's argument in braces (e.g., the enclosed code contains non-matching braces), then the argument may be specified by placing the same character on each side. (This is like LaTeX's \verb.) Another useful feature of the \p control sequence: if its argument starts with a newline, it is displayed with the linebreaks preserved.

Use \title for specifying a document's title, which is used as both the internal title and the external (bookmarkable) title.

\stylesheet{file.css} causes the resulting HTML file to use the file file.css as its style sheet. A sample style sheet mistie.css is included in the distribution.

\section, \subsection, \subsubsection produce numbered section headers of the appropriate depth. \section*, etc., produce unnumbered sections.

\urlh{URL}{TEXT} typesets TEXT as a link to URL.

\obeylines{...} preserves linebreaks for its argument. Note that this is dissimilar in call, though not in function, to TeX's {\obeylines ...}.

\flushright is like \obeylines, but sets its argument lines flush right.

\input FILE or \input{FILE} includes the contents of FILE.

\eval evaluates the following Scheme expression.

2.2 footnote.mistie

This format supplies the \footnote control sequence, which makes a footnote out of its (brace-delimited) argument. Footnotes are numbered from 1, and the footnote text is placed on the bottom of the same page as the footnote call.

2.3 scmhilit.mistie

This format provides the \q control sequence, which is used exactly like \p, except that it syntax-highlights the enclosed code. (q is p with a twist.) It is intended for Scheme and Common Lisp code. scmhilit distinguishes between syntactic keywords (i.e., special forms and macros); user-defined constants; variables; booleans; characters; numbers; strings; comments; and background punctuation. You can add your own keywords and constants with \scmkeyword and \scmconstant, e.g.,

\scmkeyword (prog1 block)

\scmconstant (true false)

A style sheet (see plain.mistie) is used to set the colors. The style sheet mistie.css, provided with this distribution, has the following style class settings:

.scheme {
color: brown;
}

.scheme .keyword {
color: #cc0000;
font-weight: bold;
}

.scheme .variable {
color: navy;
}

.scheme .number,.string,.char,.boolean,.constant {
color: green;
}

.scheme .comment {
color: teal;
}

The class .scheme specifies the background punctuation style, and the various subclasses (.keyword, .variable, etc.) specify the styles for the various syntactic categories. Note that we have combined the subclasses for numbers, strings, etc., into one, but you can separate them out if you want to distinguish between them.

You may wish to modify these settings for your documents. Additionally, there are browser-specific ways you can use to override the settings of other authors' documents.

2.4 multipage.mistie

This format provides the \pagebreak control sequence, which causes a fresh HTML page to be used for subsequent text. The names of the HTML pages depend on the name of the input file, which means that standard input/output redirection on Mistie doesn't make sense when using this format.

Navigation bars at the bottom allow the user to travel across the pages.

2.5 xref.mistie

This provides LaTeX-like cross-references. \label{LABEL} associates LABEL with the nearest section (or footnote) number. \ref{LABEL} prints the number associated with LABEL.

\bibitem can be used to enumerate bibliographic entries. \cite{BIBKEY} points to the entry introduced by \bibitem{BIBKEY}. \cite's argument can list multiple keys, with comma as the separator.

2.6 timestamp.mistie

This prints the date of last modification at the bottom of the (first) page.

License

Copyright (c) 2006, Dorai Sitaram.  All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.