Description

Automatic ASCII smart quotes and ligature handling for SXML

Author

Peter Bex

Version

Requires

Download

fancypants.egg

Documentation

Fancypants is a fairly simple set of functions plus an SXSLT ruleset that automagically converts SXML with plain-ASCII strings into typographically enhanced Unicode strings. Ligatures are added and quotes are "educated", i.e., opening quotes are curled to the left while closing quotes are curled the other way. An example piece of SXML:

(sxml-apply-rules
  '(blockquote "\"The affable Estonian wasn't fired\","
               " said the --- strangely afflicted ---"
               " flying monkey at the office.")
  (make-fancy-rules)
  (make-smart-quote-rules))

When rendered, this looks like the following:

“The aﬀable Estonian wasn’t ﬁred”, said the — strangely aﬄicted — ﬂying monkey at the oﬃce.

Without Fancypants, the same text looks like this:

"The affable Estonian wasn't fired", said the --- strangely afflicted --- flying monkey at the office.

As you can see, the quotes are curled correctly, the three minuses are converted to real em dashes, and the 'fi', 'ffl', 'fl' and 'ff' character sequences are replaced by ligatures that merge the characters in a nice way.

A word of warning: how the ligatures are displayed depends heavily on the particular font being used and on how that font is implemented. For example, on a Mac, most MS Corefonts are apparently modified by Apple to support all ligatures, while the basic Corefonts by Microsoft (as found under Windows and many Unix installations) lack ligatures in most fonts. Consider this before using Fancypants' ligature capability (the fi and ff ligatures are reasonably safe to use in most cases, though). Testing on a number of platforms is, unfortunately, still a good idea when doing web development.

Fancypants was inspired by SmartyPants and, more specifically, Mikhail Wolfson's ligatures hack.

Rulesets

There are two rulesets: one for auto-conversion of ligatures and other types of character combinations to Unicode and one for smartening quotes. Both rulesets are generated by functions.

procedure: (make-fancy-rules [exceptions default-exceptions] [character-map full-map])
Create a ruleset that performs ASCII->Unicode mappings for all entries in the character-map argument. Please note that the order of the entries matters, because the replacement algorithm employs a nongreedy search. If one entry is a prefix of another, place the shorter entry after the longer one (for example, 'ff' after 'ffi') and there is no problem. The symbols in exceptions are the tags to leave alone (i.e., nothing below these tags is fancified).
procedure: (make-smart-quote-rules [exceptions default-exceptions] [quotes all-quotes])
Create a ruleset that educates quotes. quotes defines the strategy for translating quotes to smart quotes. See the documentation for all-quotes for more info on the structure of this argument. Please note that here the order doesn't matter, because the replacement algorithm uses simple regexes. The symbols in exceptions are the tags to leave alone (i.e., nothing below these tags has its quotes changed).
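
The ordering rule for character-map can be illustrated with a small stand-alone model. This is only a sketch, not the code Fancypants uses: it assumes that, at each position in the string, the first map entry that matches wins. The names match-at?, first-match, model-substitute, good-order and bad-order are made up for this example, and the angle-bracket placeholders stand in for the real Unicode ligatures.

```scheme
;; Illustrative model only; this is NOT the fancypants implementation.
;; Assumption: at each position, the first matching map entry wins.

(define (match-at? key str pos)
  ;; #t if KEY occurs in STR starting at index POS.
  (let ((end (+ pos (string-length key))))
    (and (<= end (string-length str))
         (string=? key (substring str pos end)))))

(define (first-match char-map str pos)
  ;; Return the first (key . replacement) pair matching at POS, or #f.
  (cond ((null? char-map) #f)
        ((match-at? (caar char-map) str pos) (car char-map))
        (else (first-match (cdr char-map) str pos))))

(define (model-substitute str char-map)
  ;; Walk STR left to right, replacing matches and copying other characters.
  (let loop ((pos 0) (acc '()))
    (if (>= pos (string-length str))
        (apply string-append (reverse acc))
        (let ((hit (first-match char-map str pos)))
          (if hit
              (loop (+ pos (string-length (car hit)))
                    (cons (cdr hit) acc))
              (loop (+ pos 1)
                    (cons (string (string-ref str pos)) acc)))))))

;; Longer sequences first: 'ffi' is tried before its prefix 'ff'.
(define good-order '(("ffi" . "<ffi>") ("ff" . "<ff>") ("fi" . "<fi>")))
;; Prefix first: 'ff' shadows 'ffi', so the longer entry never fires.
(define bad-order  '(("ff" . "<ff>") ("fi" . "<fi>") ("ffi" . "<ffi>")))

(model-substitute "office" good-order) ; => "o<ffi>ce"
(model-substitute "office" bad-order)  ; => "o<ff>ice"
```

In the second call the 'ff' entry consumes the first two characters of 'ffi', which is the situation the ordering advice above is meant to avoid.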

Constants

constant: default-exceptions
This constant is a list of all the tags (symbols) that are ignored by default. These are: (head script pre code kbd samp @)
constant: default-ligature-map
An alist of default ASCII sequences that are translated to ligatures by make-fancy-rules. Contains mappings for 'ffi', 'ffl', 'ff', 'fi', 'fl' and 'ft'. The mapping for 'st' is intentionally left out because this ligature is too elaborate to use in body copy. You could easily define a ruleset for, e.g., headings that does include the 'st' ligature (Unicode character U+FB06).
constant: default-punctuation-map
An alist of default ASCII punctuation sequences to translate to 'fancy' Unicode versions. Contains mappings for '...' => '…', '..' => '‥', '. . .' => '…', '---' => '—' and '--' => '–'.
constant: default-arrow-map
An alist of default ASCII sequences to translate to 'fancy' Unicode versions. This contains several types of arrows. Useful mostly for mathematical texts and 'evaluates to' examples.
constant: default-map
The default map to use for fancifying text. This is simply a concatenation of default-ligature-map, default-punctuation-map and default-arrow-map.
constant: all-quotes
The quote patterns in this list are the ones translated by make-smart-quote-rules. Remove any you don't want to have handled. The structure of an entry in this list is: (pre match post how counts?). pre matches the part of the string before the quote, match matches the quote itself, and post matches the part of the string after it. Each of these is a regex or #f. how is one of: single, double, single-open, double-open, single-close or double-close. counts? is a boolean describing whether the quote should influence the nesting of subsequent quotes or not (i.e., for "isn't" it is #f, since the ' is not a quote which matches a preceding quote or which is matched by a subsequent quote). Note that you (currently) can't use brackets in these regexes, since that messes up the expected structure of the result of string-search-positions.
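
As a sketch of how the map constants might be combined, the following builds a ruleset for headings that also includes the 'st' ligature. Several things here are assumptions rather than documented facts: that the egg is loaded with (use fancypants), that map entries are pairs of an ASCII string and a Unicode string, and the names heading-map and heading-rules, which are invented for this example.

```scheme
;; Assumption: CHICKEN 4 style loading; newer versions may need
;; (import fancypants) instead.
(use fancypants)

;; Hypothetical map for headings. Assumes each entry has the form
;; (ascii-string . unicode-string); check the egg source before relying on it.
(define heading-map
  (cons '("st" . "ﬆ") default-map))

;; Hypothetical ruleset: the default exceptions plus the extended map.
(define heading-rules
  (make-fancy-rules default-exceptions heading-map))
```

The new entry is added at the front, which is harmless as long as it is not a prefix of any longer entry in the map.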

Helper functions

These functions are used internally by Fancypants, but they are probably useful enough to export, so here they are.

procedure: (fancify string character-map)
Perform a simple substitution within string, replacing every ASCII sequence listed in the character-map alist with its Unicode counterpart.
procedure: (smarten-quotes sxml quotes exceptions)
Smarten the quotes in sxml. Only the quotes described by the quotes argument are translated, and everything below a tag named in the exceptions list is skipped.
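
A minimal sketch of calling the helpers directly, following the signatures above. It assumes the egg is loaded with (use fancypants) and that the constants are exported under the names documented on this page. The exact results are not shown because they depend on the maps shipped with the egg.

```scheme
;; Assumption: CHICKEN 4 style loading; newer versions may need
;; (import fancypants) instead.
(use fancypants)

;; Replace ligature candidates in a plain string.
(fancify "The office staff" default-ligature-map)

;; Educate quotes in an SXML fragment. The contents of the code
;; element should be left alone, since code is in default-exceptions.
(smarten-quotes '(p "\"Isn't it?\" " (code "\"raw\""))
                all-quotes
                default-exceptions)
```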

License

Copyright (c) 2006-2009, Peter Bex (peter.bex@xs4all.nl)
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the name of author nor the names of any contributors may
   be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.