To generate the parser:
(require-extension lalr) ;; this will only work in the interpreter (lalr-parser grammar definition ... )
To use the parser:
(include "parser.grm.scm") ;; file generated by lalr-parser macro (parser lexer parse-error) ;; parser is the function defined in the grammar file ;; lexer is a user-defined character-reading function ;; parse-error is a user-defined error function
lalr implements an efficient algorithm for computing lookahead sets, which is the same as the one used in Bison (GNU yacc):
Efficient Computation of LALR(1) Look-Ahead Set, F. DeRemer and T. Pennello, TOPLAS, vol. 4, no. 4, october 1982.
The official documentation for lalr can be found here: http://schemeway.dyndns.org/Lalr/lalr.html. The following is taken from chapters 4 and 6 in the official documentation.
The file lalr.scm declares a macro called lalr-parser:
(lalr-parser [options] tokens rules ...)To use this macro, you must first load lalr.scm in the Scheme interpreter via the include special form. The macro will not work in the compiler.
This macro, when given appropriate arguments, generates an LALR(1) parser. The macro accepts at least two arguments. The first is a list of symbols which represent the terminal symbols of the grammar. The remaining arguments are the grammar production rules.
The grammar is specified by first giving the list of terminals and the list of non-terminal definitions. Each non-terminal definition is a list where the first element is the non-terminal and the other elements are the right-hand sides (lists of grammar symbols). In addition to this, each right-hand side can be followed by a semantic action.
For example, consider the following (yacc) grammar for a very simple expression language:
e : e '+' t | e '-' t | t ; t : t '*' f : t '/' f | f ; f : ID ;
The same grammar, written for the scheme parser generator, would look like this (with semantic actions):
(define expr-parser (lalr-parser ; Terminal symbols (ID + - * /) ; Productions (e (e + t) : (+ $1 $3) (e - t) : (- $1 $3) (t) : $1) (t (t * f) : (* $1 $3) (t / f) : (/ $1 $3) (f) : $1) (f (ID) : $1)))</pre>
In semantic actions, the symbol $n refers to the synthesized attribute value of the nth symbol in the production. The value associated with the non-terminal on the left is the result of evaluating the semantic action (it defaults to #f).
The above grammar implicitly handles operator precedences. It is also possible to explicitly assign precedences and associativity to terminal symbols and productions a la Yacc. Here is a modified (and augmented) version of the grammar:
(define expr-parser (lalr-parser ; Terminal symbols (ID (left: + -) (left: * /) (nonassoc: uminus)) (e (e + e) : (+ $1 $3) (e - e) : (- $1 $3) (e * e) : (* $1 $3) (e / e) : (/ $1 $3) (- e (prec: uminus)) : (- $2) (ID) : $1)))
The left: directive is used to specify a set of left-associative operators of the same precedence level, the right: directive for right-associative operators, and nonassoc: for operators that are not associative. Note the use of the (apparently) useless terminal uminus. It is only defined in order to assign to the penultimate rule a precedence level higher than that of * and
lalr implements a very simple error recovery strategy. A production can be of the form
(rulename ... (error TERMINAL) : action-code)
There can be several such productions for a single rulename.This will cause the parser to skip all the tokens produced by the lexer that are different than the given TERMINAL. For a C-like language, one can synchronize on semicolons and closing curly brackets by writing error rules like these:
(stmt (expression SEMICOLON) : ... (LBRACKET stmt RBRACKET) : ... (error SEMICOLON) (error RBRACKET))
Conflicts in the grammar are handled in a conventional way. In the absence of precedence directives, Shift/Reduce conflicts are resolved by shifting, and Reduce/Reduce conflicts are resolved by choosing the rule listed first in the grammar definition.
The parser generated by the lalr-parser macro is a function that takes two parameters. The first parameter is a lexical analyzer while the second is an error procedure. The lexical analyzer is zero-argument function (a thunk) invoked each time the parser needs to look-ahead in the token stream. A token is usually a pair whose car is the symbol corresponding to the token (the same symbol as used in the grammar definition). The cdr of the pair is the semantic value associated with the token. For example, a string token would have the car set to 'string while the cdr is set to the string value "hello".
Once the end of file is encountered, the lexical analyzer must always return the symbol'*eoi* each time it is invoked.
The error procedure must be a function that accepts at least two parameters, a message and additional arguments to be printed.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; FILE: calc.grm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Parser for simple calculator in Scheme. ;;; ;;; To use, run the command: ;;; ;;; csi < calc.grm ;;; ;;; This will create file `calc.grm.scm'. Then, to compile the ;;; calculator program, run: ;;; ;;; csc calc.scm ;;; ;; ;; ;; @created "Tue Jan 6 12:47:23 2004" ;; @modified "Mon Oct 25 11:07:24 2004" ;; @author "Dominique Boucher" ;; @copyright "Dominique Boucher" ;; ;; Simple arithmetic calculator. ;; ;; This program illustrates the use of the lalr-scm parser generator ;; for Scheme. It is NOT robust, since calling a function with ;; the wrong number of arguments may generate an error that will ;; cause the calculator to crash. ;;; ;;;; The LALR(1) parser ;;; (require-extension lalr) (define calc-parser (lalr-parser ;; --- Options ;; output a parser, called calc-parser, in a separate file - calc.grm.scm, (output: calc-parser "calc.grm.scm") ;; output the LALR table to calc.out (out-table: "calc.grm.out") ;; --- token definitions (ID NUM NEWLINE LPAREN RPAREN (nonassoc: =) (left: - +) (left: * /) (left: COMMA) (nonassoc: uminus)) (lines (lines line) : (display-result $2) (line) : (display-result $1)) ;; --- rules (line (NEWLINE) : (void) (assign NEWLINE) : $1 (expr NEWLINE) : $1 (error NEWLINE) : #f) (assign (ID = expr) : (add-binding $1 $3)) (expr (NUM) : $1 (ID) : (get-binding $1) (ID LPAREN RPAREN) : (invoke-proc $1 (list)) (ID LPAREN args RPAREN) : (invoke-proc $1 $3) (expr + expr) : (+ $1 $3) (expr - expr) : (- $1 $3) (expr * expr) : (* $1 $3) (expr / expr) : (/ $1 $3) (- expr (prec: uminus)) : (- $2) (LPAREN expr RPAREN) : $2) (args (expr) : (list $1) (args COMMA expr) : (cons $3 $1)))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; END FILE: calc.grm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; FILE: calc.scm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;;; Simple calculator in Scheme ;;; ;; ;; @created "Tue Jan 6 12:47:23 2004" ;; @modified "Mon Oct 25 11:07:24 2004" ;; @author "Dominique Boucher" ;; @copyright "Dominique Boucher" ;; ;; Simple arithmetic calculator. ;; ;; This program illustrates the use of the lalr-scm parser generator ;; for Scheme. It is NOT robust, since calling a function with ;; the wrong number of arguments may generate an error that will ;; cause the calculator to crash. ;; parser (include "calc.grm.scm") ;;; ;;;; The lexer ;;; (define (make-lexer errorp) (lambda () (letrec ((skip-spaces (lambda () (let loop ((c (peek-char))) (if (and (not (eof-object? c)) (or (char=? c #\space) (char=? c #\tab))) (begin (read-char) (loop (peek-char))))))) (read-number (lambda (l) (let ((c (peek-char))) (if (char-numeric? c) (read-number (cons (read-char) l)) (string->number (apply string (reverse l))))))) (read-id (lambda (l) (let ((c (peek-char))) (if (char-alphabetic? c) (read-id (cons (read-char) l)) (string->symbol (apply string (reverse l)))))))) ;; -- skip spaces (skip-spaces) ;; -- read the next token (let loop ((c (read-char))) (cond ((eof-object? c) '*eoi*) ((char=? c #\newline) 'NEWLINE) ((char=? c #\+) '+) ((char=? c #\-) '-) ((char=? c #\*) '*) ((char=? c #\/) '/) ((char=? c #\=) '=) ((char=? c #\,) 'COMMA) ((char=? c #\() 'LPAREN) ((char=? c #\)) 'RPAREN) ((char-numeric? c) (cons 'NUM (read-number (list c)))) ((char-alphabetic? c) (cons 'ID (read-id (list c)))) (else (errorp "PARSE ERROR : illegal character: " c) (skip-spaces) (loop (read-char)))))))) (define (read-line) (let loop ((c (read-char))) (if (and (not (eof-object? c)) (not (char=? c #\newline))) (loop (read-char))))) ;;; ;;;; Environment management ;;; (define *env* (list (cons '$$ 0))) (define (init-bindings) (set-cdr! *env* '()) (add-binding 'cos cos) (add-binding 'sin sin) (add-binding 'tan tan) (add-binding 'expt expt) (add-binding 'sqrt sqrt)) (define (add-binding var val) (set! *env* (cons (cons var val) *env*)) val) (define (get-binding var) (let ((p (assq var *env*))) (if p (cdr p) 0))) (define (invoke-proc proc-name args) (let ((proc (get-binding proc-name))) (if (procedure? proc) (apply proc args) (begin (display "ERROR: invalid procedure:") (display proc-name) (newline) 0)))) ;; value display (define (display-result v) (if v (begin (display "==> ") (display v) (newline)))) ;;; ;;;; The main program ;;; (define calc (lambda () (call-with-current-continuation (lambda (k) (display "********************************") (newline) (display "* Mini calculator in Scheme *") (newline) (display "* *") (newline) (display "* Enter expressions followed *") (newline) (display "* by [RETURN] or 'quit()' to *") (newline) (display "* exit. *") (newline) (display "********************************") (newline) (init-bindings) (add-binding 'quit (lambda () (k #t))) (letrec ((errorp (lambda args (for-each display args) (newline))) (start (lambda () (calc-parser (make-lexer errorp) errorp)))) (start)))))) (calc) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; END FILE: calc.scm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA A full copy of the GPL license can be found on Debian systems in /usr/share/common-licenses/GPL-2