~ chicken-core (chicken-5) fd66e1b108194917ee6fcb430820c1a32543e18c
commit fd66e1b108194917ee6fcb430820c1a32543e18c Author: felix <bunny351@gmail.com> AuthorDate: Wed Apr 21 20:53:38 2010 +0200 Commit: felix <bunny351@gmail.com> CommitDate: Wed Apr 21 20:53:38 2010 +0200 merged wiki manual changes made by zb diff --git a/manual/Unit library b/manual/Unit library index 697de3fa..ef0f01d3 100644 --- a/manual/Unit library +++ b/manual/Unit library @@ -463,48 +463,48 @@ The following composite conditions are additionally defined: <table> -<tr><td> (exn arity) +<tr><td> (exn arity) </td><td> Signaled when a procedure is called with the wrong number of arguments. -</td></tr><tr><td> (exn type) +</td></tr><tr><td> (exn type) </td><td> Signaled on type-mismatch errors, for example when an argument of the wrong type is passed to a built-in procedure. -</td></tr><tr><td> (exn arithmetic) +</td></tr><tr><td> (exn arithmetic) </td><td> Signaled on arithmetic errors, like division by zero. -</td></tr><tr><td> (exn i/o) +</td></tr><tr><td> (exn i/o) </td><td> Signaled on input/output errors. -</td></tr><tr><td> (exn i/o file) +</td></tr><tr><td> (exn i/o file) </td><td> Signaled on file-related errors. -</td></tr><tr><td> (exn i/o net) +</td></tr><tr><td> (exn i/o net) </td><td> Signaled on network errors. -</td></tr><tr><td> (exn bounds) +</td></tr><tr><td> (exn bounds) </td><td> Signaled on errors caused by accessing non-existent elements of a collection. -</td></tr><tr><td> (exn runtime) +</td></tr><tr><td> (exn runtime) </td><td> Signaled on low-level runtime-system error-situations. -</td></tr><tr><td> (exn runtime limit) +</td></tr><tr><td> (exn runtime limit) </td><td> Signaled when an internal limit is exceeded (like running out of memory). -</td></tr><tr><td> (exn match) +</td></tr><tr><td> (exn match) </td><td> Signaled on errors raised by failed matches (see the section on {{match}}). -</td></tr><tr><td> (exn syntax) +</td></tr><tr><td> (exn syntax) </td><td> Signaled on syntax errors. diff --git a/manual/Unit srfi-1 b/manual/Unit srfi-1 index a54684d4..72ac271f 100644 --- a/manual/Unit srfi-1 +++ b/manual/Unit srfi-1 @@ -2,47 +2,77 @@ == Unit srfi-1 -SRFI 1 (List Library) procedures. For more information, see the -[[http://srfi.schemers.org/srfi-1/srfi-1.html|SRFI 1]] document. +SRFI 1 (list library) procedures. A few procedures that can be found in R5RS, such as {{car}} and {{append}}, are omitted below. For more information, see the +[[http://srfi.schemers.org/srfi-1/srfi-1.html|original SRFI 1 document]]. [[toc:]] + +== The procedures + +The templates given below obey the following conventions for procedure +formals: + +<table> +<tr><th>LIST</th><td>A proper (finite, nil-terminated) list</td></tr> +<tr><th>CLIST</th><td>A proper or circular list</td></tr> +<tr><th>FLIST</th><td>A finite (proper or dotted) list</td></tr> +<tr><th>PAIR</th><td>A pair</td></tr> +<tr><th>X, Y, D, A</th><td>Any value</td></tr> +<tr><th>OBJECT, VALUE</th><td>Any value</td></tr> +<tr><th>N, I</th><td>A natural number (an integer >= 0)</td></tr> +<tr><th>PROC</th><td>A procedure</td></tr> +<tr><th>PRED</th><td>A procedure whose return value is treated as a boolean</td></tr> +<tr><th>=</th><td>A boolean procedure taking two arguments</td></tr></table> + +It is an error to pass a circular or dotted list to a procedure not defined +to accept such an argument. + + === Constructors <procedure>(xcons d a) -> pair</procedure><br> + (lambda (d a) (cons a d)) -Of utility only as a value to be conveniently passed to -higher-order procedures. +Of utility only as a value to be conveniently passed to higher-order +procedures. + (xcons '(b c) 'a) => (a b c) + The name stands for "eXchanged CONS." -<procedure>(cons* elt[1] elt[2] ...) -> object</procedure><br> +<procedure>(cons* elt_1 elt_2 ...) -> object</procedure><br> + +Like {{list}}, but the last argument provides the tail of the constructed +list, returning + +{{ (cons ELT_1 (cons ELT_2 (cons ... ELT_N))) }} -Like list, but the last argument provides the tail of the -constructed list, returning - (cons elt[1] (cons elt[2] (cons ... elt[n]))) +This function is called {{list*}} in Common Lisp and about half of the +Schemes that provide it, and {{cons*}} in the other half. -This function is called list* in Common Lisp and about half of the -Schemes that provide it, and cons* in the other half. (cons* 1 2 3 4) => (1 2 3 . 4) (cons* 1) => 1 <procedure>(make-list n [fill]) -> list</procedure><br> -Returns an n-element list, whose elements are all the value fill. -If the fill argument is not given, the elements of the list may be -arbitrary values. + +Returns an N-element list, whose elements are all the value FILL. If the +FILL argument is not given, the elements of the list may be arbitrary +values. + (make-list 4 'c) => (c c c c) <procedure>(list-tabulate n init-proc) -> list</procedure><br> -Returns an n-element list. Element i of the list, where 0 <= i < n, -is produced by (init-proc i). No guarantee is made about the -dynamic order in which init-proc is applied to these indices. +Returns an N-element list. Element I of the list, where 0 <= I < N, is +produced by {{(INIT-PROC I)}}. No guarantee is made about the dynamic order +in which INIT-PROC is applied to these indices. + (list-tabulate 4 values) => (0 1 2 3) @@ -50,130 +80,149 @@ dynamic order in which init-proc is applied to these indices. Copies the spine of the argument. -<procedure>(circular-list elt[1] elt[2] ...) -> list</procedure><br> +<procedure>(circular-list elt_1 elt_2 ...) -> list</procedure><br> Constructs a circular list of the elements. + (circular-list 'z 'q) => (z q z q z q ...) + <procedure>(iota count [start step]) -> list</procedure><br> Returns a list containing the elements - (start start+step ... start+(count-1)*step) -The start and step parameters default to 0 and 1, respectively. -This procedure takes its name from the APL primitive. + (START START+STEP ... START+(COUNT-1)*STEP) + + +The START and STEP parameters default to 0 and 1, respectively. This +procedure takes its name from the APL primitive. + (iota 5) => (0 1 2 3 4) (iota 5 0 -0.1) => (0 -0.1 -0.2 -0.3 -0.4) + === Predicates -Note: the predicates proper-list?, circular-list?, and dotted-list? -partition the entire universe of Scheme values. +Note: the predicates {{proper-list?}}, {{circular-list?}}, and +{{dotted-list?}} partition the entire universe of Scheme values. <procedure>(proper-list? x) -> boolean</procedure><br> -Returns true iff x is a proper list -- a finite, nil-terminated -list. +Returns true iff X is a proper list -- a finite, nil-terminated list. + +More carefully: The empty list is a proper list. A pair whose cdr is a +proper list is also a proper list: -More carefully: The empty list is a proper list. A pair whose cdr -is a proper list is also a proper list: - <proper-list> ::= () (Empty proper list) - | (cons <x> <proper-list>) (Proper-list pair) + <proper-list> ::= () (Empty proper list) + | (cons <x> <proper-list>) (Proper-list pair) -Note that this definition rules out circular lists. This function -is required to detect this case and return false. -Nil-terminated lists are called "proper" lists by R5RS and Common -Lisp. The opposite of proper is improper. +Note that this definition rules out circular lists. This function is +required to detect this case and return false. -R5RS binds this function to the variable list?. +Nil-terminated lists are called "proper" lists by R5RS and Common Lisp. The +opposite of proper is improper. - (not (proper-list? x)) = (or (dotted-list? x) (circular-list? x)) +R5RS binds this function to the variable {{list?}}. + + + (not (proper-list? X)) = (or (dotted-list? X) (circular-list? X)) <procedure>(circular-list? x) -> boolean</procedure><br> -True if x is a circular list. A circular list is a value such that -for every n >= 0, cdr^n(x) is a pair. +True if X is a circular list. A circular list is a value such that for +every N >= 0, cdr^N(X) is a pair. Terminology: The opposite of circular is finite. - (not (circular-list? x)) = (or (proper-list? x) (dotted-list? x)) + + (not (circular-list? X)) = (or (proper-list? X) (dotted-list? X)) + <procedure>(dotted-list? x) -> boolean</procedure><br> -True if x is a finite, non-nil-terminated list. That is, there -exists an n >= 0 such that cdr^n(x) is neither a pair nor (). This -includes non-pair, non-() values (e.g. symbols, numbers), which are -considered to be dotted lists of length 0. +True if X is a finite, non-nil-terminated list. That is, there exists an N >= +0 such that cdr^N(X) is neither a pair nor (). This includes non-pair, +non-() values (''e.g.'' symbols, numbers), which are considered to be +dotted lists of length 0. + + + (not (dotted-list? X)) = (or (proper-list? X) (circular-list? X)) - (not (dotted-list? x)) = (or (proper-list? x) (circular-list? x)) +<procedure>(null-list? list) -> boolean</procedure><br> + +LIST is a proper or circular list. This procedure returns true if the +argument is the empty list (), and false otherwise. It is an error to +pass this procedure a value which is not a proper or circular list. This +procedure is recommended as the termination condition for list-processing +procedures that are not defined on dotted lists. <procedure>(not-pair? x) -> boolean</procedure><br> + (lambda (x) (not (pair? x))) -Provided as a procedure as it can be useful as the termination -condition for list-processing procedures that wish to handle all -finite lists, both proper and dotted. +Provided as a procedure as it can be useful as the termination condition +for list-processing procedures that wish to handle all finite lists, both +proper and dotted. + +<procedure>(list= elt= list_1 ...) -> boolean</procedure><br> -<procedure>(list= elt= list[1] ...) -> boolean</procedure><br> +Determines list equality, given an element-equality procedure. Proper +list A equals proper list B if they are of the same length, and +their corresponding elements are equal, as determined by ELT=. If the +element-comparison procedure's first argument is from LIST_I, then its +second argument is from LIST_I+1, ''i.e.'' it is always called as {{(ELT= A +B)}} for A an element of list A, and B an element of list B. -Determines list equality, given an element-equality procedure. -Proper list A equals proper list B if they are of the same length, -and their corresponding elements are equal, as determined by elt=. -If the element-comparison procedure's first argument is from list -[i], then its second argument is from list[i+1], i.e. it is always -called as (elt= a b) for a an element of list A, and b an element -of list B. +In the N-ary case, every LIST_I is compared to LIST_I+1 (as opposed, for +example, to comparing LIST_1 to every LIST_I, for I>1). If there are no +list arguments at all, {{list=}} simply returns true. -In the n-ary case, every list[i] is compared to list[i+1] (as -opposed, for example, to comparing list[1] to every list[i], for i> -1). If there are no list arguments at all, list= simply returns -true. +It is an error to apply {{list=}} to anything except proper lists. While +implementations may choose to extend it to circular lists, note that it +cannot reasonably be extended to dotted lists, as it provides no way to +specify an equality procedure for comparing the list terminators. -It is an error to apply list= to anything except proper lists. -While implementations may choose to extend it to circular lists, -note that it cannot reasonably be extended to dotted lists, as it -provides no way to specify an equality procedure for comparing the -list terminators. +Note that the dynamic order in which the ELT= procedure is applied to pairs +of elements is not specified. For example, if {{list=}} is applied to three +lists, A, B, and C, it may first completely compare A to B, then compare +B to C, or it may compare the first elements of A and B, then the first +elements of B and C, then the second elements of A and B, and so forth. -Note that the dynamic order in which the elt= procedure is applied -to pairs of elements is not specified. For example, if list= is -applied to three lists, A, B, and C, it may first completely -compare A to B, then compare B to C, or it may compare the first -elements of A and B, then the first elements of B and C, then the -second elements of A and B, and so forth. +The equality procedure must be consistent with {{eq?}}. That is, it must be +the case that -The equality procedure must be consistent with eq?. That is, it -must be the case that +{{(eq? X Y)}} => {{(ELT= X Y)}}. - (eq? x y) => (elt= x y). +Note that this implies that two lists which are {{eq?}} are always +LIST=, as well; implementations may exploit this fact to "short-cut" the +element-by-element comparisons. -Note that this implies that two lists which are eq? are always list=, -as well; implementations may exploit this fact to "short-cut" -the element-by-element comparisons. (list= eq?) => #t ; Trivial cases (list= eq? '(a)) => #t + === Selectors -<procedure>(first pair) -> object</procedure><br> -<procedure>(second pair) -> object</procedure><br> -<procedure>(third pair) -> object</procedure><br> -<procedure>(fourth pair) -> object</procedure><br> -<procedure>(fifth pair) -> object</procedure><br> -<procedure>(sixth pair) -> object</procedure><br> +<procedure>(first pair) -> object</procedure><br> +<procedure>(second pair) -> object</procedure><br> +<procedure>(third pair) -> object</procedure><br> +<procedure>(fourth pair) -> object</procedure><br> +<procedure>(fifth pair) -> object</procedure><br> +<procedure>(sixth pair) -> object</procedure><br> <procedure>(seventh pair) -> object</procedure><br> -<procedure>(eighth pair) -> object</procedure><br> -<procedure>(ninth pair) -> object</procedure><br> -<procedure>(tenth pair) -> object</procedure><br> +<procedure>(eighth pair) -> object</procedure><br> +<procedure>(ninth pair) -> object</procedure><br> +<procedure>(tenth pair) -> object</procedure><br> + +Synonyms for {{car}}, {{cadr}}, {{caddr}}, ... -Synonyms for car, cadr, caddr, ... (third '(a b c d e)) => c @@ -181,6 +230,7 @@ Synonyms for car, cadr, caddr, ... The fundamental pair deconstructor: + (lambda (p) (values (car p) (cdr p))) This can, of course, be implemented more efficiently by a compiler. @@ -188,186 +238,202 @@ This can, of course, be implemented more efficiently by a compiler. <procedure>(take x i) -> list</procedure><br> <procedure>(drop x i) -> object</procedure><br> -take returns the first i elements of list x. -drop returns all but the first i elements of list x. +{{take}} returns the first I elements of list X. {{drop}} returns all but +the first I elements of list X. + (take '(a b c d e) 2) => (a b) (drop '(a b c d e) 2) => (c d e) -x may be any value -- a proper, circular, or dotted list: +X may be any value -- a proper, circular, or dotted list: + (take '(1 2 3 . d) 2) => (1 2) (drop '(1 2 3 . d) 2) => (3 . d) (take '(1 2 3 . d) 3) => (1 2 3) (drop '(1 2 3 . d) 3) => d -For a legal i, take and drop partition the list in a manner which -can be inverted with append: - (append (take x i) (drop x i)) = x +For a legal I, {{take}} and {{drop}} partition the list in a manner which +can be inverted with {{append}}: + + + (append (take X I) (drop X I)) = X + -drop is exactly equivalent to performing i cdr operations on x; the -returned value shares a common tail with x. If the argument is a -list of non-zero length, take is guaranteed to return a -freshly-allocated list, even in the case where the entire list is -taken, e.g. (take lis (length lis)). +{{drop}} is exactly equivalent to performing I cdr operations on X; the +returned value shares a common tail with X. If the argument is a list +of non-zero length, {{take}} is guaranteed to return a freshly-allocated +list, even in the case where the entire list is taken, ''e.g.'' {{(take lis +(length lis))}}. <procedure>(take-right flist i) -> object</procedure><br> <procedure>(drop-right flist i) -> list</procedure><br> -take-right returns the last i elements of flist. -drop-right returns all but the last i elements of flist. +{{take-right}} returns the last I elements of FLIST. {{drop-right}} returns +all but the last I elements of FLIST. + (take-right '(a b c d e) 2) => (d e) (drop-right '(a b c d e) 2) => (a b c) The returned list may share a common tail with the argument list. -flist may be any finite list, either proper or dotted: +FLIST may be any finite list, either proper or dotted: + (take-right '(1 2 3 . d) 2) => (2 3 . d) (drop-right '(1 2 3 . d) 2) => (1) (take-right '(1 2 3 . d) 0) => d (drop-right '(1 2 3 . d) 0) => (1 2 3) -For a legal i, take-right and drop-right partition the list in a -manner which can be inverted with append: - (append (take flist i) (drop flist i)) = flist +For a legal I, {{take-right}} and {{drop-right}} partition the list in a +manner which can be inverted with {{append}}: + + + (append (take FLIST I) (drop FLIST I)) = FLIST + -take-right's return value is guaranteed to share a common tail with -flist. If the argument is a list of non-zero length, drop-right is -guaranteed to return a freshly-allocated list, even in the case -where nothing is dropped, e.g. (drop-right lis 0). +{{take-right}}'s return value is guaranteed to share a common tail with +FLIST. If the argument is a list of non-zero length, {{drop-right}} is +guaranteed to return a freshly-allocated list, even in the case where +nothing is dropped, ''e.g.'' {{(drop-right lis 0)}}. <procedure>(take! x i) -> list</procedure><br> <procedure>(drop-right! flist i) -> list</procedure><br> -take! and drop-right! are "linear-update" variants of take and -drop-right: the procedure is allowed, but not required, to alter -the argument list to produce the result. +{{take!}} and {{drop-right!}} are "linear-update" variants of {{take}} and +{{drop-right}}: the procedure is allowed, but not required, to alter the +argument list to produce the result. + +If X is circular, {{take!}} may return a shorter-than-expected list: -If x is circular, take! may return a shorter-than-expected list: (take! (circular-list 1 3 5) 8) => (1 3) (take! (circular-list 1 3 5) 8) => (1 3 5 1 3 5 1 3) -<procedure>(split-at x i) -> [list object]</procedure><br> +<procedure>(split-at x i) -> [list object]</procedure><br> <procedure>(split-at! x i) -> [list object]</procedure><br> -split-at splits the list x at index i, returning a list of the -first i elements, and the remaining tail. It is equivalent to +{{split-at}} splits the list X at index I, returning a list of the first I +elements, and the remaining tail. It is equivalent to + (values (take x i) (drop x i)) -split-at! is the linear-update variant. It is allowed, but not +{{split-at!}} is the linear-update variant. It is allowed, but not required, to alter the argument list to produce the result. + (split-at '(a b c d e f g h) 3) => - (a b c) - (d e f g h) + (a b c) + (d e f g h) <procedure>(last pair) -> object</procedure><br> <procedure>(last-pair pair) -> pair</procedure><br> -last returns the last element of the non-empty, finite list pair. -last-pair returns the last pair in the non-empty, finite list pair. +{{last}} returns the last element of the non-empty, finite list PAIR. +{{last-pair}} returns the last pair in the non-empty, finite list PAIR. + (last '(a b c)) => c (last-pair '(a b c)) => (c) + === Miscellaneous -<procedure>(length list) -> integer</procedure><br> <procedure>(length+ clist) -> integer or #f</procedure><br> -Both length and length+ return the length of the argument. It is an -error to pass a value to length which is not a proper list (finite -and nil-terminated). In particular, this means an implementation -may diverge or signal an error when length is applied to a circular -list. +Both {{length}} and {{length+}} return the length of the argument. It is an +error to pass a value to {{length}} which is not a proper list (finite and +nil-terminated). In particular, this means an implementation may diverge or +signal an error when {{length}} is applied to a circular list. -length+, on the other hand, returns #F when applied to a circular +{{length+}}, on the other hand, returns {{#F}} when applied to a circular list. -The length of a proper list is a non-negative integer n such that -cdr applied n times to the list produces the empty list. +The length of a proper list is a non-negative integer N such that {{cdr}} +applied N times to the list produces the empty list. -<procedure>(append! list[1] ...) -> list</procedure><br> +<procedure>(append! list_1 ...) -> list</procedure><br> -append! is the "linear-update" variant of append -- it is allowed, -but not required, to alter cons cells in the argument lists to -construct the result list. The last argument is never altered; the -result list shares structure with this parameter. +{{append!}} is the "linear-update" variant of {{append}} -- it is allowed, +but not required, to alter cons cells in the argument lists to construct +the result list. The last argument is never altered; the result list shares +structure with this parameter. -<procedure>(concatenate list-of-lists) -> value</procedure><br> +<procedure>(concatenate list-of-lists) -> value</procedure><br> <procedure>(concatenate! list-of-lists) -> value</procedure><br> -These functions append the elements of their argument together. -That is, concatenate returns +These functions append the elements of their argument together. That is, +{{concatenate}} returns + (apply append list-of-lists) or, equivalently, + (reduce-right append '() list-of-lists) -concatenate! is the linear-update variant, defined in terms of -append! instead of append. +{{concatenate!}} is the linear-update variant, defined in terms of +{{append!}} instead of {{append}}. -Note that some Scheme implementations do not support passing more -than a certain number (e.g., 64) of arguments to an n-ary -procedure. In these implementations, the (apply append ...) idiom -would fail when applied to long lists, but concatenate would -continue to function properly. +Note that some Scheme implementations do not support passing more than a +certain number (''e.g.'', 64) of arguments to an n-ary procedure. In these +implementations, the {{(apply append ...)}} idiom would fail when applied +to long lists, but {{concatenate}} would continue to function properly. -As with append and append!, the last element of the input list may +As with {{append}} and {{append!}}, the last element of the input list may be any value at all. <procedure>(reverse! list) -> list</procedure><br> -reverse! is the linear-update variant of reverse. It is permitted, +{{reverse!}} is the linear-update variant of {{reverse}}. It is permitted, but not required, to alter the argument's cons cells to produce the reversed list. -<procedure>(append-reverse rev-head tail) -> list</procedure><br> +<procedure>(append-reverse rev-head tail) -> list</procedure><br> <procedure>(append-reverse! rev-head tail) -> list</procedure><br> -append-reverse returns (append (reverse rev-head) tail). It is -provided because it is a common operation -- a common -list-processing style calls for this exact operation to transfer -values accumulated in reverse order onto the front of another list, -and because the implementation is significantly more efficient than -the simple composition it replaces. (But note that this pattern of -iterative computation followed by a reverse can frequently be -rewritten as a recursion, dispensing with the reverse and -append-reverse steps, and shifting temporary, intermediate storage -from the heap to the stack, which is typically a win for reasons of -cache locality and eager storage reclamation.) - -append-reverse! is just the linear-update variant -- it is allowed, -but not required, to alter rev-head's cons cells to construct the -result. +{{append-reverse}} returns {{(append (reverse REV-HEAD) TAIL)}}. It is +provided because it is a common operation -- a common list-processing style +calls for this exact operation to transfer values accumulated in reverse +order onto the front of another list, and because the implementation is +significantly more efficient than the simple composition it replaces. (But +note that this pattern of iterative computation followed by a reverse can +frequently be rewritten as a recursion, dispensing with the {{reverse}} +and {{append-reverse}} steps, and shifting temporary, intermediate storage +from the heap to the stack, which is typically a win for reasons of cache +locality and eager storage reclamation.) + +{{append-reverse!}} is just the linear-update variant -- it is allowed, but +not required, to alter REV-HEAD's cons cells to construct the result. + +<procedure>(zip clist_1 clist_2 ...) -> list</procedure><br> -<procedure>(zip clist[1] clist[2] ...) -> list</procedure><br> (lambda lists (apply map list lists)) -If zip is passed n lists, it returns a list as long as the shortest -of these lists, each element of which is an n-element list -comprised of the corresponding elements from the parameter lists. +If {{zip}} is passed N lists, it returns a list as long as the shortest of +these lists, each element of which is an N-element list comprised of the +corresponding elements from the parameter lists. - (zip '(one two three) - '(1 2 3) - '(odd even odd even odd even odd even)) - => ((one 1 odd) (two 2 even) (three 3 odd)) - + + (zip '(one two three) + '(1 2 3) + '(odd even odd even odd even odd even)) + => ((one 1 odd) (two 2 even) (three 3 odd)) + (zip '(1 2 3)) => ((1) (2) (3)) + At least one of the argument lists must be finite: - (zip '(3 1 4 1) (circular-list #f #t)) - => ((3 #f) (1 #t) (4 #f) (1 #t)) + + (zip '(3 1 4 1) (circular-list #f #t)) + => ((3 #f) (1 #t) (4 #f) (1 #t)) <procedure>(unzip1 list) -> list</procedure><br> <procedure>(unzip2 list) -> [list list]</procedure><br> @@ -375,186 +441,211 @@ At least one of the argument lists must be finite: <procedure>(unzip4 list) -> [list list list list]</procedure><br> <procedure>(unzip5 list) -> [list list list list list]</procedure><br> -unzip1 takes a list of lists, where every list must contain at -least one element, and returns a list containing the initial -element of each such list. That is, it returns (map car lists). -unzip2 takes a list of lists, where every list must contain at -least two elements, and returns two values: a list of the first -elements, and a list of the second elements. unzip3 does the same -for the first three elements of the lists, and so forth. +{{unzip1}} takes a list of lists, where every list must contain at least +one element, and returns a list containing the initial element of each such +list. That is, it returns {{(map car lists)}}. {{unzip2}} takes a list of +lists, where every list must contain at least two elements, and returns two +values: a list of the first elements, and a list of the second elements. +{{unzip3}} does the same for the first three elements of the lists, and so +forth. + (unzip2 '((1 one) (2 two) (3 three))) => - (1 2 3) - (one two three) + (1 2 3) + (one two three) + +<procedure>(count pred clist_1 clist_2) -> integer</procedure><br> -<procedure>(count pred clist[1] clist[2]) -> integer</procedure><br> +PRED is a procedure taking as many arguments as there are lists and +returning a single value. It is applied element-wise to the elements of +the LISTs, and a count is tallied of the number of elements that produce a +true value. This count is returned. {{count}} is "iterative" in that it is +guaranteed to apply PRED to the LIST elements in a left-to-right order. The +counting stops when the shortest list expires. -pred is a procedure taking as many arguments as there are lists and -returning a single value. It is applied element-wise to the -elements of the lists, and a count is tallied of the number of -elements that produce a true value. This count is returned. count -is "iterative" in that it is guaranteed to apply pred to the list -elements in a left-to-right order. The counting stops when the -shortest list expires. (count even? '(3 1 4 1 5 9 2 5 6)) => 3 (count < '(1 2 4 8) '(2 4 6 8 10 12 14 16)) => 3 + At least one of the argument lists must be finite: + (count < '(3 1 4 1) (circular-list 1 10)) => 2 + === Fold, unfold & map -<procedure>(fold kons knil clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(fold kons knil clist_1 clist_2 ...) -> value</procedure><br> The fundamental list iterator. -First, consider the single list-parameter case. If -clist[1] = (e[1] e[2] ... e[n]), then this procedure returns +First, consider the single list-parameter case. If CLIST_1 = (E_1 E_2 ... +E_N), then this procedure returns - (kons e[n] ... (kons e[2] (kons e[1] knil)) ... ) +{{(KONS E_N ... (KONS E_2 (KONS E_1 KNIL)) ... )}} That is, it obeys the (tail) recursion - (fold kons knil lis) = (fold kons (kons (car lis) knil) (cdr lis)) - (fold kons knil '()) = knil + + (fold KONS KNIL LIS) = (fold KONS (KONS (car LIS) KNIL) (cdr LIS)) + (fold KONS KNIL '()) = KNIL + Examples: - (fold + 0 lis) ; Add up the elements of LIS. - (fold cons '() lis) ; Reverse LIS. - (fold cons tail rev-head) ; See APPEND-REVERSE. + (fold + 0 lis) ; Add up the elements of LIS. + + (fold cons '() lis) ; Reverse LIS. + + (fold cons tail rev-head) ; See APPEND-REVERSE. + ;; How many symbols in LIS? (fold (lambda (x count) (if (symbol? x) (+ count 1) count)) - 0 - lis) - + 0 + lis) + ;; Length of the longest string in LIS: (fold (lambda (s max-len) (max max-len (string-length s))) - 0 - lis) + 0 + lis) + +If N list arguments are provided, then the KONS function must take N+1 +parameters: one element from each list, and the "seed" or fold state, which +is initially KNIL. The fold operation terminates when the shortest list +runs out of values: -If n list arguments are provided, then the kons function must take -n+1 parameters: one element from each list, and the "seed" or fold -state, which is initially knil. The fold operation terminates when -the shortest list runs out of values: (fold cons* '() '(a b c) '(1 2 3 4 5)) => (c 3 b 2 a 1) At least one of the list arguments must be finite. -<procedure>(fold-right kons knil clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(fold-right kons knil clist_1 clist_2 ...) -> value</procedure><br> The fundamental list recursion operator. -First, consider the single list-parameter case. If -clist[1] = (e[1] e[2] ... e[n]), then this procedure returns +First, consider the single list-parameter case. If CLIST_1 = {{(E_1 E_2 ... +E_N)}}, then this procedure returns - (kons e[1] (kons e[2] ... (kons e[n] knil))) +{{ (KONS E_1 (KONS E_2 ... (KONS E_N KNIL))) }} That is, it obeys the recursion - (fold-right kons knil lis) = (kons (car lis) (fold-right kons knil (cdr lis))) - (fold-right kons knil '()) = knil + + (fold-right KONS KNIL LIS) = (KONS (car LIS) (fold-right KONS KNIL (cdr LIS))) + (fold-right KONS KNIL '()) = KNIL + Examples: - (fold-right cons '() lis) ; Copy LIS. - + + (fold-right cons '() lis) ; Copy LIS. + ;; Filter the even numbers out of LIS. (fold-right (lambda (x l) (if (even? x) (cons x l) l)) '() lis)) -If n list arguments are provided, then the kons function must take -n+1 parameters: one element from each list, and the "seed" or fold -state, which is initially knil. The fold operation terminates when -the shortest list runs out of values: +If N list arguments are provided, then the KONS function must take N+1 +parameters: one element from each list, and the "seed" or fold state, which +is initially KNIL. The fold operation terminates when the shortest list +runs out of values: + (fold-right cons* '() '(a b c) '(1 2 3 4 5)) => (a 1 b 2 c 3) At least one of the list arguments must be finite. -<procedure>(pair-fold kons knil clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(pair-fold kons knil clist_1 clist_2 ...) -> value</procedure><br> + +Analogous to {{fold}}, but KONS is applied to successive sublists of the +lists, rather than successive elements -- that is, KONS is applied to the +pairs making up the lists, giving this (tail) recursion: + -Analogous to fold, but kons is applied to successive sublists of -the lists, rather than successive elements -- that is, kons is -applied to the pairs making up the lists, giving this (tail) -recursion: + (pair-fold KONS KNIL LIS) = (let ((tail (cdr LIS))) + (pair-fold KONS (KONS LIS KNIL) tail)) + (pair-fold KONS KNIL {{'()}}) = KNIL - (pair-fold kons knil lis) = (let ((tail (cdr lis))) - (pair-fold kons (kons lis knil) tail)) - (pair-fold kons knil '()) = knil -For finite lists, the kons function may reliably apply set-cdr! to -the pairs it is given without altering the sequence of execution. +For finite lists, the KONS function may reliably apply {{set-cdr!}} to the +pairs it is given without altering the sequence of execution. Example: + ;;; Destructively reverse a list. (pair-fold (lambda (pair tail) (set-cdr! pair tail) pair) '() lis)) At least one of the list arguments must be finite. -<procedure>(pair-fold-right kons knil clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(pair-fold-right kons knil clist_1 clist_2 ...) -> value</procedure><br> + +Holds the same relationship with {{fold-right}} that {{pair-fold}} holds +with {{fold}}. Obeys the recursion + -Holds the same relationship with fold-right that pair-fold holds -with fold. Obeys the recursion + (pair-fold-right KONS KNIL LIS) = + (KONS LIS (pair-fold-right KONS KNIL (cdr LIS))) + (pair-fold-right KONS KNIL {{'()}}) = KNIL - (pair-fold-right kons knil lis) = - (kons lis (pair-fold-right kons knil (cdr lis))) - (pair-fold-right kons knil '()) = knil Example: + (pair-fold-right cons '() '(a b c)) => ((a b c) (b c) (c)) At least one of the list arguments must be finite. <procedure>(reduce f ridentity list) -> value</procedure><br> -reduce is a variant of fold. +{{reduce}} is a variant of {{fold}}. + +RIDENTITY should be a "right identity" of the procedure F -- that is, for +any value X acceptable to F, -ridentity should be a "right identity" of the procedure f -- that -is, for any value x acceptable to f, - (f x ridentity) = x + (F X RIDENTITY) = X -reduce has the following definition: +{{reduce}} has the following definition: - If list = (), return ridentity; - Otherwise, return (fold f (car list) (cdr list)). +If LIST = (), return RIDENTITY; Otherwise, return {{(fold F (car LIST) (cdr +LIST))}}. -...in other words, we compute (fold f ridentity list). +...in other words, we compute {{(fold F RIDENTITY LIST)}}. -Note that ridentity is used only in the empty-list case. You -typically use reduce when applying f is expensive and you'd like to -avoid the extra application incurred when fold applies f to the -head of list and the identity value, redundantly producing the same -value passed in to f. For example, if f involves searching a file -directory or performing a database query, this can be significant. -In general, however, fold is useful in many contexts where reduce -is not (consider the examples given in the fold definition -- only -one of the five folds uses a function with a right identity. The -other four may not be performed with reduce). +Note that RIDENTITY is used ''only'' in the empty-list case. You typically +use {{reduce}} when applying F is expensive and you'd like to avoid the +extra application incurred when {{fold}} applies F to the head of LIST +and the identity value, redundantly producing the same value passed in +to F. For example, if F involves searching a file directory or performing +a database query, this can be significant. In general, however, {{fold}} +is useful in many contexts where {{reduce}} is not (consider the examples +given in the {{fold}} definition -- only one of the five folds uses a +function with a right identity. The other four may not be performed with +{{reduce}}). + +Note: MIT Scheme and Haskell flip F's arg order for their {{reduce}} and +{{fold}} functions. -Note: MIT Scheme and Haskell flip F's arg order for their reduce -and fold functions. ;; Take the max of a list of non-negative integers. (reduce max 0 nums) ; i.e., (apply max 0 nums) <procedure>(reduce-right f ridentity list) -> value</procedure><br> -reduce-right is the fold-right variant of reduce. It obeys the +{{reduce-right}} is the fold-right variant of {{reduce}}. It obeys the following definition: - (reduce-right f ridentity '()) = ridentity - (reduce-right f ridentity '(e[1])) = (f e[1] ridentity) = e[1] - (reduce-right f ridentity '(e[1] e[2] ...)) = - (f e[1] (reduce f ridentity (e[2] ...))) -...in other words, we compute (fold-right f ridentity list). + (reduce-right F RIDENTITY '()) = RIDENTITY + (reduce-right F RIDENTITY '(E_1)) = (F E_1 RIDENTITY) = E_1 + + (reduce-right F RIDENTITY '(E_1 E_2 ...)) = + (F E_1 (reduce F RIDENTITY (E_2 ...))) + + +...in other words, we compute {{(fold-right F RIDENTITY LIST)}}. + ;; Append a bunch of lists together. ;; I.e., (apply append list-of-lists) @@ -562,773 +653,868 @@ following definition: <procedure>(unfold p f g seed [tail-gen]) -> list</procedure><br> -unfold is best described by its basic recursion: +{{unfold}} is best described by its basic recursion: + - (unfold p f g seed) = - (if (p seed) (tail-gen seed) - (cons (f seed) - (unfold p f g (g seed)))) + (unfold P F G SEED) = + (if (P SEED) (TAIL-GEN SEED) + (cons (F SEED) + (unfold P F G (G SEED)))) -; p : Determines when to stop unfolding. -; f : Maps each seed value to the corresponding list element. -; g : Maps each seed value to next seed value. -; seed : The "state" value for the unfold. -; tail-gen : Creates the tail of the list; defaults to (lambda (x) '()) -In other words, we use g to generate a sequence of seed values - seed, g(seed), g^2(seed), g^3(seed), ... -These seed values are mapped to list elements by f, producing the -elements of the result list in a left-to-right order. P says when -to stop. +; P : Determines when to stop unfolding. +; F : Maps each seed value to the corresponding list element. +; G : Maps each seed value to next seed value. +; SEED : The "state" value for the unfold. +; TAIL-GEN : Creates the tail of the list; defaults to {{(lambda (x) '())}} + +In other words, we use G to generate a sequence of seed values + +SEED, G(SEED), G^2(SEED), G^3(SEED), ... + +These seed values are mapped to list elements by F, producing the elements +of the result list in a left-to-right order. P says when to stop. + +{{unfold}} is the fundamental recursive list constructor, just as +{{fold-right}} is the fundamental recursive list consumer. While {{unfold}} +may seem a bit abstract to novice functional programmers, it can be used in +a number of ways: -unfold is the fundamental recursive list constructor, just as -fold-right is the fundamental recursive list consumer. While unfold -may seem a bit abstract to novice functional programmers, it can be -used in a number of ways: ;; List of squares: 1^2 ... 10^2 (unfold (lambda (x) (> x 10)) - (lambda (x) (* x x)) - (lambda (x) (+ x 1)) - 1) - + (lambda (x) (* x x)) + (lambda (x) (+ x 1)) + 1) + (unfold null-list? car cdr lis) ; Copy a proper list. - + ;; Read current input port into a list of values. (unfold eof-object? values (lambda (x) (read)) (read)) - + ;; Copy a possibly non-proper list: - (unfold not-pair? car cdr lis - values) - + (unfold not-pair? car cdr lis + values) + ;; Append HEAD onto TAIL: - (unfold null-list? car cdr head - (lambda (x) tail)) + (unfold null-list? car cdr head + (lambda (x) tail)) + +Interested functional programmers may enjoy noting that {{fold-right}} and +{{unfold}} are in some sense inverses. That is, given operations KNULL?, +KAR, KDR, KONS, and KNIL satisfying -Interested functional programmers may enjoy noting that fold-right -and unfold are in some sense inverses. That is, given operations -knull?, kar, kdr, kons, and knil satisfying - (kons (kar x) (kdr x)) = x and (knull? knil) = #t +{{(KONS (KAR X) (KDR X))}} = {{x}} and {{(KNULL? KNIL)}} = {{#t}} then - (fold-right kons knil (unfold knull? kar kdr x)) = x + +{{(fold-right KONS KNIL (unfold KNULL? KAR KDR X))}} = X and - (unfold knull? kar kdr (fold-right kons knil x)) = x -This combinator sometimes is called an "anamorphism;" when an -explicit tail-gen procedure is supplied, it is called an -"apomorphism." +{{(unfold KNULL? KAR KDR (fold-right KONS KNIL X))}} = X. + +This combinator sometimes is called an "anamorphism;" when an explicit +TAIL-GEN procedure is supplied, it is called an "apomorphism." <procedure>(unfold-right p f g seed [tail]) -> list</procedure><br> -unfold-right constructs a list with the following loop: +{{unfold-right}} constructs a list with the following loop: + (let lp ((seed seed) (lis tail)) - (if (p seed) lis - (lp (g seed) - (cons (f seed) lis)))) + (if (p seed) lis + (lp (g seed) + (cons (f seed) lis)))) + + +; P : Determines when to stop unfolding. +; F : Maps each seed value to the corresponding list element. +; G : Maps each seed value to next seed value. +; SEED : The "state" value for the unfold. +; TAIL : list terminator; defaults to {{'()}}. + +In other words, we use G to generate a sequence of seed values -; p : Determines when to stop unfolding. -; f : Maps each seed value to the corresponding list element. -; g : Maps each seed value to next seed value. -; seed : The "state" value for the unfold. -; tail : list terminator; defaults to '(). +SEED, G(SEED), G^2(SEED), G^3(SEED), ... -In other words, we use g to generate a sequence of seed values - seed, g(seed), g^2(seed), g^3(seed), ... +These seed values are mapped to list elements by F, producing the elements +of the result list in a right-to-left order. P says when to stop. -These seed values are mapped to list elements by f, producing the -elements of the result list in a right-to-left order. P says when -to stop. +{{unfold-right}} is the fundamental iterative list constructor, just as +{{fold}} is the fundamental iterative list consumer. While {{unfold-right}} +may seem a bit abstract to novice functional programmers, it can be used in +a number of ways: -unfold-right is the fundamental iterative list constructor, just as -fold is the fundamental iterative list consumer. While unfold-right -may seem a bit abstract to novice functional programmers, it can be -used in a number of ways: ;; List of squares: 1^2 ... 10^2 - (unfold-right zero? - (lambda (x) (* x x)) - (lambda (x) (- x 1)) - 10) - + (unfold-right zero? + (lambda (x) (* x x)) + (lambda (x) (- x 1)) + 10) + ;; Reverse a proper list. (unfold-right null-list? car cdr lis) - + ;; Read current input port into a list of values. (unfold-right eof-object? values (lambda (x) (read)) (read)) - + ;; (append-reverse rev-head tail) (unfold-right null-list? car cdr rev-head tail) -Interested functional programmers may enjoy noting that fold and -unfold-right are in some sense inverses. That is, given operations -knull?, kar, kdr, kons, and knil satisfying - (kons (kar x) (kdr x)) = x and (knull? knil) = #t +Interested functional programmers may enjoy noting that {{fold}} and +{{unfold-right}} are in some sense inverses. That is, given operations +KNULL?, KAR, KDR, KONS, and KNIL satisfying + +{{(KONS (KAR X) (KDR X))}} = {{x}} and {{(KNULL? KNIL)}} = {{#t}} then - (fold kons knil (unfold-right knull? kar kdr x)) = x + +{{(fold KONS KNIL (unfold-right KNULL? KAR KDR X))}} = X and - (unfold-right knull? kar kdr (fold kons knil x)) = x + +{{(unfold-right KNULL? KAR KDR (fold KONS KNIL X))}} = X. This combinator presumably has some pretentious mathematical name; interested readers are invited to communicate it to the author. -<procedure>(map proc clist[1] clist[2] ...) -> list</procedure><br> +<procedure>(map proc clist_1 clist_2 ...) -> list</procedure><br> This procedure is extended from its R5RS specification to allow the -arguments to be of unequal length; it terminates when the shortest -list runs out. +arguments to be of unequal length; it terminates when the shortest list +runs out. At least one of the argument lists must be finite: (map + '(3 1 4 1) (circular-list 1 0)) => (4 1 5 1) -<procedure>(for-each proc clist[1] clist[2] ...) -> unspecified</procedure><br> +<procedure>(for-each proc clist_1 clist_2 ...) -> unspecified</procedure><br> This procedure is extended from its R5RS specification to allow the -arguments to be of unequal length; it terminates when the shortest -list runs out. +arguments to be of unequal length; it terminates when the shortest list +runs out. At least one of the argument lists must be finite. -<procedure>(append-map f clist[1] clist[2] ...) -> value</procedure><br> -<procedure>(append-map! f clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(append-map f clist_1 clist_2 ...) -> value</procedure><br> +<procedure>(append-map! f clist_1 clist_2 ...) -> value</procedure><br> Equivalent to - (apply append (map f clist[1] clist[2] ...)) + +{{ (apply append (map F CLIST_1 CLIST_2 ...)) }} + and - (apply append! (map f clist[1] clist[2] ...)) -Map f over the elements of the lists, just as in the map function. -However, the results of the applications are appended together to -make the final result. append-map uses append to append the results -together; append-map! uses append!. +{{ (apply append! (map F CLIST_1 CLIST_2 ...)) }} -The dynamic order in which the various applications of f are made -is not specified. +Map F over the elements of the lists, just as in the {{map}} function. +However, the results of the applications are appended together to make +the final result. {{append-map}} uses {{append}} to append the results +together; {{append-map!}} uses {{append!}}. + +The dynamic order in which the various applications of F are made is not +specified. Example: + (append-map! (lambda (x) (list x (- x))) '(1 3 8)) - => (1 -1 3 -3 8 -8) + => (1 -1 3 -3 8 -8) At least one of the list arguments must be finite. -<procedure>(map! f list[1] clist[2] ...) -> list</procedure><br> +<procedure>(map! f list_1 clist_2 ...) -> list</procedure><br> -Linear-update variant of map -- map! is allowed, but not required, -to alter the cons cells of list[1] to construct the result list. +Linear-update variant of {{map}} -- {{map!}} is allowed, but not required, +to alter the cons cells of LIST_1 to construct the result list. -The dynamic order in which the various applications of f are made -is not specified. In the n-ary case, clist[2], clist[3], ... must -have at least as many elements as list[1]. +The dynamic order in which the various applications of F are made is not +specified. In the n-ary case, CLIST_2, CLIST_3, ... must have at least as +many elements as LIST_1. -<procedure>(map-in-order f clist[1] clist[2] ...) -> list</procedure><br> +<procedure>(map-in-order f) clist_1 clist_2 ...) -> list</procedure><br> -A variant of the map procedure that guarantees to apply f across -the elements of the list[i] arguments in a left-to-right order. -This is useful for mapping procedures that both have side effects -and return useful values. +A variant of the {{map}} procedure that guarantees to apply F across the +elements of the LIST_I arguments in a left-to-right order. This is useful +for mapping procedures that both have side effects and return useful +values. At least one of the list arguments must be finite. -<procedure>(pair-for-each f clist[1] clist[2] ...) -> unspecific</procedure><br> +<procedure>(pair-for-each f clist_1 clist_2 ...) -> unspecific</procedure><br> -Like for-each, but f is applied to successive sublists of the -argument lists. That is, f is applied to the cons cells of the -lists, rather than the lists' elements. These applications occur in -left-to-right order. +Like {{for-each}}, but F is applied to successive sublists of the argument +lists. That is, F is applied to the cons cells of the lists, rather than +the lists' elements. These applications occur in left-to-right order. + +The F procedure may reliably apply {{set-cdr!}} to the pairs it is given +without altering the sequence of execution. -The f procedure may reliably apply set-cdr! to the pairs it is -given without altering the sequence of execution. (pair-for-each (lambda (pair) (display pair) (newline)) '(a b c)) ==> - (a b c) - (b c) - (c) + (a b c) + (b c) + (c) At least one of the list arguments must be finite. -<procedure>(filter-map f clist[1] clist[2] ...) -> list</procedure><br> +<procedure>(filter-map f clist_1 clist_2 ...) -> list</procedure><br> + +Like {{map}}, but only true values are saved. -Like map, but only true values are saved. (filter-map (lambda (x) (and (number? x) (* x x))) '(a 1 b 3 c 7)) - => (1 9 49) + => (1 9 49) -The dynamic order in which the various applications of f are made -is not specified. +The dynamic order in which the various applications of F are made is not +specified. At least one of the list arguments must be finite. + === Filtering & partitioning <procedure>(filter pred list) -> list</procedure><br> -Return all the elements of list that satisfy predicate pred. The -list is not disordered -- elements that appear in the result list -occur in the same order as they occur in the argument list. The -returned list may share a common tail with the argument list. The -dynamic order in which the various applications of pred are made is -not specified. +Return all the elements of LIST that satisfy predicate PRED. The list is +not disordered -- elements that appear in the result list occur in the same +order as they occur in the argument list. The returned list may share a +common tail with the argument list. The dynamic order in which the various +applications of PRED are made is not specified. + (filter even? '(0 7 8 8 43 -4)) => (0 8 8 -4) <procedure>(partition pred list) -> [list list]</procedure><br> -Partitions the elements of list with predicate pred, and returns -two values: the list of in-elements and the list of out-elements. -The list is not disordered -- elements occur in the result lists in -the same order as they occur in the argument list. The dynamic -order in which the various applications of pred are made is not -specified. One of the returned lists may share a common tail with -the argument list. +Partitions the elements of LIST with predicate PRED, and returns two +values: the list of in-elements and the list of out-elements. The list is +not disordered -- elements occur in the result lists in the same order as +they occur in the argument list. The dynamic order in which the various +applications of PRED are made is not specified. One of the returned lists +may share a common tail with the argument list. - (partition symbol? '(one 2 3 four five 6)) => - (one four five) - (2 3 6) + + (partition symbol? '(one 2 3 four five 6)) => + (one four five) + (2 3 6) <procedure>(remove pred list) -> list</procedure><br> -Returns list without the elements that satisfy predicate pred: +Returns LIST without the elements that satisfy predicate PRED: + (lambda (pred list) (filter (lambda (x) (not (pred x))) list)) -The list is not disordered -- elements that appear in the result -list occur in the same order as they occur in the argument list. -The returned list may share a common tail with the argument list. -The dynamic order in which the various applications of pred are -made is not specified. +The list is not disordered -- elements that appear in the result list occur +in the same order as they occur in the argument list. The returned list may +share a common tail with the argument list. The dynamic order in which the +various applications of PRED are made is not specified. + (remove even? '(0 7 8 8 43 -4)) => (7 43) -<procedure>(filter! pred list) -> list</procedure><br> +<procedure>(filter! pred list) -> list</procedure><br> <procedure>(partition! pred list) -> [list list]</procedure><br> -<procedure>(remove! pred list) -> list</procedure><br> +<procedure>(remove! pred list) -> list</procedure><br> + +Linear-update variants of {{filter}}, {{partition}} and {{remove}}. These +procedures are allowed, but not required, to alter the cons cells in the +argument list to construct the result lists. -Linear-update variants of filter, partition and remove. These -procedures are allowed, but not required, to alter the cons cells -in the argument list to construct the result lists. === Searching +The following procedures all search lists for a leftmost element satisfying +some criteria. This means they do not always examine the entire list; +thus, there is no efficient way for them to reliably detect and signal an +error when passed a dotted or circular list. Here are the general rules +describing how these procedures work when applied to different kinds of +lists: + + +; Proper lists : The standard, canonical behavior happens in this case. +; Dotted lists : It is an error to pass these procedures a dotted list that does not contain an element satisfying the search criteria. That is, it is an error if the procedure has to search all the way to the end of the dotted list. However, this SRFI does ''not'' specify anything at all about the behavior of these procedures when passed a dotted list containing an element satisfying the search criteria. It may finish successfully, signal an error, or perform some third action. Different implementations may provide different functionality in this case; code which is compliant with this SRFI may not rely on any particular behavior. Future SRFI's may refine SRFI-1 to define specific behavior in this case. In brief, SRFI-1 compliant code may not pass a dotted list argument to these procedures. +; Circular lists : It is an error to pass these procedures a circular list that does not contain an element satisfying the search criteria. Note that the procedure is not required to detect this case; it may simply diverge. It is, however, acceptable to search a circular list ''if the search is successful'' -- that is, if the list contains an element satisfying the search criteria. + +Here are some examples, using the {{find}} and {{any}} procedures as +canonical representatives: + + + ;; Proper list -- success + (find even? '(1 2 3)) => 2 + (any even? '(1 2 3)) => #t + + ;; proper list -- failure + (find even? '(1 7 3)) => #f + (any even? '(1 7 3)) => #f + + ;; Failure is error on a dotted list. + (find even? '(1 3 . x)) => error + (any even? '(1 3 . x)) => error + + ;; The dotted list contains an element satisfying the search. + ;; This case is not specified -- it could be success, an error, + ;; or some third possibility. + (find even? '(1 2 . x)) => error/undefined + (any even? '(1 2 . x)) => error/undefined ; success, error or other. + + ;; circular list -- success + (find even? (circular-list 1 6 3)) => 6 + (any even? (circular-list 1 6 3)) => #t + + ;; circular list -- failure is error. Procedure may diverge. + (find even? (circular-list 1 3)) => error + (any even? (circular-list 1 3)) => error + + <procedure>(find pred clist) -> value</procedure><br> -Return the first element of clist that satisfies predicate pred; -false if no element does. +Return the first element of CLIST that satisfies predicate PRED; false if +no element does. + (find even? '(3 1 4 1 5 9)) => 4 -Note that find has an ambiguity in its lookup semantics -- if find -returns #f, you cannot tell (in general) if it found a #f element -that satisfied pred, or if it did not find any element at all. In -many situations, this ambiguity cannot arise -- either the list -being searched is known not to contain any #f elements, or the list -is guaranteed to have an element satisfying pred. However, in cases -where this ambiguity can arise, you should use find-tail instead of -find -- find-tail has no such ambiguity: +Note that {{find}} has an ambiguity in its lookup semantics -- if {{find}} +returns {{#f}}, you cannot tell (in general) if it found a {{#f}} element +that satisfied PRED, or if it did not find any element at all. In many +situations, this ambiguity cannot arise -- either the list being searched +is known not to contain any {{#f}} elements, or the list is guaranteed to +have an element satisfying PRED. However, in cases where this ambiguity can +arise, you should use {{find-tail}} instead of {{find}} -- {{find-tail}} +has no such ambiguity: + (cond ((find-tail pred lis) => (lambda (pair) ...)) ; Handle (CAR PAIR) - (else ...)) ; Search failed. + (else ...)) ; Search failed. <procedure>(find-tail pred clist) -> pair or false</procedure><br> -Return the first pair of clist whose car satisfies pred. If no pair -does, return false. +Return the first pair of CLIST whose car satisfies PRED. If no pair does, +return false. -find-tail can be viewed as a general-predicate variant of the -member function. +{{find-tail}} can be viewed as a general-predicate variant of the +{{member}} function. Examples: + (find-tail even? '(3 1 37 -8 -5 0 0)) => (-8 -5 0 0) (find-tail even? '(3 1 37 -5)) => #f - + ;; MEMBER X LIS: (find-tail (lambda (elt) (equal? x elt)) lis) + In the circular-list case, this procedure "rotates" the list. -Find-tail is essentially drop-while, where the sense of the -predicate is inverted: Find-tail searches until it finds an element -satisfying the predicate; drop-while searches until it finds an -element that doesn't satisfy the predicate. +{{Find-tail}} is essentially {{drop-while}}, where the sense of the +predicate is inverted: {{Find-tail}} searches until it finds an element +satisfying the predicate; {{drop-while}} searches until it finds an element +that ''doesn't'' satisfy the predicate. -<procedure>(take-while pred clist) -> list</procedure><br> +<procedure>(take-while pred clist) -> list</procedure><br> <procedure>(take-while! pred clist) -> list</procedure><br> -Returns the longest initial prefix of clist whose elements all -satisfy the predicate pred. +Returns the longest initial prefix of CLIST whose elements all satisfy the +predicate PRED. -Take-while! is the linear-update variant. It is allowed, but not +{{Take-while!}} is the linear-update variant. It is allowed, but not required, to alter the argument list to produce the result. + (take-while even? '(2 18 3 10 22 9)) => (2 18) <procedure>(drop-while pred clist) -> list</procedure><br> -Drops the longest initial prefix of clist whose elements all -satisfy the predicate pred, and returns the rest of the list. +Drops the longest initial prefix of CLIST whose elements all satisfy the +predicate PRED, and returns the rest of the list. + (drop-while even? '(2 18 3 10 22 9)) => (3 10 22 9) + The circular-list case may be viewed as "rotating" the list. -<procedure>(span pred clist) -> [list clist]</procedure><br> -<procedure>(span! pred list ) -> [list list]</procedure><br> -<procedure>(break pred clist) -> [list clist]</procedure><br> -<procedure>(break! pred list ) -> [list list]</procedure><br> +<procedure>(span pred clist) -> [list clist]</procedure><br> +<procedure>(span! pred list) -> [list list]</procedure><br> +<procedure>(break pred clist) -> [list clist]</procedure><br> +<procedure>(break! pred list) -> [list list]</procedure><br> + +{{Span}} splits the list into the longest initial prefix whose elements all +satisfy PRED, and the remaining tail. {{Break}} inverts the sense of the +predicate: the tail commences with the first element of the input list that +satisfies the predicate. -Span splits the list into the longest initial prefix whose elements -all satisfy pred, and the remaining tail. Break inverts the sense -of the predicate: the tail commences with the first element of the -input list that satisfies the predicate. +In other words: {{span}} finds the intial span of elements satisfying PRED, +and {{break}} breaks the list at the first element satisfying PRED. -In other words: span finds the intial span of elements satisfying -pred, and break breaks the list at the first element satisfying -pred. +{{Span}} is equivalent to -Span is equivalent to - (values (take-while pred clist) - (drop-while pred clist)) + (values (take-while PRED CLIST) + (drop-while PRED CLIST)) -Span! and break! are the linear-update variants. They are allowed, +{{Span!}} and {{break!}} are the linear-update variants. They are allowed, but not required, to alter the argument list to produce the result. + (span even? '(2 18 3 10 22 9)) => - (2 18) - (3 10 22 9) - + (2 18) + (3 10 22 9) + (break even? '(3 1 4 1 5 9)) => - (3 1) - (4 1 5 9) + (3 1) + (4 1 5 9) -<procedure>(any pred clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(any pred clist_1 clist_2 ...) -> value</procedure><br> -Applies the predicate across the lists, returning true if the -predicate returns true on any application. +Applies the predicate across the lists, returning true if the predicate +returns true on any application. -If there are n list arguments clist[1] ... clist[n], then pred must -be a procedure taking n arguments and returning a boolean result. +If there are N list arguments CLIST_1 ... CLIST_N, then PRED must be a +procedure taking N arguments and returning a boolean result. -any applies pred to the first elements of the clist[i] parameters. -If this application returns a true value, any immediately returns -that value. Otherwise, it iterates, applying pred to the second -elements of the clist[i] parameters, then the third, and so forth. -The iteration stops when a true value is produced or one of the -lists runs out of values; in the latter case, any returns #f. The -application of pred to the last element of the lists is a tail -call. +{{any}} applies PRED to the first elements of the CLIST_I parameters. If +this application returns a true value, {{any}} immediately returns that +value. Otherwise, it iterates, applying PRED to the second elements of the +CLIST_I parameters, then the third, and so forth. The iteration stops when +a true value is produced or one of the lists runs out of values; in the +latter case, {{any}} returns {{#f}}. The application of PRED to the last +element of the lists is a tail call. -Note the difference between find and any -- find returns the -element that satisfied the predicate; any returns the true value -that the predicate produced. +Note the difference between {{find}} and {{any}} -- {{find}} returns the +element that satisfied the predicate; {{any}} returns the true value that +the predicate produced. -Like every, any's name does not end with a question mark -- this is -to indicate that it does not return a simple boolean (#t or #f), +Like {{every}}, {{any}}'s name does not end with a question mark -- this +is to indicate that it does not return a simple boolean ({{#t}} or {{#f}}), but a general value. + (any integer? '(a 3 b 2.7)) => #t (any integer? '(a 3.1 b 2.7)) => #f (any < '(3 1 4 1 5) - '(2 7 1 8 2)) => #t + '(2 7 1 8 2)) => #t -<procedure>(every pred clist[1] clist[2] ...) -> value</procedure><br> +<procedure>(every pred clist_1 clist_2 ...) -> value</procedure><br> -Applies the predicate across the lists, returning true if the -predicate returns true on every application. +Applies the predicate across the lists, returning true if the predicate +returns true on every application. -If there are n list arguments clist[1] ... clist[n], then pred must -be a procedure taking n arguments and returning a boolean result. +If there are N list arguments CLIST_1 ... CLIST_N, then PRED must be a +procedure taking N arguments and returning a boolean result. -every applies pred to the first elements of the clist[i] -parameters. If this application returns false, every immediately -returns false. Otherwise, it iterates, applying pred to the second -elements of the clist[i] parameters, then the third, and so forth. -The iteration stops when a false value is produced or one of the -lists runs out of values. In the latter case, every returns the -true value produced by its final application of pred. The -application of pred to the last element of the lists is a tail +{{every}} applies PRED to the first elements of the CLIST_I parameters. +If this application returns false, {{every}} immediately returns false. +Otherwise, it iterates, applying PRED to the second elements of the CLIST_I +parameters, then the third, and so forth. The iteration stops when a false +value is produced or one of the lists runs out of values. In the latter +case, {{every}} returns the true value produced by its final application +of PRED. The application of PRED to the last element of the lists is a tail call. -If one of the clist[i] has no elements, every simply returns #t. +If one of the CLIST_I has no elements, {{every}} simply returns {{#t}}. -Like any, every's name does not end with a question mark -- this is -to indicate that it does not return a simple boolean (#t or #f), +Like {{any}}, {{every}}'s name does not end with a question mark -- this +is to indicate that it does not return a simple boolean ({{#t}} or {{#f}}), but a general value. -<procedure>(list-index pred clist[1] clist[2] ...) -> integer or false</procedure><br> +<procedure>(list-index pred clist_1 clist_2 ...) -> integer or false</procedure><br> -Return the index of the leftmost element that satisfies pred. +Return the index of the leftmost element that satisfies PRED. -If there are n list arguments clist[1] ... clist[n], then pred must -be a function taking n arguments and returning a boolean result. +If there are N list arguments CLIST_1 ... CLIST_N, then PRED must be a +function taking N arguments and returning a boolean result. -list-index applies pred to the first elements of the clist[i] -parameters. If this application returns true, list-index -immediately returns zero. Otherwise, it iterates, applying pred to -the second elements of the clist[i] parameters, then the third, and -so forth. When it finds a tuple of list elements that cause pred to -return true, it stops and returns the zero-based index of that -position in the lists. +{{list-index}} applies PRED to the first elements of the CLIST_I +parameters. If this application returns true, {{list-index}} immediately +returns zero. Otherwise, it iterates, applying PRED to the second elements +of the CLIST_I parameters, then the third, and so forth. When it finds a +tuple of list elements that cause PRED to return true, it stops and returns +the zero-based index of that position in the lists. + +The iteration stops when one of the lists runs out of values; in this case, +{{list-index}} returns {{#f}}. -The iteration stops when one of the lists runs out of values; in -this case, list-index returns #f. (list-index even? '(3 1 4 1 5 9)) => 2 (list-index < '(3 1 4 1 5 9 2 5 6) '(2 7 1 8 2)) => 1 (list-index = '(3 1 4 1 5 9 2 5 6) '(2 7 1 8 2)) => #f + <procedure>(member x list [=]) -> list</procedure><br> -member is extended from its R5RS definition to allow the client to -pass in an optional equality procedure = used to compare keys. +{{member}} is extended from its R5RS definition to allow the client to pass +in an optional equality procedure = used to compare keys. + +The comparison procedure is used to compare the elements E_I of LIST to the +key X in this way: -The comparison procedure is used to compare the elements e[i] of -list to the key x in this way: - (= x e[i]) ; list is (E1 ... En) +{{ (= X E_I) ; list is (E1 ... En) }} -That is, the first argument is always x, and the second argument is -one of the list elements. Thus one can reliably find the first -element of list that is greater than five with - (member 5 list <) +That is, the first argument is always X, and the second argument is one +of the list elements. Thus one can reliably find the first element of LIST +that is greater than five with {{(member 5 LIST <)}} Note that fully general list searching may be performed with the -find-tail and find procedures, e.g. +{{find-tail}} and {{find}} procedures, ''e.g.'' + (find-tail even? list) ; Find the first elt with an even key. + === Deletion -<procedure>(delete x list [=]) -> list</procedure><br> +<procedure>(delete x list [=]) -> list</procedure><br> <procedure>(delete! x list [=]) -> list</procedure><br> -delete uses the comparison procedure =, which defaults to equal?, -to find all elements of list that are equal to x, and deletes them -from list. The dynamic order in which the various applications of = -are made is not specified. +{{delete}} uses the comparison procedure =, which defaults to {{equal?}}, +to find all elements of LIST that are equal to X, and deletes them from +LIST. The dynamic order in which the various applications of = are made is +not specified. -The list is not disordered -- elements that appear in the result -list occur in the same order as they occur in the argument list. -The result may share a common tail with the argument list. +The list is not disordered -- elements that appear in the result list occur +in the same order as they occur in the argument list. The result may share +a common tail with the argument list. Note that fully general element deletion can be performed with the -remove and remove! procedures, e.g.: +{{remove}} and {{remove!}} procedures, ''e.g.'': + ;; Delete all the even elements from LIS: (remove even? lis) -The comparison procedure is used in this way: (= x e[i]). That is, -x is always the first argument, and a list element is always the -second argument. The comparison procedure will be used to compare -each element of list exactly once; the order in which it is applied -to the various e[i] is not specified. Thus, one can reliably remove -all the numbers greater than five from a list with - (delete 5 list <) +The comparison procedure is used in this way: {{(= X E_I)}}. That is, +X is always the first argument, and a list element is always the second +argument. The comparison procedure will be used to compare each element of +LIST exactly once; the order in which it is applied to the various E_I is +not specified. Thus, one can reliably remove all the numbers greater than +five from a list with {{(delete 5 list <)}} -delete! is the linear-update variant of delete. It is allowed, but -not required, to alter the cons cells in its argument list to -construct the result. +{{delete!}} is the linear-update variant of {{delete}}. It is allowed, but +not required, to alter the cons cells in its argument list to construct the +result. -<procedure>(delete-duplicates list [=]) -> list</procedure><br> +<procedure>(delete-duplicates list [=]) -> list</procedure><br> <procedure>(delete-duplicates! list [=]) -> list</procedure><br> -delete-duplicates removes duplicate elements from the list -argument. If there are multiple equal elements in the argument -list, the result list only contains the first or leftmost of these -elements in the result. The order of these surviving elements is -the same as in the original list -- delete-duplicates does not -disorder the list (hence it is useful for "cleaning up" association -lists). - -The = parameter is used to compare the elements of the list; it -defaults to equal?. If x comes before y in list, then the -comparison is performed (= x y). The comparison procedure will be -used to compare each pair of elements in list no more than once; -the order in which it is applied to the various pairs is not -specified. +{{delete-duplicates}} removes duplicate elements from the list argument. +If there are multiple equal elements in the argument list, the result list +only contains the first or leftmost of these elements in the result. The +order of these surviving elements is the same as in the original list -- +{{delete-duplicates}} does not disorder the list (hence it is useful for +"cleaning up" association lists). + +The = parameter is used to compare the elements of the list; it defaults to +{{equal?}}. If X comes before Y in LIST, then the comparison is performed +{{(= X Y)}}. The comparison procedure will be used to compare each pair of +elements in LIST no more than once; the order in which it is applied to the +various pairs is not specified. -Implementations of delete-duplicates are allowed to share common -tails between argument and result lists -- for example, if the list -argument contains only unique elements, it may simply return -exactly this list. +Implementations of {{delete-duplicates}} are allowed to share common tails +between argument and result lists -- for example, if the list argument +contains only unique elements, it may simply return exactly this list. -Be aware that, in general, delete-duplicates runs in time O(n^2) -for n-element lists. Uniquifying long lists can be accomplished in -O(n lg n) time by sorting the list to bring equal elements -together, then using a linear-time algorithm to remove equal -elements. Alternatively, one can use algorithms based on -element-marking, with linear-time results. +Be aware that, in general, {{delete-duplicates}} runs in time O(n^2) for +N-element lists. Uniquifying long lists can be accomplished in O(n lg n) +time by sorting the list to bring equal elements together, then using a +linear-time algorithm to remove equal elements. Alternatively, one can use +algorithms based on element-marking, with linear-time results. + +{{delete-duplicates!}} is the linear-update variant of +{{delete-duplicates}}; it is allowed, but not required, to alter the cons +cells in its argument list to construct the result. -delete-duplicates! is the linear-update variant of -delete-duplicates; it is allowed, but not required, to alter the -cons cells in its argument list to construct the result. (delete-duplicates '(a b a c a b c z)) => (a b c z) - + ;; Clean up an alist: (delete-duplicates '((a . 3) (b . 7) (a . 9) (c . 1)) - (lambda (x y) (eq? (car x) (car y)))) - => ((a . 3) (b . 7) (c . 1)) + (lambda (x y) (eq? (car x) (car y)))) + => ((a . 3) (b . 7) (c . 1)) + === Association lists An "association list" (or "alist") is a list of pairs. The car of each -pair contains a key value, and the cdr contains the associated data -value. They can be used to construct simple look-up tables in Scheme. -Note that association lists are probably inappropriate for -performance-critical use on large data; in these cases, hash tables or -some other alternative should be employed. +pair contains a key value, and the cdr contains the associated data value. +They can be used to construct simple look-up tables in Scheme. Note that +association lists are probably inappropriate for performance-critical use +on large data; in these cases, hash tables or some other alternative should +be employed. <procedure>(assoc key alist [=]) -> pair or #f</procedure><br> -assoc is extended from its R5RS definition to allow the client to -pass in an optional equality procedure = used to compare keys. +{{assoc}} is extended from its R5RS definition to allow the client to pass +in an optional equality procedure = used to compare keys. + +The comparison procedure is used to compare the elements E_I of LIST to the +KEY parameter in this way: -The comparison procedure is used to compare the elements e[i] of -list to the key parameter in this way: - (= key (car e[i])) ; list is (E1 ... En) -That is, the first argument is always key, and the second argument -is one of the list elements. Thus one can reliably find the first -entry of alist whose key is greater than five with - (assoc 5 alist <) +{{ (= KEY (car E_I)) ; list is (E1 ... En) }} + +That is, the first argument is always KEY, and the second argument is one +of the list elements. Thus one can reliably find the first entry of ALIST +whose key is greater than five with {{(assoc 5 ALIST <)}} Note that fully general alist searching may be performed with the -find-tail and find procedures, e.g. +{{find-tail}} and {{find}} procedures, ''e.g.'' + - ;; Look up the first association in alist with an even key: + ;; Look up the first association in ALIST with an even key: (find (lambda (a) (even? (car a))) alist) <procedure>(alist-cons key datum alist) -> alist</procedure><br> + (lambda (key datum alist) (cons (cons key datum) alist)) -Cons a new alist entry mapping key -> datum onto alist. +Cons a new alist entry mapping KEY -> DATUM onto ALIST. + <procedure>(alist-copy alist) -> alist</procedure><br> -Make a fresh copy of alist. This means copying each pair that forms -an association as well as the spine of the list, i.e. + +Make a fresh copy of ALIST. This means copying each pair that forms an +association as well as the spine of the list, ''i.e.'' + (lambda (a) (map (lambda (elt) (cons (car elt) (cdr elt))) a)) -<procedure>(alist-delete key alist [=]) -> alist</procedure><br> + +<procedure>(alist-delete key alist [=]) -> alist</procedure><br> <procedure>(alist-delete! key alist [=]) -> alist</procedure><br> -alist-delete deletes all associations from alist with the given -key, using key-comparison procedure =, which defaults to equal?. -The dynamic order in which the various applications of = are made -is not specified. +{{alist-delete}} deletes all associations from ALIST with the given KEY, +using key-comparison procedure =, which defaults to {{equal?}}. The dynamic +order in which the various applications of = are made is not specified. -Return values may share common tails with the alist argument. The -alist is not disordered -- elements that appear in the result alist -occur in the same order as they occur in the argument alist. +Return values may share common tails with the ALIST argument. The alist +is not disordered -- elements that appear in the result alist occur in the +same order as they occur in the argument alist. -The comparison procedure is used to compare the element keys k[i] -of alist's entries to the key parameter in this way: (= key k[i]). -Thus, one can reliably remove all entries of alist whose key is -greater than five with (alist-delete 5 alist <) +The comparison procedure is used to compare the element keys K_I of ALIST's +entries to the KEY parameter in this way: {{(= KEY K_I)}}. Thus, one can +reliably remove all entries of ALIST whose key is greater than five with +{{(alist-delete 5 ALIST <)}} + +{{alist-delete!}} is the linear-update variant of {{alist-delete}}. It is +allowed, but not required, to alter cons cells from the ALIST parameter to +construct the result. -alist-delete! is the linear-update variant of alist-delete. It is -allowed, but not required, to alter cons cells from the alist -parameter to construct the result. === Set operations on lists -Be aware that these procedures typically run in time O(n * m) for n- -and m-element list arguments. Performance-critical applications -operating upon large sets will probably wish to use other data -structures and algorithms. +These procedures implement operations on sets represented as lists of +elements. They all take an = argument used to compare elements of lists. +This equality procedure is required to be consistent with {{eq?}}. That is, +it must be the case that + +{{(eq? X Y)}} => {{(= X Y)}}. + +Note that this implies, in turn, that two lists that are {{eq?}} are also +set-equal by any legal comparison procedure. This allows for constant-time +determination of set operations on {{eq?}} lists. + +Be aware that these procedures typically run in time O(N * M) for N- and +M-element list arguments. Performance-critical applications operating upon +large sets will probably wish to use other data structures and algorithms. + +<procedure>(lset<= = list_1 ...) -> boolean</procedure><br> -<procedure>(lset<= = list[1] ...) -> boolean</procedure><br> +Returns true iff every LIST_I is a subset of LIST_I+1, using = for the +element-equality procedure. List A is a subset of list B if every element +in A is equal to some element of B. When performing an element comparison, +the = procedure's first argument is an element of A; its second, an element +of B. -Returns true iff every list[i] is a subset of list[i+1], using = -for the element-equality procedure. List A is a subset of list B if -every element in A is equal to some element of B. When performing -an element comparison, the = procedure's first argument is an -element of A; its second, an element of B. (lset<= eq? '(a) '(a b a) '(a b c c)) => #t + (lset<= eq?) => #t ; Trivial cases (lset<= eq? '(a)) => #t -<procedure>(lset= = list[1] list[2] ...) -> boolean</procedure><br> +<procedure>(lset= = list_1 list_2 ...) -> boolean</procedure><br> + +Returns true iff every LIST_I is set-equal to LIST_I+1, using = for the +element-equality procedure. "Set-equal" simply means that LIST_I is a +subset of LIST_I+1, and LIST_I+1 is a subset of LIST_I. The = procedure's +first argument is an element of LIST_I; its second is an element of +LIST_I+1. -Returns true iff every list[i] is set-equal to list[i+1], using = -for the element-equality procedure. "Set-equal" simply means that -list[i] is a subset of list[i+1], and list[i+1] is a subset of list -[i]. The = procedure's first argument is an element of list[i]; its -second is an element of list[i+1]. (lset= eq? '(b e a) '(a e b) '(e e b a)) => #t + (lset= eq?) => #t ; Trivial cases (lset= eq? '(a)) => #t -<procedure>(lset-adjoin = list elt[1] ...) -> list</procedure><br> +<procedure>(lset-adjoin = list elt_1 ...) -> list</procedure><br> + +Adds the ELT_I elements not already in the list parameter to the result +list. The result shares a common tail with the list parameter. The new +elements are added to the front of the list, but no guarantees are made +about their order. The = parameter is an equality procedure used to +determine if an ELT_I is already a member of LIST. Its first argument is an +element of LIST; its second is one of the ELT_I. -Adds the elt[i] elements not already in the list parameter to the -result list. The result shares a common tail with the list -parameter. The new elements are added to the front of the list, but -no guarantees are made about their order. The = parameter is an -equality procedure used to determine if an elt[i] is already a -member of list. Its first argument is an element of list; its -second is one of the elt[i]. +The list parameter is always a suffix of the result -- even if the list +parameter contains repeated elements, these are not reduced. -The list parameter is always a suffix of the result -- even if the -list parameter contains repeated elements, these are not reduced. (lset-adjoin eq? '(a b c d c e) 'a 'e 'i 'o 'u) => (u o i a b c d c e) -<procedure>(lset-union = list[1] ...) -> list</procedure><br> +<procedure>(lset-union = list_1 ...) -> list</procedure><br> -Returns the union of the lists, using = for the element-equality -procedure. +Returns the union of the lists, using = for the element-equality procedure. The union of lists A and B is constructed as follows: -* If A is the empty list, the answer is B (or a copy of B). -* Otherwise, the result is initialised to be list A (or a copy of -A). -* Proceed through the elements of list B in a left-to-right -order. If b is such an element of B, compare every element r of -the current result list to b: (= r b). If all comparisons fail, -b is consed onto the front of the result. - -However, there is no guarantee that = will be applied to every pair -of arguments from A and B. In particular, if A is eq? to B, the -operation may immediately terminate. - -In the n-ary case, the two-argument list-union operation is simply -folded across the argument lists. - - (lset-union eq? '(a b c d e) '(a e i o u)) => - (u o i a b c d e) - + + +* If A is the empty list, the answer is B (or a copy of B). +* Otherwise, the result is initialised to be list A (or a copy of A). +* Proceed through the elements of list B in a left-to-right order. If B is such an element of B, compare every element R of the current result list to B: {{(= R B)}}. If all comparisons fail, B is consed onto the front of the result. + +However, there is no guarantee that = will be applied to every pair of +arguments from A and B. In particular, if A is {{eq}}? to B, the operation +may immediately terminate. + +In the n-ary case, the two-argument list-union operation is simply folded +across the argument lists. + + + (lset-union eq? '(a b c d e) '(a e i o u)) => + (u o i a b c d e) + ;; Repeated elements in LIST1 are preserved. (lset-union eq? '(a a c) '(x a x)) => (x a a c) - + ;; Trivial cases (lset-union eq?) => () (lset-union eq? '(a b c)) => (a b c) -<procedure>(lset-intersection = list[1] list[2] ...) -> list</procedure><br> +<procedure>(lset-intersection = list_1 list_2 ...) -> list</procedure><br> + +Returns the intersection of the lists, using = for the element-equality +procedure. -Returns the intersection of the lists, using = for the -element-equality procedure. +The intersection of lists A and B is comprised of every element of A that +is = to some element of B: {{(= A B)}}, for A in A, and B in B. Note this +implies that an element which appears in B and multiple times in list A +will also appear multiple times in the result. -The intersection of lists A and B is comprised of every element of -A that is = to some element of B: (= a b), for a in A, and b in B. -Note this implies that an element which appears in B and multiple -times in list A will also appear multiple times in the result. +The order in which elements appear in the result is the same as they appear +in LIST_1 -- that is, {{lset-intersection}} essentially filters LIST_1, +without disarranging element order. The result may share a common tail with +LIST_1. -The order in which elements appear in the result is the same as -they appear in list[1] -- that is, lset-intersection essentially -filters list[1], without disarranging element order. The result may -share a common tail with list[1]. +In the n-ary case, the two-argument list-intersection operation is simply +folded across the argument lists. However, the dynamic order in which the +applications of = are made is not specified. The procedure may check an +element of LIST_1 for membership in every other list before proceeding to +consider the next element of LIST_1, or it may completely intersect LIST_1 +and LIST_2 before proceeding to LIST_3, or it may go about its work in some +third order. -In the n-ary case, the two-argument list-intersection operation is -simply folded across the argument lists. However, the dynamic order -in which the applications of = are made is not specified. The -procedure may check an element of list[1] for membership in every -other list before proceeding to consider the next element of list -[1], or it may completely intersect list[1] and list[2] before -proceeding to list[3], or it may go about its work in some third -order. (lset-intersection eq? '(a b c d e) '(a e i o u)) => (a e) - + ;; Repeated elements in LIST1 are preserved. (lset-intersection eq? '(a x y a) '(x a x z)) => '(a x a) - + (lset-intersection eq? '(a b c)) => (a b c) ; Trivial case -<procedure>(lset-difference = list[1] list[2] ...) -> list</procedure><br> - -Returns the difference of the lists, using = for the -element-equality procedure -- all the elements of list[1] that are -not = to any element from one of the other list[i] parameters. - -The = procedure's first argument is always an element of list[1]; -its second is an element of one of the other list[i]. Elements that -are repeated multiple times in the list[1] parameter will occur -multiple times in the result. The order in which elements appear in -the result is the same as they appear in list[1] -- that is, -lset-difference essentially filters list[1], without disarranging -element order. The result may share a common tail with list[1]. The -dynamic order in which the applications of = are made is not -specified. The procedure may check an element of list[1] for -membership in every other list before proceeding to consider the -next element of list[1], or it may completely compute the -difference of list[1] and list[2] before proceeding to list[3], or -it may go about its work in some third order. +<procedure>(lset-difference = list_1 list_2 ...) -> list</procedure><br> + +Returns the difference of the lists, using = for the element-equality +procedure -- all the elements of LIST_1 that are not = to any element from +one of the other LIST_I parameters. + +The = procedure's first argument is always an element of LIST_1; its +second is an element of one of the other LIST_I. Elements that are repeated +multiple times in the LIST_1 parameter will occur multiple times in the +result. The order in which elements appear in the result is the same as +they appear in LIST_1 -- that is, {{lset-difference}} essentially filters +LIST_1, without disarranging element order. The result may share a common +tail with LIST_1. The dynamic order in which the applications of = are +made is not specified. The procedure may check an element of LIST_1 for +membership in every other list before proceeding to consider the next +element of LIST_1, or it may completely compute the difference of LIST_1 +and LIST_2 before proceeding to LIST_3, or it may go about its work in some +third order. + (lset-difference eq? '(a b c d e) '(a e i o u)) => (b c d) + (lset-difference eq? '(a b c)) => (a b c) ; Trivial case -<procedure>(lset-xor = list[1] ...) -> list</procedure><br> -Returns the exclusive-or of the sets, using = for the -element-equality procedure. If there are exactly two lists, this is -all the elements that appear in exactly one of the two lists. The -operation is associative, and thus extends to the n-ary case -- the -elements that appear in an odd number of the lists. The result may -share a common tail with any of the list[i] parameters. +<procedure>(lset-xor = list_1 ...) -> list</procedure><br> + +Returns the exclusive-or of the sets, using = for the element-equality +procedure. If there are exactly two lists, this is all the elements that +appear in exactly one of the two lists. The operation is associative, and +thus extends to the n-ary case -- the elements that appear in an odd number +of the lists. The result may share a common tail with any of the LIST_I +parameters. More precisely, for two lists A and B, A xor B is a list of -* every element a of A such that there is no element b of B such -that (= a b), and -* every element b of B such that there is no element a of A such -that (= b a). -However, an implementation is allowed to assume that = is symmetric-- -that is, that - (= a b) => (= b a). -This means, for example, that if a comparison (= a b) produces true -for some a in A and b in B, both a and b may be removed from -inclusion in the result. +* every element A of A such that there is no element B of B such that {{(= A B)}}, and +* every element B of B such that there is no element A of A such that {{(= B A)}}. + +However, an implementation is allowed to assume that = is symmetric -- that +is, that + +{{(= A B)}} => {{(= B A)}}. + +This means, for example, that if a comparison {{(= A B)}} produces true for +some A in A and B in B, both A and B may be removed from inclusion in the +result. + +In the n-ary case, the binary-xor operation is simply folded across the +lists. -In the n-ary case, the binary-xor operation is simply folded across -the lists. (lset-xor eq? '(a b c d e) '(a e i o u)) => (d c b i o u) - + ;; Trivial cases. (lset-xor eq?) => () (lset-xor eq? '(a b c d e)) => (a b c d e) -<procedure>(lset-diff+intersection = list[1] list[2] ...) -> [list list]</procedure><br> +<procedure>(lset-diff+intersection = list_1 list_2 ...) -> [list list]</procedure><br> + +Returns two values -- the difference and the intersection of the lists. Is +equivalent to -Returns two values -- the difference and the intersection of the -lists. Is equivalent to - (values (lset-difference = list[1] list[2] ...) - (lset-intersection = list[1] - (lset-union = list[2] ...))) + (values (lset-difference = LIST_1 LIST_2 ...) + (lset-intersection = LIST_1 + (lset-union = LIST_2 ...))) but can be implemented more efficiently. -The = procedure's first argument is an element of list[1]; its -second is an element of one of the other list[i]. +The = procedure's first argument is an element of LIST_1; its second is an +element of one of the other LIST_I. -Either of the answer lists may share a common tail with list[1]. -This operation essentially partitions list[1]. +Either of the answer lists may share a common tail with LIST_1. This +operation essentially partitions LIST_1. -<procedure>(lset-union! = list[1] ...) -> list</procedure><br> -<procedure>(lset-intersection! = list[1] list[2] ...) -> list</procedure><br> -<procedure>(lset-difference! = list[1] list[2] ...) -> list</procedure><br> -<procedure>(lset-xor! = list[1] ...) -> list</procedure><br> -<procedure>(lset-diff+intersection! = list[1] list[2] ...) -> [list list]</procedure><br> +<procedure>(lset-union! = list_1 ...) -> list</procedure><br> +<procedure>(lset-intersection! = list_1 list_2 ...) -> list</procedure><br> +<procedure>(lset-difference! = list_1 list_2 ...) -> list</procedure><br> +<procedure>(lset-xor! = list_1 ...) -> list</procedure><br> +<procedure>(lset-diff+intersection! = list_1 list_2 ...) -> [list list]</procedure><br> -These are linear-update variants. They are allowed, but not -required, to use the cons cells in their first list parameter to -construct their answer. lset-union! is permitted to recycle cons -cells from any of its list arguments. +These are linear-update variants. They are allowed, but not required, to +use the cons cells in their first list parameter to construct their answer. +{{lset-union!}} is permitted to recycle cons cells from ''any'' of its list +arguments. +---- ---- Previous: [[Unit regex]] Next: [[Unit srfi-4]] diff --git a/manual/Unit srfi-13 b/manual/Unit srfi-13 index 08ac36b3..81dd16fb 100644 --- a/manual/Unit srfi-13 +++ b/manual/Unit srfi-13 @@ -1,24 +1,1351 @@ [[tags: manual]] -== Unit srfi-13 - -String library, see the documentation for -[[http://srfi.schemers.org/srfi-13/srfi-13.html|SRFI-13]] +== Unit srfi-13 +SRFI 13 (string library). Certain procedures contained in this SRFI, +such as {{string-append}}, are identical to R5RS versions and are +omitted from this document. For full documentation, see the +[[http://srfi.schemers.org/srfi-13/srfi-13.html|original SRFI-13 +document]]. On systems that support dynamic loading, the {{srfi-13}} unit can -be made available in the interpreter ({{csi}}) by entering +be made available in the Chicken interpreter ({{csi}}) by entering <enscript highlight=scheme> (require-extension srfi-13) </enscript> The {{string-hash}} and {{string-hash-ci}} procedures are -not provided in this library unit, [[Unit srfi-69]] has +not provided in this library unit. [[Unit srfi-69]] has compatible definitions. +[[toc:]] + +== Notes + +=== Strings are code-point sequences + +This SRFI considers strings simply to be a sequence of "code points" or +character encodings. Operations such as comparison or reversal are always +done code point by code point. + +Chicken's native strings are simple byte sequences (not Unicode code points). +Comparison or reversal is done byte-wise. If Unicode semantics are +desired, see the [[utf8]] egg. + +=== Case mapping and case-folding + +Upper- and lower-casing characters is complex in super-ASCII encodings. +SRFI 13 makes no attempt to deal with these issues; it uses a simple 1-1 +locale- and context-independent case-mapping, specifically Unicode's 1-1 +case-mappings given in [[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]]. + +On Chicken, case-mapping is restricted to operate on ASCII characters. + +=== String equality & string normalisation + +SRFI 13 string equality is simply based upon comparing the encoding +values used for the characters. On Chicken, strings are compared +byte-wise. + +=== String inequality + +SRFI 13 string ordering is strictly based upon a +character-by-character comparison of the values used for representing +the string. + +=== Naming conventions + +* Procedures whose names end in "-ci" are case-insensitive variants. +* Procedures whose names end in "!" are side-effecting variants. What values these procedures return is usually not specified. +* The order of common parameters is consistent across the different procedures. +* Left/right/both directionality: Procedures that have left/right directional variants use the following convention: + +<table> +<tr><th>Direction</th><th>Suffix</th></tr> +<tr><td>left-to-right</td><td>''none''</td></tr> +<tr><td>right-to-left</td><td>{{-right}}</td></tr> +<tr><td>both</td><td>{{-both}}</td></tr></table> + +=== Shared storage + +Chicken does not currently have shared-text substrings, nor does its +implementation of SRFI 13 routines ever return one of the +strings that was passed in as a parameter, as is allowed by the +specification. + +On the other hand, the functionality is present to allow one to write +efficient code ''without'' shared-text substrings. You can write +efficient code that works by passing around start/end ranges indexing +into a string instead of simply building a shared-text substring. + +== Procedure Specification + +In the following procedure specifications: + + +* An S parameter is a string. +* A CHAR parameter is a character. +* START and END parameters are half-open string indices specifying a substring within a string parameter; when optional, they default to 0 and the length of the string, respectively. When specified, it must be the case that 0 <= START <= END <= {{(string-length S)}}, for the corresponding parameter S. They typically restrict a procedure's action to the indicated substring. +* A PRED parameter is a unary character predicate procedure, returning a true/false value when applied to a character. +* A CHAR/CHAR-SET/PRED parameter is a value used to select/search for a character in a string. If it is a character, it is used in an equality test; if it is a character set, it is used as a membership test; if it is a procedure, it is applied to the characters as a test predicate. +* An I parameter is an exact non-negative integer specifying an index into a string. +* LEN and NCHARS parameters are exact non-negative integers specifying a length of a string or some number of characters. +* An OBJ parameter may be any value at all. + +Passing values to procedures with these parameters that do not satisfy +these types is an error. + +Parameters given in square brackets are optional. Unless otherwise noted in +the text describing the procedure, any prefix of these optional parameters +may be supplied, from zero arguments to the full list. When a procedure +returns multiple values, this is shown by listing the return values in +square brackets, as well. So, for example, the procedure with signature + + + halts? F [X INIT-STORE] -> [BOOLEAN INTEGER] + +would take one (F), two (F, X) or three (F, X, INIT-STORE) input +parameters, and return two values, a boolean and an integer. + +A parameter followed by "{{...}}" means zero-or-more elements. So the +procedure with the signature + + + sum-squares X ... -> NUMBER + +takes zero or more arguments (X ...), while the procedure with signature + + + spell-check DOC DICT_1 DICT_2 ... -> STRING-LIST + + +takes two required parameters (DOC and DICT_1) and zero or more optional +parameters (DICT_2 ...). + +If a procedure is said to return "unspecified," this means that nothing +at all is said about what the procedure returns. Such a procedure is not +even required to be consistent from call to call. It is simply required to +return a value (or values) that may be passed to a command continuation, +''e.g.'' as the value of an expression appearing as a non-terminal +subform of a {{begin}} expression. Note that in R5RS, this restricts such +a procedure to returning a single value; non-R5RS systems may not even +provide this restriction. + + +=== Main procedures + +==== Predicates + +<procedure>(string-null? s) -> boolean</procedure><br> + +Is S the empty string? + +<procedure>(string-every char/char-set/pred s [start end]) -> value</procedure><br> +<procedure>(string-any char/char-set/pred s [start end]) -> value</procedure><br> + +Checks to see if the given criteria is true of every / any character in S, +proceeding from left (index START) to right (index END). + +If CHAR/CHAR-SET/PRED is a character, it is tested for equality with the +elements of S. + +If CHAR/CHAR-SET/PRED is a character set, the elements of S are tested for +membership in the set. + +If CHAR/CHAR-SET/PRED is a predicate procedure, it is applied to the +elements of S. The predicate is "witness-generating:" + + +* If {{string-any}} returns true, the returned true value is the one produced by the application of the predicate. +* If {{string-every}} returns true, the returned true value is the one produced by the final application of the predicate to S[END-1]. If {{string-every}} is applied to an empty sequence of characters, it simply returns {{#t}}. + +If {{string-every}} or {{string-any}} apply the predicate to the final +element of the selected sequence (''i.e.'', S[END-1]), that final +application is a tail call. + +The names of these procedures do not end with a question mark -- this is to +indicate that, in the predicate case, they do not return a simple boolean +({{#t}} or {{#f}}), but a general value. + + +==== Constructors + +<procedure>(string-tabulate proc len) -> string</procedure><br> + +PROC is an integer->char procedure. Construct a string of size LEN by +applying PROC to each index to produce the corresponding string element. +The order in which PROC is applied to the indices is not specified. + + +==== List & string conversion + +<procedure>(string->list s [start end]) -> char-list</procedure><br> + +{{string->list}} is extended from the R5RS definition to take optional +START/END arguments. + +<procedure>(reverse-list->string char-list) -> string</procedure><br> + +An efficient implementation of {{(compose list->string reverse)}}: + + + (reverse-list->string '(#\a #\B #\c)) -> "cBa" + +This is a common idiom in the epilog of string-processing loops +that accumulate an answer in a reverse-order list. (See also +{{string-concatenate-reverse}} for the "chunked" variant.) + +<procedure>(string-join string-list [delimiter grammar]) -> string</procedure><br> + +This procedure is a simple unparser --- it pastes strings together using +the delimiter string. + +The GRAMMAR argument is a symbol that determines how the delimiter is used, +and defaults to {{'infix}}. + + +* {{'infix}} means an infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty string -- note, however, that parsing an empty string with an infix or separator grammar is ambiguous. Is it an empty list, or a list of one element, the empty string? +* {{'strict-infix}} means the same as {{'infix}}, but will raise an error if given an empty list. +* {{'suffix}} means a suffix or terminator grammar: insert the delimiter after every list element. This grammar has no ambiguities. +* {{'prefix}} means a prefix grammar: insert the delimiter before every list element. This grammar has no ambiguities. + +The delimiter is the string used to delimit elements; it defaults to a +single space " ". + + + (string-join '("foo" "bar" "baz") ":") => "foo:bar:baz" + (string-join '("foo" "bar" "baz") ":" 'suffix) => "foo:bar:baz:" + + ;; Infix grammar is ambiguous wrt empty list vs. empty string, + (string-join '() ":") => "" + (string-join '("") ":") => "" + + ;; but suffix & prefix grammars are not. + (string-join '() ":" 'suffix) => "" + (string-join '("") ":" 'suffix) => ":" + + + +==== Selection + +<procedure>(string-copy s [start end]) -> string</procedure><br> +<procedure>(substring/shared s start [end]) -> string</procedure><br> + +[R5RS+] {{substring/shared}} returns a string whose contents are the +characters of S beginning with index START (inclusive) and ending with +index END (exclusive). It differs from the R5RS {{substring}} in two ways: + + +* The END parameter is optional, not required. +* {{substring/shared}} may return a value that shares memory with S or is {{eq?}} to S. + +{{string-copy}} is extended from its R5RS definition by the addition of its +optional START/END parameters. In contrast to {{substring/shared}}, it is +guaranteed to produce a freshly-allocated string. + +Use {{string-copy}} when you want to indicate explicitly in your code that +you wish to allocate new storage; use {{substring/shared}} when you don't +care if you get a fresh copy or share storage with the original string. + + + (string-copy "Beta substitution") => "Beta substitution" + (string-copy "Beta substitution" 1 10) + => "eta subst" + (string-copy "Beta substitution" 5) => "substitution" + +<procedure>(string-copy! target tstart s [start end]) -> unspecified</procedure><br> + +Copy the sequence of characters from index range [START,END) in string +S to string TARGET, beginning at index TSTART. The characters are copied +left-to-right or right-to-left as needed -- the copy is guaranteed to work, +even if TARGET and S are the same string. + +It is an error if the copy operation runs off the end of the target string, +''e.g.'' + + + (string-copy! (string-copy "Microsoft") 0 + "Regional Microsoft Operating Companies") => ''error'' + +<procedure>(string-take s nchars) -> string</procedure><br> +<procedure>(string-drop s nchars) -> string</procedure><br> +<procedure>(string-take-right s nchars) -> string</procedure><br> +<procedure>(string-drop-right s nchars) -> string</procedure><br> + +{{string-take}} returns the first NCHARS of S; {{string-drop}} returns all +but the first NCHARS of S. {{string-take-right}} returns the last NCHARS +of S; {{string-drop-right}} returns all but the last NCHARS of S. If these +procedures produce the entire string, they may return either S or a copy of +S; in some implementations, proper substrings may share memory with S. + + + (string-take "Pete Szilagyi" 6) => "Pete S" + (string-drop "Pete Szilagyi" 6) => "zilagyi" + + (string-take-right "Beta rules" 5) => "rules" + (string-drop-right "Beta rules" 5) => "Beta " + +It is an error to take or drop more characters than are in the string: + + + (string-take "foo" 37) => ''error'' + + +<procedure>(string-pad s len [char start end]) -> string</procedure><br> +<procedure>(string-pad-right s len [char start end]) -> string</procedure><br> + +Build a string of length LEN comprised of S padded on the left (right) by +as many occurrences of the character CHAR as needed. If S has more than LEN +chars, it is truncated on the left (right) to length LEN. CHAR defaults to +#\space. + +If LEN <= END-START, the returned value is allowed to share storage with S, +or be exactly S (if LEN = END-START). + + + (string-pad "325" 5) => " 325" + (string-pad "71325" 5) => "71325" + (string-pad "8871325" 5) => "71325" + +<procedure>(string-trim s [char/char-set/pred start end]) -> string</procedure><br> +<procedure>(string-trim-right s [char/char-set/pred start end]) -> string</procedure><br> +<procedure>(string-trim-both s [char/char-set/pred start end]) -> string</procedure><br> + +Trim S by skipping over all characters on the left / on the right / on both +sides that satisfy the second parameter CHAR/CHAR-SET/PRED: + + +* if it is a character CHAR, characters equal to CHAR are trimmed; +* if it is a char set CS, characters contained in CS are trimmed; +* if it is a predicate PRED, it is a test predicate that is applied to the characters in S; a character causing it to return true is skipped. + +CHAR/CHAR-SET/PRED defaults to the character set {{char-set:whitespace}} +defined in SRFI 14. + +If no trimming occurs, these functions may return either S or a copy of S; +in some implementations, proper substrings may share memory with S. + + + (string-trim-both " The outlook wasn't brilliant, \n\r") + => "The outlook wasn't brilliant," + + +==== Modification + +<procedure>(string-fill! s char [start end]) -> unspecified</procedure><br> + +[R5RS+] Stores CHAR in every element of S. + +{{string-fill}} is extended from the R5RS definition to take optional +START/END arguments. + + +==== Comparison + +<procedure>(string-compare s1 s2 proc< proc= proc> [start1 end1 start2 end2]) -> values</procedure><br> +<procedure>(string-compare-ci s1 s2 proc< proc= proc> [start1 end1 start2 end2]) -> values</procedure><br> + +Apply PROC<, PROC=, or PROC> to the mismatch index, depending upon whether +S1 is less than, equal to, or greater than S2. The "mismatch index" is the +largest index I such that for every 0 <= J < I, S1[J] = S2[J] -- that is, I +is the first position that doesn't match. + +{{string-compare-ci}} is the case-insensitive variant. Case-insensitive +comparison is done by case-folding characters with the operation + + + (char-downcase (char-upcase C)) + +where the two case-mapping operations are assumed to be 1-1, locale- and +context-insensitive, and compatible with the 1-1 case mappings specified by +Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + +The optional start/end indices restrict the comparison to the indicated +substrings of S1 and S2. The mismatch index is always an index into S1; +in the case of PROC=, it is always END1; we observe the protocol in this +redundant case for uniformity. + + + (string-compare "The cat in the hat" "abcdefgh" + values values values + 4 6 ; Select "ca" + 2 4) ; & "cd" + => 5 ; Index of S1's "a" + +Comparison is simply done on individual code-points of the string. True +text collation is not handled by this SRFI. + +<procedure>(string= s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string<> s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string< s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string> s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string<= s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string>= s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> + +These procedures are the lexicographic extensions to strings of the +corresponding orderings on characters. For example, {{string<}} is the +lexicographic ordering on strings induced by the ordering {{char<?}} on +characters. If two strings differ in length but are the same up to the +length of the shorter string, the shorter string is considered to be +lexicographically less than the longer string. + +The optional start/end indices restrict the comparison to the indicated +substrings of S1 and S2. + +Comparison is simply done on individual code-points of the string. True +text collation is not handled by this SRFI. + +<procedure>(string-ci= s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-ci<> s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-ci< s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-ci> s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-ci<= s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-ci>= s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> + +Case-insensitive variants. + +Case-insensitive comparison is done by case-folding characters with the +operation + + + (char-downcase (char-upcase C)) + +where the two case-mapping operations are assumed to be 1-1, locale- and +context-insensitive, and compatible with the 1-1 case mappings specified by +Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + +<procedure>(string-hash s [bound start end]) -> integer</procedure><br> +<procedure>(string-hash-ci s [bound start end]) -> integer</procedure><br> + +Compute a hash value for the string S. BOUND is a non-negative exact +integer specifying the range of the hash function. A positive value +restricts the return value to the range [0,BOUND). + +If BOUND is either zero or not given, the implementation may use an +implementation-specific default value, chosen to be as large as is +efficiently practical. For instance, the default range might be chosen for +a given implementation to map all strings into the range of integers that +can be represented with a single machine word. + +The optional start/end indices restrict the hash operation to the indicated +substring of S. + +{{string-hash-ci}} is the case-insensitive variant. Case-insensitive +comparison is done by case-folding characters with the operation + + + (char-downcase (char-upcase C)) + +where the two case-mapping operations are assumed to be 1-1, locale- and +context-insensitive, and compatible with the 1-1 case mappings specified by +Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + +Invariants: + + + (<= 0 (string-hash s b) (- b 1)) ; When B > 0. + (string= s1 s2) => (= (string-hash s1 b) (string-hash s2 b)) + (string-ci= s1 s2) => (= (string-hash-ci s1 b) (string-hash-ci s2 b)) + +A legal but nonetheless discouraged implementation: + + + (define (string-hash s . other-args) 1) + (define (string-hash-ci s . other-args) 1) + +Rationale: allowing the user to specify an explicit bound simplifies user +code by removing the mod operation that typically accompanies every hash +computation, and also may allow the implementation of the hash function to +exploit a reduced range to efficiently compute the hash value. ''E.g.'', +for small bounds, the hash function may be computed in a fashion such +that intermediate values never overflow into bignum integers, allowing +the implementor to provide a fixnum-specific "fast path" for computing the +common cases very rapidly. + + +==== Prefixes & suffixes + +<procedure>(string-prefix-length s1 s2 [start1 end1 start2 end2]) -> integer</procedure><br> +<procedure>(string-suffix-length s1 s2 [start1 end1 start2 end2]) -> integer</procedure><br> +<procedure>(string-prefix-length-ci s1 s2 [start1 end1 start2 end2]) -> integer</procedure><br> +<procedure>(string-suffix-length-ci s1 s2 [start1 end1 start2 end2]) -> integer</procedure><br> + +Return the length of the longest common prefix/suffix of the two strings. +For prefixes, this is equivalent to the "mismatch index" for the strings +(modulo the STARTi index offsets). + +The optional start/end indices restrict the comparison to the indicated +substrings of S1 and S2. + +{{string-prefix-length-ci}} and {{string-suffix-length-ci}} are the +case-insensitive variants. Case-insensitive comparison is done by +case-folding characters with the operation + + + (char-downcase (char-upcase c)) + +where the two case-mapping operations are assumed to be 1-1, locale- and +context-insensitive, and compatible with the 1-1 case mappings specified by +Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + +Comparison is simply done on individual code-points of the string. + +<procedure>(string-prefix? s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-suffix? s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-prefix-ci? s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> +<procedure>(string-suffix-ci? s1 s2 [start1 end1 start2 end2]) -> boolean</procedure><br> + +Is S1 a prefix/suffix of S2? + +The optional start/end indices restrict the comparison to the indicated +substrings of S1 and S2. + +{{string-prefix-ci?}} and {{string-suffix-ci?}} are the case-insensitive +variants. Case-insensitive comparison is done by case-folding characters +with the operation + + + (char-downcase (char-upcase c)) + +where the two case-mapping operations are assumed to be 1-1, locale- and +context-insensitive, and compatible with the 1-1 case mappings specified by +Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + +Comparison is simply done on individual code-points of the string. + + +==== Searching + +<procedure>(string-index s char/char-set/pred [start end]) -> integer or #f</procedure><br> +<procedure>(string-index-right s char/char-set/pred [start end]) -> integer or #f</procedure><br> +<procedure>(string-skip s char/char-set/pred [start end]) -> integer or #f</procedure><br> +<procedure>(string-skip-right s char/char-set/pred [start end]) -> integer or #f</procedure><br> + +{{string-index}} ({{string-index-right}}) searches through the string +from the left (right), returning the index of the first occurrence of a +character which + + +* equals CHAR/CHAR-SET/PRED (if it is a character); +* is in CHAR/CHAR-SET/PRED (if it is a character set); +* satisfies the predicate CHAR/CHAR-SET/PRED (if it is a procedure). + +If no match is found, the functions return false. + +The START and END parameters specify the beginning and end indices of the +search; the search includes the start index, but not the end index. Be +careful of "fencepost" considerations: when searching right-to-left, the +first index considered is + +END-1 + +whereas when searching left-to-right, the first index considered is + +START + +That is, the start/end indices describe a same half-open interval +[START,END) in these procedures that they do in all the other SRFI 13 +procedures. + +The skip functions are similar, but use the complement of the criteria: +they search for the first char that ''doesn't'' satisfy the test. ''E.g.'', +to skip over initial whitespace, say + + + (cond ((string-skip s char-set:whitespace) => + + (lambda (i) ...)) ; s[i] is not whitespace. + ...) + +<procedure>(string-count s char/char-set/pred [start end]) -> integer</procedure><br> + +Return a count of the number of characters in S that satisfy the +CHAR/CHAR-SET/PRED argument. If this argument is a procedure, it is applied +to the character as a predicate; if it is a character set, the character +is tested for membership; if it is a character, it is used in an equality +test. + +<procedure>(string-contains s1 s2 [start1 end1 start2 end2]) -> integer or false</procedure><br> +<procedure>(string-contains-ci s1 s2 [start1 end1 start2 end2]) -> integer or false</procedure><br> + +Does string S1 contain string S2? + +Return the index in S1 where S2 occurs as a substring, or false. The +optional start/end indices restrict the operation to the indicated +substrings. + +The returned index is in the range [START1,END1). A successful match must +lie entirely in the [START1,END1) range of S1. + + + (string-contains "eek -- what a geek." "ee" + 12 18) ; Searches "a geek" + => 15 + +{{string-contains-ci}} is the case-insensitive variant. Case-insensitive +comparison is done by case-folding characters with the operation + + + (char-downcase (char-upcase C)) + +where the two case-mapping operations are assumed to be 1-1, locale- and +context-insensitive, and compatible with the 1-1 case mappings specified by +Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + +Comparison is simply done on individual code-points of the string. + +The names of these procedures do not end with a question mark -- this is +to indicate that they do not return a simple boolean ({{#t}} or {{#f}}). +Rather, they return either false ({{#f}}) or an exact non-negative integer. + + +==== Alphabetic case mapping + +<procedure>(string-titlecase s [start end]) -> string</procedure><br> +<procedure>(string-titlecase! s [start end]) -> unspecified</procedure><br> + +For every character C in the selected range of S, if C is preceded by a +cased character, it is downcased; otherwise it is titlecased. + +{{string-titlecase}} returns the result string and does not alter its S +parameter. {{string-titlecase!}} is the in-place side-effecting variant. + + + (string-titlecase "--capitalize tHIS sentence.") => + "--Capitalize This Sentence." + + (string-titlecase "see Spot run. see Nix run.") => + "See Spot Run. See Nix Run." + + (string-titlecase "3com makes routers.") => + "3Com Makes Routers." + +Note that if a START index is specified, then the character preceding +S[START] has no effect on the titlecase decision for character S[START]: + + + (string-titlecase "greasy fried chicken" 2) => "Easy Fried Chicken" + +Titlecase and cased information must be compatible with the Unicode +specification. + +<procedure>(string-upcase s [start end]) -> string</procedure><br> +<procedure>(string-upcase! s [start end]) -> unspecified</procedure><br> +<procedure>(string-downcase s [start end]) -> string</procedure><br> +<procedure>(string-downcase! s [start end]) -> unspecified</procedure><br> + +Raise or lower the case of the alphabetic characters in the string. + +{{string-upcase}} and {{string-downcase}} return the result string and do +not alter their S parameter. {{string-upcase!}} and {{string-downcase!}} +are the in-place side-effecting variants. + +These procedures use the locale- and context-insensitive 1-1 case mappings +defined by Unicode's UnicodeData.txt table: + +[[ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt]] + + +==== Reverse & append + +<procedure>(string-reverse s [start end]) -> string</procedure><br> +<procedure>(string-reverse! s [start end]) -> unspecified</procedure><br> + +Reverse the string. + +{{string-reverse}} returns the result string and does not alter its S +parameter. {{string-reverse!}} is the in-place side-effecting variant. + + + (string-reverse "Able was I ere I saw elba.") + => ".able was I ere I saw elbA" + + ;;; In-place rotate-left, the Bell Labs way: + (lambda (s i) + (let ((i (modulo i (string-length s)))) + (string-reverse! s 0 i) + (string-reverse! s i) + (string-reverse! s))) + +Unicode note: Reversing a string simply reverses the sequence of +code-points it contains. So a zero-width accent character A coming +''after'' a base character B in string S would come out ''before'' B in the +reversed result. + +<procedure>(string-concatenate string-list) -> string</procedure><br> + +Append the elements of {{string-list}} together into a single string. +Guaranteed to return a freshly allocated string. + +Note that the {{(apply string-append STRING-LIST)}} idiom is not robust for +long lists of strings, as some Scheme implementations limit the number of +arguments that may be passed to an n-ary procedure. + +<procedure>(string-concatenate/shared string-list) -> string</procedure><br> +<procedure>(string-append/shared s_1 ...) -> string</procedure><br> + +These two procedures are variants of {{string-concatenate}} and +{{string-append}} that are permitted to return results that share storage +with their parameters. In particular, if {{string-append/shared}} is +applied to just one argument, it may return exactly that argument, whereas +{{string-append}} is required to allocate a fresh string. + +<procedure>(string-concatenate-reverse string-list [final-string end]) -> string</procedure><br> +<procedure>(string-concatenate-reverse/shared string-list [final-string end]) -> string</procedure><br> + +With no optional arguments, these functions are equivalent to + + + (string-concatenate (reverse STRING-LIST)) + +and + + + (string-concatenate/shared (reverse STRING-LIST)) + +respectively. + +If the optional argument FINAL-STRING is specified, it is consed onto +the beginning of STRING-LIST before performing the list-reverse and +string-concatenate operations. + +If the optional argument END is given, only the first END characters of +FINAL-STRING are added to the string list, thus producing + + + (string-concatenate + (reverse (cons (substring/shared FINAL-STRING 0 END) + STRING-LIST))) + + +''E.g.'' + + + (string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7) + => "Hello, I must be going." + +This procedure is useful in the construction of procedures that accumulate +character data into lists of string buffers, and wish to convert the +accumulated data into a single string when done. + +Unicode note: Reversing a string simply reverses the sequence of +code-points it contains. So a zero-width accent character AC coming +''after'' a base character BC in string S would come out ''before'' BC in +the reversed result. + + +==== Fold, unfold & map + +<procedure>(string-map proc s [start end]) -> string</procedure><br> +<procedure>(string-map! proc s [start end]) -> unspecified</procedure><br> + +PROC is a char->char procedure; it is mapped over S. + +{{string-map}} returns the result string and does not alter its S +parameter. {{string-map!}} is the in-place side-effecting variant. + +Note: The order in which PROC is applied to the elements of S is not +specified. + +<procedure>(string-fold kons knil s [start end]) -> value</procedure><br> +<procedure>(string-fold-right kons knil s [start end]) -> value</procedure><br> + +These are the fundamental iterators for strings. + +The left-fold operator maps the KONS procedure across the string from left +to right + + + (... (KONS S[2] (KONS S[1] (KONS S[0] KNIL)))) + + +In other words, {{string-fold}} obeys the (tail) recursion + + + (string-fold KONS KNIL S START END) = + (string-fold KONS (KONS S[START] KNIL) START+1 END) + + +The right-fold operator maps the KONS procedure across the string from +right to left + + + (KONS S[0] (... (KONS S[END-3] (KONS S[END-2] (KONS S[END-1] KNIL))))) + + +obeying the (tail) recursion + + + (string-fold-right KONS KNIL S START END) = + (string-fold-right KONS (KONS S[END-1] KNIL) START END-1) + + +Examples: + + + ;;; Convert a string to a list of chars. + (string-fold-right cons '() s) + + ;;; Count the number of lower-case characters in a string. + (string-fold (lambda (c count) + (if (char-lower-case? c) + (+ count 1) + count)) + 0 + s) + + ;;; Double every backslash character in S. + (let* ((ans-len (string-fold (lambda (c sum) + (+ sum (if (char=? c #\\) 2 1))) + 0 s)) + (ans (make-string ans-len))) + (string-fold (lambda (c i) + (let ((i (if (char=? c #\\) + (begin (string-set! ans i #\\) (+ i 1)) + i))) + (string-set! ans i c) + (+ i 1))) + 0 s) + ans) + +The right-fold combinator is sometimes called a "catamorphism." + +<procedure>(string-unfold p f g seed [base make-final]) -> string</procedure><br> + +This is a fundamental constructor for strings. + + +* G is used to generate a series of "seed" values from the initial seed: SEED, (G SEED), (G^2 SEED), (G^3 SEED), ... +* P tells us when to stop -- when it returns true when applied to one of these seed values. +* F maps each seed value to the corresponding character in the result string. These chars are assembled into the string in a left-to-right order. +* BASE is the optional initial/leftmost portion of the constructed string; it defaults to the empty string "". +* MAKE-FINAL is applied to the terminal seed value (on which P returns true) to produce the final/rightmost portion of the constructed string. It defaults to {{(lambda (x) "")}}. + +More precisely, the following (simple, inefficient) definitions hold: + + + ;;; Iterative + (define (string-unfold p f g seed base make-final) + (let lp ((seed seed) (ans base)) + (if (p seed) + (string-append ans (make-final seed)) + (lp (g seed) (string-append ans (string (f seed))))))) + + ;;; Recursive + (define (string-unfold p f g seed base make-final) + (string-append base + (let recur ((seed seed)) + (if (p seed) (make-final seed) + (string-append (string (f seed)) + (recur (g seed))))))) + +{{string-unfold}} is a fairly powerful string constructor -- you can use it +to convert a list to a string, read a port into a string, reverse a string, +copy a string, and so forth. Examples: + + + (port->string p) = (string-unfold eof-object? values + (lambda (x) (read-char p)) + (read-char p)) + + (list->string lis) = (string-unfold null? car cdr lis) + + (string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f add1 0) + +To map F over a list LIS, producing a string: + + + (string-unfold null? (compose f car) cdr lis) + +Interested functional programmers may enjoy noting that +{{string-fold-right}} and {{string-unfold}} are in some sense inverses. +That is, given operations KNULL?, KAR, KDR, KONS, and KNIL satisfying + + + (KONS (KAR x) (KDR x)) = x and (KNULL? KNIL) = #t + +then + + + (string-fold-right KONS KNIL (string-unfold KNULL? KAR KDR X)) = X + + +and + + + (string-unfold KNULL? KAR KDR (string-fold-right KONS KNIL S)) = S. + + +The final string constructed does not share storage with either BASE or the +value produced by MAKE-FINAL. + +This combinator sometimes is called an "anamorphism." + +Note: implementations should take care that runtime stack limits do not +cause overflow when constructing large (''e.g.'', megabyte) strings with +{{string-unfold}}. + +<procedure>(string-unfold-right p f g seed [base make-final]) -> string</procedure><br> + +This is a fundamental constructor for strings. + + +* G is used to generate a series of "seed" values from the initial seed: SEED, (G SEED), (G^2 SEED), (G^3 SEED), ... +* P tells us when to stop -- when it returns true when applied to one of these seed values. +* F maps each seed value to the corresponding character in the result string. These chars are assembled into the string in a right-to-left order. +* BASE is the optional initial/rightmost portion of the constructed string; it defaults to the empty string "". +* MAKE-FINAL is applied to the terminal seed value (on which P returns true) to produce the final/leftmost portion of the constructed string. It defaults to {{(lambda (x) "")}}. + +More precisely, the following (simple, inefficient) definitions hold: + + + ;;; Iterative + (define (string-unfold-right p f g seed base make-final) + (let lp ((seed seed) (ans base)) + (if (p seed) + (string-append (make-final seed) ans) + (lp (g seed) (string-append (string (f seed)) ans))))) + + ;;; Recursive + (define (string-unfold-right p f g seed base make-final) + (string-append (let recur ((seed seed)) + (if (p seed) (make-final seed) + (string-append (recur (g seed)) + (string (f seed))))) + base)) + +Interested functional programmers may enjoy noting that {{string-fold}} +and {{string-unfold-right}} are in some sense inverses. That is, given +operations KNULL?, KAR, KDR, KONS, and KNIL satisfying + +{{(KONS (KAR X) (KDR X))}} = X and {{(KNULL? KNIL)}} = #t + +then + + + (string-fold KONS KNIL (string-unfold-right KNULL? KAR KDR X)) = X + + +and + + + (string-unfold-right KNULL? KAR KDR (string-fold KONS KNIL S)) = S. + + +The final string constructed does not share storage with either BASE or the +value produced by MAKE-FINAL. + +Note: implementations should take care that runtime stack limits do not +cause overflow when constructing large (''e.g.'', megabyte) strings with +{{string-unfold-right.}} + +<procedure>(string-for-each proc s [start end]) -> unspecified</procedure><br> + +Apply PROC to each character in S. {{string-for-each}} is required to +iterate from START to END in increasing order. + +<procedure>(string-for-each-index proc s [start end]) -> unspecified</procedure><br> + +Apply PROC to each index of S, in order. The optional START/END pairs +restrict the endpoints of the loop. This is simply a method of looping over +a string that is guaranteed to be safe and correct. Example: + + + (let* ((len (string-length s)) + (ans (make-string len))) + (string-for-each-index + (lambda (i) (string-set! ans (- len i) (string-ref s i))) + s) + ans) + + +==== Replicate & rotate + +<procedure>(xsubstring s from [to start end]) -> string</procedure><br> + +This is the "extended substring" procedure that implements replicated +copying of a substring of some string. + +S is a string; START and END are optional arguments that demarcate a +substring of S, defaulting to 0 and the length of S (''i.e.'', the whole +string). Replicate this substring up and down index space, in both the +positive and negative directions. For example, if S = "abcdefg", START=3, +and END=6, then we have the conceptual bidirectionally-infinite string + + +<table> +<tr><td>...</td><td>d</td><td>e</td><td>f</td><td>d</td><td>e</td><td>f</td><td>d</td><td>e</td><td>f</td><td>d</td><td>e</td><td>f</td><td>d</td><td>e</td><td>f</td><td>d</td><td>e</td><td>f</td><td>d</td><td>...</td></tr> +<tr><td>...</td><td>-9</td><td>-8</td><td>-7</td><td>-6</td><td>-5</td><td>-4</td><td>-3</td><td>-2</td><td>-1</td><td>0</td><td>+1</td><td>+2</td><td>+3</td><td>+4</td><td>+5</td><td>+6</td><td>+7</td><td>+8</td><td>+9</td><td>...</td></tr></table> + +{{xsubstring}} returns the substring of this string beginning at index +FROM, and ending at TO (which defaults to FROM+(END-START)). + +You can use {{xsubstring}} to perform a variety of tasks: + + +* To rotate a string left: {{(xsubstring "abcdef" 2)}} => {{"cdefab"}} +* To rotate a string right: {{(xsubstring "abcdef" -2)}} => {{"efabcd"}} +* To replicate a string: {{(xsubstring "abc" 0 7)}} => {{"abcabca"}} + +Note that + + +* The FROM/TO indices give a half-open range -- the characters from index FROM up to, but not including, index TO. +* The FROM/TO indices are not in terms of the index space for string S. They are in terms of the replicated index space of the substring defined by S, START, and END. + +It is an error if START=END -- although this is allowed by special +dispensation when FROM=TO. + +<procedure>(string-xcopy! target tstart s sfrom [sto start end]) -> unspecified</procedure><br> + +Exactly the same as {{xsubstring,}} but the extracted text is written into +the string TARGET starting at index TSTART. This operation is not defined +if {{(eq? TARGET S)}} or these two arguments share storage -- you cannot +copy a string on top of itself. + + +==== Miscellaneous: insertion, parsing + +<procedure>(string-replace s1 s2 start1 end1 [start2 end2]) -> string</procedure><br> + +Returns + + + (string-append (substring/shared S1 0 START1) + (substring/shared S2 START2 END2) + (substring/shared S1 END1 (string-length S1))) + + +That is, the segment of characters in S1 from START1 to END1 is replaced by +the segment of characters in S2 from START2 to END2. If START1=END1, this +simply splices the S2 characters into S1 at the specified index. + +Examples: + + + (string-replace "The TCL programmer endured daily ridicule." + "another miserable perl drone" 4 7 8 22 ) => + "The miserable perl programmer endured daily ridicule." + + (string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9) => + "It's lots of fun to code it up in Scheme." + + (define (string-insert s i t) (string-replace s t i i)) + + (string-insert "It's easy to code it up in Scheme." 5 "really ") => + "It's really easy to code it up in Scheme." + +<procedure>(string-tokenize s [token-set start end]) -> list</procedure><br> + +Split the string S into a list of substrings, where each substring is a +maximal non-empty contiguous sequence of characters from the character set +TOKEN-SET. + + +* TOKEN-SET defaults to {{char-set:graphic}} (see SRFI 14 for more on character sets and {{char-set:graphic}}). +* If START or END indices are provided, they restrict {{string-tokenize}} to operating on the indicated substring of S. + +This function provides a minimal parsing facility for simple applications. +More sophisticated parsers that handle quoting and backslash effects can +easily be constructed using regular-expression systems; be careful not to +use {{string-tokenize}} in contexts where more serious parsing is needed. + + + (string-tokenize "Help make programs run, run, RUN!") => + ("Help" "make" "programs" "run," "run," "RUN!") + + +==== Filtering & deleting + +<procedure>(string-filter char/char-set/pred s [start end]) -> string</procedure><br> +<procedure>(string-delete har/char-set/pred s [start end]) -> string</procedure><br> + +Filter the string S, retaining only those characters that satisfy / do not +satisfy the CHAR/CHAR-SET/PRED argument. If this argument is a procedure, +it is applied to the character as a predicate; if it is a char-set, the +character is tested for membership; if it is a character, it is used in an +equality test. + +If the string is unaltered by the filtering operation, these functions may +return either S or a copy of S. + + +=== Low-level procedures + +The following procedures are useful for writing other string-processing +functions. In a Scheme system that has a module or package system, these +procedures should be contained in a module named "string-lib-internals". + + +==== Start/end optional-argument parsing & checking utilities + +<procedure>(string-parse-start+end proc s args) -> [rest start end]</procedure><br> +<procedure>(string-parse-final-start+end proc s args) -> [start end]</procedure><br> + +{{string-parse-start+end}} may be used to parse a pair of optional +START/END arguments from an argument list, defaulting them to 0 and the +length of some string S, respectively. Let the length of string S be SLEN. + + +* If ARGS = (), the function returns {{(values '() 0 SLEN)}} +* If ARGS = (I), I is checked to ensure it is an exact integer, and that 0 <= i <= SLEN. Returns {{(values (cdr ARGS) I SLEN)}}. +* If ARGS = {{(I J ...)}}, I and J are checked to ensure they are exact integers, and that 0 <= I <= J <= SLEN. Returns {{(values (cddr ARGS) I J)}}. + +If any of the checks fail, an error condition is raised, and PROC is used +as part of the error condition -- it should be the client procedure whose +argument list {{string-parse-start+end}} is parsing. + +{{string-parse-final-start+end}} is exactly the same, except that the args +list passed to it is required to be of length two or less; if it is longer, +an error condition is raised. It may be used when the optional START/END +parameters are final arguments to the procedure. + +Note that in all cases, these functions ensure that S is a string (by +necessity, since all cases apply {{string-length}} to S either to default +END or to bounds-check it). + +<procedure>(let-string-start+end (start end [rest]) proc-exp s-exp args-exp body ...) -> value(s)</procedure><br> + +[Syntax] Syntactic sugar for an application of {{string-parse-start+end}} +or {{string-parse-final-start+end.}} + +If a REST variable is given, the form is equivalent to + + + (call-with-values + (lambda () (string-parse-start+end PROC-EXP S-EXP ARGS-EXP)) + (lambda (REST START END) BODY ...)) + + +If no REST variable is given, the form is equivalent to + + + (call-with-values + (lambda () (string-parse-final-start+end PROC-EXP S-EXP ARGS-EXP)) + (lambda (START END) BODY ...)) + + +<procedure>(check-substring-spec proc s start end) -> unspecified</procedure><br> +<procedure>(substring-spec-ok? s start end) -> boolean</procedure><br> + +Check values S, START and END to ensure they specify a valid substring. +This means that S is a string, START and END are exact integers, and 0 <= +START <= END <= {{(string-length S)}} + +If the values are not proper + + +* {{check-substring-spec}} raises an error condition. PROC is used as part of the error condition, and should be the procedure whose parameters we are checking. +* {{substring-spec-ok?}} returns false. + +Otherwise, {{substring-spec-ok?}} returns true, and +{{check-substring-spec}} simply returns (what it returns is not specified). + + +==== Knuth-Morris-Pratt searching + +The Knuth-Morris-Pratt string-search algorithm is a method of rapidly +scanning a sequence of text for the occurrence of some fixed string. It has +the advantage of never requiring backtracking -- hence, it is useful for +searching not just strings, but also other sequences of text that do not +support backtracking or random-access, such as input ports. These routines +package up the initialisation and searching phases of the algorithm for +general use. They also support searching through sequences of text that +arrive in buffered chunks, in that intermediate search state can be +carried across applications of the search loop from the end of one buffer +application to the next. + +A second critical property of KMP search is that it requires the allocation +of auxiliary memory proportional to the length of the pattern, but +''constant'' in the size of the character type. Alternate searching +algorithms frequently require the construction of a table with an entry for +every possible character -- which can be prohibitively expensive in a 16- +or 32-bit character representation. + +<procedure>(make-kmp-restart-vector s [c= start end]) -> integer-vector</procedure><br> + +Build a Knuth-Morris-Pratt "restart vector," which is useful for quickly +searching character sequences for the occurrence of string S (or the +substring of S demarcated by the optional START/END parameters, if +provided). C= is a character-equality function used to construct the +restart vector. It defaults to {{char=?}}; use {{char-ci=?}} instead for +case-folded string search. + +The definition of the restart vector RV for string S is: If we have matched +chars 0..I-1 of S against some search string SS, and S[I] doesn't match +SS[K], then reset I := RV[I], and try again to match SS[K]. If RV[I] = -1, +then punt SS[K] completely, and move on to SS[K+1] and S[0]. + +In other words, if you have matched the first I chars of S, but the I+1'th +char doesn't match, RV[I] tells you what the next-longest prefix of S is +that you have matched. + +The following string-search function shows how a restart vector is used to +search. Note the attractive feature of the search process: it is "on line," +that is, it never needs to back up and reconsider previously seen data. It +simply consumes characters one-at-a-time until declaring a complete match +or reaching the end of the sequence. Thus, it can be easily adapted to +search other character sequences (such as ports) that do not provide random +access to their contents. + + + (define (find-substring pattern source start end) + (let ((plen (string-length pattern)) + (rv (make-kmp-restart-vector pattern))) + + ;; The search loop. SJ & PJ are redundant state. + (let lp ((si start) (pi 0) + (sj (- end start)) ; (- end si) -- how many chars left. + (pj plen)) ; (- plen pi) -- how many chars left. + + (if (= pi plen) (- si plen) ; Win. + + (and (<= pj sj) ; Lose. + + (if (char=? (string-ref source si) ; Test. + (string-ref pattern pi)) + (lp (+ 1 si) (+ 1 pi) (- sj 1) (- pj 1)) ; Advance. + + (let ((pi (vector-ref rv pi))) ; Retreat. + (if (= pi -1) + (lp (+ si 1) 0 (- sj 1) plen) ; Punt. + (lp si pi sj (- plen pi)))))))))) + +The optional START/END parameters restrict the restart vector to the +indicated substring of PAT; RV is END - START elements long. If START > +0, then RV is offset by START elements from PAT. That is, RV[I] describes +pattern element PAT[I + START]. Elements of RV are themselves indices that +range just over [0, END-START), ''not'' [START, END). + +Rationale: the actual value of RV is "position independent" -- it does +not depend on where in the PAT string the pattern occurs, but only on the +actual characters comprising the pattern. + +<procedure>(kmp-step pat rv c i c= p-start) -> integer</procedure><br> + +This function encapsulates the work performed by one step of the KMP string +search; it can be used to scan strings, input ports, or other on-line +character sources for fixed strings. + +PAT is the non-empty string specifying the text for which we are searching. +RV is the Knuth-Morris-Pratt restart vector for the pattern, as constructed +by {{make-kmp-restart-vector.}} The pattern begins at PAT[P-START], and +is {{(string-length RV)}} characters long. C= is the character-equality +function used to construct the restart vector, typically {{char=?}} or +{{char-ci=?}}. + +Suppose the pattern is N characters in length: PAT[P-START, P-START + +N). We have already matched I characters: PAT[P-START, P-START + I). +(P-START is typically zero.) C is the next character in the input stream. +{{kmp-step}} returns the new I value -- that is, how much of the pattern +we have matched, ''including'' character C. When I reaches N, the entire +pattern has been matched. + +Thus a typical search loop looks like this: + + + (let lp ((i 0)) + (or (= i n) ; Win -- #t + (and (not (end-of-stream)) ; Lose -- #f + (lp (kmp-step pat rv (get-next-character) i char=? 0))))) + +Example: + + + ;; Read chars from IPORT until we find string PAT or hit EOF. + (define (port-skip pat iport) + (let* ((rv (make-kmp-restart-vector pat)) + (patlen (string-length pat))) + (let lp ((i 0) (nchars 0)) + (if (= i patlen) nchars ; Win -- nchars skipped + (let ((c (read-char iport))) + (if (eof-object? c) c ; Fail -- EOF + (lp (kmp-step pat rv c i char=? 0) ; Continue + (+ nchars 1)))))))) + +This procedure could be defined as follows: + + + (define (kmp-step pat rv c i c= p-start) + (let lp ((i i)) + (if (c= c (string-ref pat (+ i p-start))) ; Match => + (+ i 1) ; Done. + (let ((i (vector-ref rv i))) ; Back up in PAT. + (if (= i -1) 0 ; Can't back up more. + (lp i))))))) ; Keep going. + +Rationale: this procedure takes no optional arguments because it is +intended as an inner-loop primitive and we do not want any run-time penalty +for optional-argument parsing and defaulting, nor do we wish barriers to +procedure integration/inlining. + +<procedure>(string-kmp-partial-search pat rv s i [c= p-start s-start s-end]) -> integer</procedure><br> + +Applies {{kmp-step}} across S; optional S-START/S-END bounds parameters +restrict search to a substring of S. The pattern is {{(vector-length RV)}} +characters long; optional P-START index indicates non-zero start of pattern +in PAT. + +Suppose PLEN = {{(vector-length RV)}} is the length of the pattern. I is an +integer index into the pattern (that is, 0 <= I < PLEN) indicating how much +of the pattern has already been matched. (This means the pattern must be +non-empty -- PLEN > 0.) + + +* On success, returns -J, where J is the index in S bounding the ''end'' of the pattern -- ''e.g.'', a value that could be used as the END parameter in a call to {{substring/shared}}. +* On continue, returns the current search state I' (an index into RV) when the search reached the end of the string. This is a non-negative integer. + +Hence: + + +* A negative return value indicates success, and says where in the string the match occured. +* A non-negative return value provides the I to use for continued search in a following string. + +This utility is designed to allow searching for occurrences of a fixed +string that might extend across multiple buffers of text. This is why, +for example, we do not provide the index of the ''start'' of the match on +success -- it may have occurred in a previous buffer. + +To search a character sequence that arrives in "chunks," write a loop of +this form: + + + (let lp ((i 0)) + (and (not (end-of-data?)) ; Lose -- return #f. + (let* ((buf (get-next-chunk)) ; Get or fill up the buffer. + (i (string-kmp-partial-search pat rv buf i))) + (if (< i 0) (- i) ; Win -- return end index. + (lp i))))) ; Keep looking. + +Modulo start/end optional-argument parsing, this procedure could be defined +as follows: + + + (define (string-kmp-partial-search pat rv s i c= p-start s-start s-end) + (let ((patlen (vector-length rv))) + (let lp ((si s-start) ; An index into S. + (vi i)) ; An index into RV. + (cond ((= vi patlen) (- si)) ; Win. + ((= si end) vi) ; Ran off the end. + (else (lp (+ si 1) ; Match s[si] & loop. + (kmp-step pat rv (string-ref s si) + vi c= p-start))))))) + +---- ---- Previous: [[Unit srfi-4]] Next: [[Unit srfi-14]] diff --git a/manual/Unit srfi-14 b/manual/Unit srfi-14 index 975191ab..37ecb9e0 100644 --- a/manual/Unit srfi-14 +++ b/manual/Unit srfi-14 @@ -2,8 +2,10 @@ == Unit srfi-14 -Character set library, see the documentation for -[[http://srfi.schemers.org/srfi-14/srfi-14.html|SRFI-14]] +Character set library. An abbreviated version of the SRFI is provided +in this document. Full documentation is available in the +[[http://srfi.schemers.org/srfi-14/srfi-14.html|original SRFI-14 +document]]. On systems that support dynamic loading, the {{srfi-14}} unit can be made available in the interpreter ({{csi}}) by entering @@ -12,7 +14,925 @@ be made available in the interpreter ({{csi}}) by entering (require-extension srfi-14) </enscript> -This library provides only the Latin-1 character set. +This library provides only the Latin-1 character set. To get Unicode +semantics, see the [[utf8]] egg. However, information on Unicode +character sets is still provided in this document. + +== Specification + +In the following procedure specifications: + + +* A CS parameter is a character set. +* An S parameter is a string. +* A CHAR parameter is a character. +* A CHAR-LIST parameter is a list of characters. +* A PRED parameter is a unary character predicate procedure, returning a true/false value when applied to a character. +* An OBJ parameter may be any value at all. + +Passing values to procedures with these parameters that do not satisfy +these types is an error. + +Unless otherwise noted in the specification of a procedure, procedures +always return character sets that are distinct (from the point of view +of the linear-update operations) from the parameter character sets. For +example, {{char-set-adjoin}} is guaranteed to provide a fresh character +set, even if it is not given any character parameters. + +Parameters given in square brackets are optional. Unless otherwise noted in +the text describing the procedure, any prefix of these optional parameters +may be supplied, from zero arguments to the full list. When a procedure +returns multiple values, this is shown by listing the return values in +square brackets, as well. So, for example, the procedure with signature + + + halts? F [X INIT-STORE] -> [BOOLEAN INTEGER] + + +would take one (F), two (F, X) or three (F, X, INIT-STORE) input +parameters, and return two values, a boolean and an integer. + +A parameter followed by "{{...}}" means zero-or-more elements. So the +procedure with the signature + + + sum-squares X ... -> NUMBER + +takes zero or more arguments (X ...), while the procedure with signature + + + spell-check DOC DICT_1 DICT_2 ... -> STRING-LIST + + +takes two required parameters (DOC and DICT_1) and zero or more optional +parameters (DICT_2 ...). + + +=== General procedures + +<procedure>(char-set? obj) -> boolean</procedure><br> + +Is the object OBJ a character set? + +<procedure>(char-set= cs_1 ...) -> boolean</procedure><br> + +Are the character sets equal? + +Boundary cases: + + + (char-set=) => TRUE + (char-set= cs) => TRUE + +Rationale: transitive binary relations are generally extended to n-ary +relations in Scheme, which enables clearer, more concise code to be +written. While the zero-argument and one-argument cases will almost +certainly not arise in first-order uses of such relations, they may well +arise in higher-order cases or macro-generated code. ''E.g.,'' consider + + + (apply char-set= cset-list) + + +This is well-defined if the list is empty or a singleton list. Hence +we extend these relations to any number of arguments. Implementors have +reported actual uses of n-ary relations in higher-order cases allowing for +fewer than two arguments. The way of Scheme is to handle the general case; +we provide the fully general extension. + +A counter-argument to this extension is that R5RS's transitive binary +arithmetic relations ({{=}}, {{<}}, ''etc.'') require at least two +arguments, hence this decision is a break with the prior convention -- +although it is at least one that is backwards-compatible. + +<procedure>(char-set<= cs_1 ...) -> boolean</procedure><br> + +Returns true if every character set CS_I is a subset of character set +CS_I+1. + +Boundary cases: + + + (char-set<=) => TRUE + (char-set<= cs) => TRUE + +Rationale: See {{char-set=}} for discussion of zero- and one-argument +applications. Consider testing a list of char-sets for monotonicity with + + + (apply char-set<= cset-list) + +<procedure>(char-set-hash cs [bound]) -> integer</procedure><br> + +Compute a hash value for the character set CS. BOUND is a non-negative +exact integer specifying the range of the hash function. A positive value +restricts the return value to the range [0,BOUND). + +If BOUND is either zero or not given, the implementation may use an +implementation-specific default value, chosen to be as large as is +efficiently practical. For instance, the default range might be chosen for +a given implementation to map all strings into the range of integers that +can be represented with a single machine word. + +Invariant: + + + (char-set= cs_1 cs_2) => (= (char-set-hash cs_1 b) (char-set-hash cs_2 b)) + + +A legal but nonetheless discouraged implementation: + + + (define (char-set-hash cs . maybe-bound) 1) + +Rationale: allowing the user to specify an explicit bound simplifies user +code by removing the mod operation that typically accompanies every hash +computation, and also may allow the implementation of the hash function to +exploit a reduced range to efficiently compute the hash value. ''E.g.'', +for small bounds, the hash function may be computed in a fashion such +that intermediate values never overflow into bignum integers, allowing +the implementor to provide a fixnum-specific "fast path" for computing the +common cases very rapidly. + + +=== Iterating over character sets + +<procedure>(char-set-cursor cset) -> cursor</procedure><br> +<procedure>(char-set-ref cset cursor) -> char</procedure><br> +<procedure>(char-set-cursor-next cset cursor) -> cursor</procedure><br> +<procedure>(end-of-char-set? cursor) -> boolean</procedure><br> + +Cursors are a low-level facility for iterating over the characters +in a set. A cursor is a value that indexes a character in a char set. +{{char-set-cursor}} produces a new cursor for a given char set. The +set element indexed by the cursor is fetched with {{char-set-ref}}. +A cursor index is incremented with {{char-set-cursor-next}}; in this +way, code can step through every character in a char set. Stepping +a cursor "past the end" of a char set produces a cursor that answers +true to {{end-of-char-set?}}. It is an error to pass such a cursor to +{{char-set-ref}} or to {{char-set-cursor-next}}. + +A cursor value may not be used in conjunction with a different character +set; if it is passed to {{char-set-ref}} or {{char-set-cursor-next}} with a +character set other than the one used to create it, the results and effects +are undefined. + +Cursor values are ''not'' necessarily distinct from other types. They +may be integers, linked lists, records, procedures or other values. +This license is granted to allow cursors to be very "lightweight" values +suitable for tight iteration, even in fairly simple implementations. + +Note that these primitives are necessary to export an iteration facility +for char sets to loop macros. + +Example: + + + (define cs (char-set #\G #\a #\T #\e #\c #\h)) + + ;; Collect elts of CS into a list. + (let lp ((cur (char-set-cursor cs)) (ans '())) + (if (end-of-char-set? cur) ans + (lp (char-set-cursor-next cs cur) + (cons (char-set-ref cs cur) ans)))) + => (#\G #\T #\a #\c #\e #\h) + + ;; Equivalently, using a list unfold (from SRFI 1): + (unfold-right end-of-char-set? + (curry char-set-ref cs) + (curry char-set-cursor-next cs) + (char-set-cursor cs)) + => (#\G #\T #\a #\c #\e #\h) + +Rationale: Note that the cursor API's four functions "fit" the functional +protocol used by the unfolders provided by the list, string and char-set +SRFIs (see the example above). By way of contrast, here is a simpler, +two-function API that was rejected for failing this criterion. Besides +{{char-set-cursor}}, it provided a single function that mapped a cursor and +a character set to two values, the indexed character and the next cursor. +If the cursor had exhausted the character set, then this function returned +false instead of the character value, and another end-of-char-set cursor. +In this way, the other three functions of the current API were combined +together. + +<procedure>(char-set-fold kons knil cs) -> object</procedure><br> + +This is the fundamental iterator for character sets. Applies the function +KONS across the character set CS using initial state value KNIL. That is, +if CS is the empty set, the procedure returns KNIL. Otherwise, some element +C of CS is chosen; let CS' be the remaining, unchosen characters. The +procedure returns + + + (char-set-fold KONS (KONS C KNIL) CS') + +Examples: + + + ;; CHAR-SET-MEMBERS + (lambda (cs) (char-set-fold cons '() cs)) + + ;; CHAR-SET-SIZE + (lambda (cs) (char-set-fold (lambda (c i) (+ i 1)) 0 cs)) + + ;; How many vowels in the char set? + (lambda (cs) + (char-set-fold (lambda (c i) (if (vowel? c) (+ i 1) i)) + 0 cs)) + +<procedure>(char-set-unfold f p g seed [base-cs]) -> char-set</procedure><br> +<procedure>(char-set-unfold! f p g seed base-cs) -> char-set</procedure><br> + +This is a fundamental constructor for char-sets. + + +* G is used to generate a series of "seed" values from the initial seed: SEED, (G SEED), (G^2 SEED), (G^3 SEED), ... +* P tells us when to stop -- when it returns true when applied to one of these seed values. +* F maps each seed value to a character. These characters are added to the base character set BASE-CS to form the result; BASE-CS defaults to the empty set. {{char-set-unfold!}} adds the characters to BASE-CS in a linear-update -- it is allowed, but not required, to side-effect and use BASE-CS's storage to construct the result. + +More precisely, the following definitions hold, ignoring the +optional-argument issues: + + + (define (char-set-unfold p f g seed base-cs) + (char-set-unfold! p f g seed (char-set-copy base-cs))) + + (define (char-set-unfold! p f g seed base-cs) + (let lp ((seed seed) (cs base-cs)) + (if (p seed) cs ; P says we are done. + (lp (g seed) ; Loop on (G SEED). + (char-set-adjoin! cs (f seed)))))) ; Add (F SEED) to set. + +(Note that the actual implementation may be more efficient.) + +Examples: + + + + (port->char-set p) = (char-set-unfold eof-object? values + (lambda (x) (read-char p)) + (read-char p)) + + (list->char-set lis) = (char-set-unfold null? car cdr lis) + + +<procedure>(char-set-for-each proc cs) -> unspecified</procedure><br> + +Apply procedure PROC to each character in the character set CS. Note that +the order in which PROC is applied to the characters in the set is not +specified, and may even change from one procedure application to another. + +Nothing at all is specified about the value returned by this procedure; +it is not even required to be consistent from call to call. It is simply +required to be a value (or values) that may be passed to a command +continuation, ''e.g.'' as the value of an expression appearing as a +non-terminal subform of a {{begin}} expression. Note that in R5RS, this +restricts the procedure to returning a single value; non-R5RS systems may +not even provide this restriction. + +<procedure>(char-set-map proc cs) -> char-set</procedure><br> + +PROC is a char->char procedure. Apply it to all the characters in the +char-set CS, and collect the results into a new character set. + +Essentially lifts PROC from a char->char procedure to a char-set -> +char-set procedure. + +Example: + + + (char-set-map char-downcase cset) + + +=== Creating character sets + +<procedure>(char-set-copy cs) -> char-set</procedure><br> + +Returns a copy of the character set CS. "Copy" means that if either the +input parameter or the result value of this procedure is passed to one of +the linear-update procedures described below, the other character set is +guaranteed not to be altered. + +A system that provides pure-functional implementations of the +linear-operator suite could implement this procedure as the identity +function -- so copies are ''not'' guaranteed to be distinct by {{eq?}}. + +<procedure>(char-set char_1 ...) -> char-set</procedure><br> + +Return a character set containing the given characters. + +<procedure>(list->char-set char-list [base-cs]) -> char-set</procedure><br> +<procedure>(list->char-set! char-list base-cs) -> char-set</procedure><br> + +Return a character set containing the characters in the list of characters +CHAR-LIST. + +If character set BASE-CS is provided, the characters from CHAR-LIST +are added to it. {{list->char-set!}} is allowed, but not required, to +side-effect and reuse the storage in BASE-CS; {{list->char-set}} produces a +fresh character set. + +<procedure>(string->char-set s [base-cs]) -> char-set</procedure><br> +<procedure>(string->char-set! s base-cs) -> char-set</procedure><br> + +Return a character set containing the characters in the string S. + +If character set BASE-CS is provided, the characters from S are added to +it. {{string->char-set!}} is allowed, but not required, to side-effect +and reuse the storage in BASE-CS; {{string->char-set}} produces a fresh +character set. + +<procedure>(char-set-filter pred cs [base-cs]) -> char-set</procedure><br> +<procedure>(char-set-filter! pred cs base-cs) -> char-set</procedure><br> + +Returns a character set containing every character C in CS such that +{{(PRED C)}} returns true. + +If character set BASE-CS is provided, the characters specified by PRED +are added to it. {{char-set-filter!}} is allowed, but not required, to +side-effect and reuse the storage in BASE-CS; {{char-set-filter}} produces +a fresh character set. + +An implementation may not save away a reference to PRED and invoke it after +{{char-set-filter}} or {{char-set-filter!}} returns -- that is, "lazy," +on-demand implementations are not allowed, as PRED may have external +dependencies on mutable data or have other side-effects. + +Rationale: This procedure provides a means of converting a character +predicate into its equivalent character set; the CS parameter allows the +programmer to bound the predicate's domain. Programmers should be aware +that filtering a character set such as {{char-set:full}} could be a very +expensive operation in an implementation that provided an extremely large +character type, such as 32-bit Unicode. An earlier draft of this library +provided a simple {{predicate->char-set}} procedure, which was rejected in +favor of {{char-set-filter}} for this reason. + +<procedure>(ucs-range->char-set lower upper [error? base-cs]) -> char-set</procedure><br> +<procedure>(ucs-range->char-set! lower upper error? base-cs) -> char-set</procedure><br> + +LOWER and UPPER are exact non-negative integers; LOWER <= UPPER. + +Returns a character set containing every character whose ISO/IEC 10646 +UCS-4 code lies in the half-open range [LOWER,UPPER). + + +* If the requested range includes unassigned UCS values, these are silently ignored (the current UCS specification has "holes" in the space of assigned codes). +* If the requested range includes "private" or "user space" codes, these are handled in an implementation-specific manner; however, a UCS- or Unicode-based Scheme implementation should pass them through transparently. +* If any code from the requested range specifies a valid, assigned UCS character that has no corresponding representative in the implementation's character type, then (1) an error is raised if ERROR? is true, and (2) the code is ignored if ERROR? is false (the default). This might happen, for example, if the implementation uses ASCII characters, and the requested range includes non-ASCII characters. + +If character set BASE-CS is provided, the characters specified by the range +are added to it. {{ucs-range->char-set!}} is allowed, but not required, +to side-effect and reuse the storage in BASE-CS; {{ucs-range->char-set}} +produces a fresh character set. + +Note that ASCII codes are a subset of the Latin-1 codes, which are in turn +a subset of the 16-bit Unicode codes, which are themselves a subset of +the 32-bit UCS-4 codes. We commit to a specific encoding in this routine, +regardless of the underlying representation of characters, so that client +code using this library will be portable. ''I.e.'', a conformant Scheme +implementation may use EBCDIC or SHIFT-JIS to encode characters; it +must simply map the UCS characters from the given range into the native +representation when possible, and report errors when not possible. + +<procedure>(->char-set x) -> char-set</procedure><br> + +Coerces X into a char-set. X may be a string, character or char-set. A +string is converted to the set of its constituent characters; a character +is converted to a singleton set; a char-set is returned as-is. This +procedure is intended for use by other procedures that want to provide +"user-friendly," wide-spectrum interfaces to their clients. + + +=== Querying character sets + +<procedure>(char-set-size cs) -> integer</procedure><br> + +Returns the number of elements in character set CS. + +<procedure>(char-set-count pred cs) -> integer</procedure><br> + +Apply PRED to the chars of character set CS, and return the number of chars +that caused the predicate to return true. + +<procedure>(char-set->list cs) -> character-list</procedure><br> + +This procedure returns a list of the members of character set CS. The order +in which CS's characters appear in the list is not defined, and may be +different from one call to another. + +<procedure>(char-set->string cs) -> string</procedure><br> + +This procedure returns a string containing the members of character set CS. +The order in which CS's characters appear in the string is not defined, and +may be different from one call to another. + +<procedure>(char-set-contains? cs char) -> boolean</procedure><br> + +This procedure tests CHAR for membership in character set CS. + +The MIT Scheme character-set package called this procedure +CHAR-SET-MEMBER?, but the argument order isn't consistent with the name. + +<procedure>(char-set-every pred cs) -> boolean</procedure><br> +<procedure>(char-set-any pred cs) -> boolean</procedure><br> + +The {{char-set-every}} procedure returns true if predicate PRED returns +true of every character in the character set CS. Likewise, {{char-set-any}} +applies PRED to every character in character set CS, and returns the first +true value it finds. If no character produces a true value, it returns +false. The order in which these procedures sequence through the elements of +CS is not specified. + +Note that if you need to determine the actual character on which a +predicate returns true, use {{char-set-any}} and arrange for the predicate +to return the character parameter as its true value, ''e.g.'' + + + (char-set-any (lambda (c) (and (char-upper-case? c) c)) + cs) + + +=== Character-set algebra + +<procedure>(char-set-adjoin cs char_1 ...) -> char-set</procedure><br> +<procedure>(char-set-delete cs char_1 ...) -> char-set</procedure><br> + +Add/delete the CHAR_I characters to/from character set CS. + +<procedure>(char-set-adjoin! cs char_1 ...) -> char-set</procedure><br> +<procedure>(char-set-delete! cs char_1 ...) -> char-set</procedure><br> + +Linear-update variants. These procedures are allowed, but not required, to +side-effect their first parameter. + +<procedure>(char-set-complement cs) -> char-set</procedure><br> +<procedure>(char-set-union cs_1 ...) -> char-set</procedure><br> +<procedure>(char-set-intersection cs_1 ...) -> char-set</procedure><br> +<procedure>(char-set-difference cs_1 cs_2 ...) -> char-set</procedure><br> +<procedure>(char-set-xor cs_1 ...) -> char-set</procedure><br> +<procedure>(char-set-diff+intersection cs_1 cs_2 ...) -> [char-set char-set]</procedure><br> + +These procedures implement set complement, union, intersection, difference, +and exclusive-or for character sets. The union, intersection and xor +operations are n-ary. The difference function is also n-ary, associates to +the left (that is, it computes the difference between its first argument +and the union of all the other arguments), and requires at least one +argument. + +Boundary cases: + + + (char-set-union) => char-set:empty + (char-set-intersection) => char-set:full + (char-set-xor) => char-set:empty + (char-set-difference CS) => CS + + +{{char-set-diff+intersection}} returns both the difference and the +intersection of the arguments -- it partitions its first parameter. It is +equivalent to + + + (values (char-set-difference CS_1 CS_2 ...) + (char-set-intersection CS_1 (char-set-union CS_2 ...))) + + +but can be implemented more efficiently. + +Programmers should be aware that {{char-set-complement}} could potentially +be a very expensive operation in Scheme implementations that provide a very +large character type, such as 32-bit Unicode. If this is a possibility, +sets can be complimented with respect to a smaller universe using +{{char-set-difference}}. + +<procedure>(char-set-complement! cs) -> char-set</procedure><br> +<procedure>(char-set-union! cs_1 cs_2 ...) -> char-set</procedure><br> +<procedure>(char-set-intersection! cs_1 cs_2 ...) -> char-set</procedure><br> +<procedure>(char-set-difference! cs_1 cs_2 ...) -> char-set</procedure><br> +<procedure>(char-set-xor! cs_1 cs_2 ...) -> char-set</procedure><br> +<procedure>(char-set-diff+intersection! cs_1 cs_2 cs_3 ...) -> [char-set char-set]</procedure><br> + +These are linear-update variants of the set-algebra functions. They are +allowed, but not required, to side-effect their first (required) parameter. + +{{char-set-diff+intersection!}} is allowed to side-effect both of its two +required parameters, CS_1 and CS_2. + +== Standard character sets + +Several character sets are predefined for convenience: + +<table> +<tr><td>{{char-set:lower-case}}</td><td>Lower-case letters</td></tr> +<tr><td>{{char-set:upper-case}}</td><td>Upper-case letters</td></tr> +<tr><td>{{char-set:title-case}}</td><td>Title-case letters</td></tr> +<tr><td>{{char-set:letter}}</td><td>Letters</td></tr> +<tr><td>{{char-set:digit}}</td><td>Digits</td></tr> +<tr><td>{{char-set:letter+digit}}</td><td>Letters and digits</td></tr> +<tr><td>{{char-set:graphic}}</td><td>Printing characters except spaces</td></tr> +<tr><td>{{char-set:printing}}</td><td>Printing characters including spaces</td></tr> +<tr><td>{{char-set:whitespace}}</td><td>Whitespace characters</td></tr> +<tr><td>{{char-set:iso-control}}</td><td>The ISO control characters</td></tr> +<tr><td>{{char-set:punctuation}}</td><td>Punctuation characters</td></tr> +<tr><td>{{char-set:symbol}}</td><td>Symbol characters</td></tr> +<tr><td>{{char-set:hex-digit}}</td><td>A hexadecimal digit: 0-9, A-F, a-f</td></tr> +<tr><td>{{char-set:blank}}</td><td>Blank characters -- horizontal whitespace</td></tr> +<tr><td>{{char-set:ascii}}</td><td>All characters in the ASCII set.</td></tr> +<tr><td>{{char-set:empty}}</td><td>Empty set</td></tr> +<tr><td>{{char-set:full}}</td><td>All characters</td></tr></table> + +In Unicode Scheme implementations, the base character sets are compatible +with Java's Unicode specifications. For ASCII or Latin-1, we simply +restrict the Unicode set specifications to their first 128 or 256 codes, +respectively. + +Here are the definitions for some of the sets in an ASCII implementation: + +<table> +<tr><td>{{char-set:lower-case}}</td><td>a-z</td></tr> +<tr><td>{{char-set:upper-case}}</td><td>A-Z</td></tr> +<tr><td>{{char-set:letter}}</td><td>A-Z and a-z</td></tr> +<tr><td>{{char-set:digit}}</td><td>0123456789</td></tr> +<tr><td>{{char-set:punctuation}}</td><td>{{!"#%&'()*,-./:;?@[\]_{}}}</td></tr> +<tr><td>{{char-set:symbol}}</td><td>{{$+<=>^`|~}}</td></tr> +<tr><td>{{char-set:whitespace}}</td><td>Space, newline, tab, form feed, vertical tab, carriage return</td></tr> +<tr><td>{{char-set:blank}}</td><td>Space and tab</td></tr> +<tr><td>{{char-set:graphic}}</td><td>letter + digit + punctuation + symbol</td></tr> +<tr><td>{{char-set:printing}}</td><td>graphic + whitespace</td></tr> +<tr><td>{{char-set:iso-control}}</td><td>ASCII 0-31 and 127</td></tr></table> + +=== Character set constants + +<constant>char-set:lower-case</constant> + +For Unicode, a character is lowercase if + + +* it is not in the range [U+2000,U+2FFF], and +* the Unicode attribute table does not give a lowercase mapping for it, and +* at least one of the following is true: +** the Unicode attribute table gives a mapping to uppercase for the character, or +** the name for the character in the Unicode attribute table contains the words "SMALL LETTER" or "SMALL LIGATURE". + +The lower-case ASCII characters are + +abcdefghijklmnopqrstuvwxyz + +Latin-1 adds another 33 lower-case characters to the ASCII set: + + +<table> +<tr><td>00B5</td><td>MICRO SIGN</td></tr> +<tr><td>00DF</td><td>LATIN SMALL LETTER SHARP S</td></tr> +<tr><td>00E0</td><td>LATIN SMALL LETTER A WITH GRAVE</td></tr> +<tr><td>00E1</td><td>LATIN SMALL LETTER A WITH ACUTE</td></tr> +<tr><td>00E2</td><td>LATIN SMALL LETTER A WITH CIRCUMFLEX</td></tr> +<tr><td>00E3</td><td>LATIN SMALL LETTER A WITH TILDE</td></tr> +<tr><td>00E4</td><td>LATIN SMALL LETTER A WITH DIAERESIS</td></tr> +<tr><td>00E5</td><td>LATIN SMALL LETTER A WITH RING ABOVE</td></tr> +<tr><td>00E6</td><td>LATIN SMALL LETTER AE</td></tr> +<tr><td>00E7</td><td>LATIN SMALL LETTER C WITH CEDILLA</td></tr> +<tr><td>00E8</td><td>LATIN SMALL LETTER E WITH GRAVE</td></tr> +<tr><td>00E9</td><td>LATIN SMALL LETTER E WITH ACUTE</td></tr> +<tr><td>00EA</td><td>LATIN SMALL LETTER E WITH CIRCUMFLEX</td></tr> +<tr><td>00EB</td><td>LATIN SMALL LETTER E WITH DIAERESIS</td></tr> +<tr><td>00EC</td><td>LATIN SMALL LETTER I WITH GRAVE</td></tr> +<tr><td>00ED</td><td>LATIN SMALL LETTER I WITH ACUTE</td></tr> +<tr><td>00EE</td><td>LATIN SMALL LETTER I WITH CIRCUMFLEX</td></tr> +<tr><td>00EF</td><td>LATIN SMALL LETTER I WITH DIAERESIS</td></tr> +<tr><td>00F0</td><td>LATIN SMALL LETTER ETH</td></tr> +<tr><td>00F1</td><td>LATIN SMALL LETTER N WITH TILDE</td></tr> +<tr><td>00F2</td><td>LATIN SMALL LETTER O WITH GRAVE</td></tr> +<tr><td>00F3</td><td>LATIN SMALL LETTER O WITH ACUTE</td></tr> +<tr><td>00F4</td><td>LATIN SMALL LETTER O WITH CIRCUMFLEX</td></tr> +<tr><td>00F5</td><td>LATIN SMALL LETTER O WITH TILDE</td></tr> +<tr><td>00F6</td><td>LATIN SMALL LETTER O WITH DIAERESIS</td></tr> +<tr><td>00F8</td><td>LATIN SMALL LETTER O WITH STROKE</td></tr> +<tr><td>00F9</td><td>LATIN SMALL LETTER U WITH GRAVE</td></tr> +<tr><td>00FA</td><td>LATIN SMALL LETTER U WITH ACUTE</td></tr> +<tr><td>00FB</td><td>LATIN SMALL LETTER U WITH CIRCUMFLEX</td></tr> +<tr><td>00FC</td><td>LATIN SMALL LETTER U WITH DIAERESIS</td></tr> +<tr><td>00FD</td><td>LATIN SMALL LETTER Y WITH ACUTE</td></tr> +<tr><td>00FE</td><td>LATIN SMALL LETTER THORN</td></tr> +<tr><td>00FF</td><td>LATIN SMALL LETTER Y WITH DIAERESIS</td></tr></table> + +Note that three of these have no corresponding Latin-1 upper-case +character: + + +<table> +<tr><td>00B5</td><td>MICRO SIGN</td></tr> +<tr><td>00DF</td><td>LATIN SMALL LETTER SHARP S</td></tr> +<tr><td>00FF</td><td>LATIN SMALL LETTER Y WITH DIAERESIS</td></tr></table> + +(The compatibility micro character uppercases to the non-Latin-1 Greek +capital mu; the German sharp s character uppercases to the pair of +characters "SS," and the capital y-with-diaeresis is non-Latin-1.) + +<constant>char-set:upper-case</constant> + +For Unicode, a character is uppercase if + + +* it is not in the range [U+2000,U+2FFF], and +* the Unicode attribute table does not give an uppercase mapping for it (this excludes titlecase characters), and +* at least one of the following is true: +** the Unicode attribute table gives a mapping to lowercase for the character, or +** the name for the character in the Unicode attribute table contains the words "CAPITAL LETTER" or "CAPITAL LIGATURE". + +The upper-case ASCII characters are + +ABCDEFGHIJKLMNOPQRSTUVWXYZ + +Latin-1 adds another 30 upper-case characters to the ASCII set: + + +<table> +<tr><td>00C0</td><td>LATIN CAPITAL LETTER A WITH GRAVE</td></tr> +<tr><td>00C1</td><td>LATIN CAPITAL LETTER A WITH ACUTE</td></tr> +<tr><td>00C2</td><td>LATIN CAPITAL LETTER A WITH CIRCUMFLEX</td></tr> +<tr><td>00C3</td><td>LATIN CAPITAL LETTER A WITH TILDE</td></tr> +<tr><td>00C4</td><td>LATIN CAPITAL LETTER A WITH DIAERESIS</td></tr> +<tr><td>00C5</td><td>LATIN CAPITAL LETTER A WITH RING ABOVE</td></tr> +<tr><td>00C6</td><td>LATIN CAPITAL LETTER AE</td></tr> +<tr><td>00C7</td><td>LATIN CAPITAL LETTER C WITH CEDILLA</td></tr> +<tr><td>00C8</td><td>LATIN CAPITAL LETTER E WITH GRAVE</td></tr> +<tr><td>00C9</td><td>LATIN CAPITAL LETTER E WITH ACUTE</td></tr> +<tr><td>00CA</td><td>LATIN CAPITAL LETTER E WITH CIRCUMFLEX</td></tr> +<tr><td>00CB</td><td>LATIN CAPITAL LETTER E WITH DIAERESIS</td></tr> +<tr><td>00CC</td><td>LATIN CAPITAL LETTER I WITH GRAVE</td></tr> +<tr><td>00CD</td><td>LATIN CAPITAL LETTER I WITH ACUTE</td></tr> +<tr><td>00CE</td><td>LATIN CAPITAL LETTER I WITH CIRCUMFLEX</td></tr> +<tr><td>00CF</td><td>LATIN CAPITAL LETTER I WITH DIAERESIS</td></tr> +<tr><td>00D0</td><td>LATIN CAPITAL LETTER ETH</td></tr> +<tr><td>00D1</td><td>LATIN CAPITAL LETTER N WITH TILDE</td></tr> +<tr><td>00D2</td><td>LATIN CAPITAL LETTER O WITH GRAVE</td></tr> +<tr><td>00D3</td><td>LATIN CAPITAL LETTER O WITH ACUTE</td></tr> +<tr><td>00D4</td><td>LATIN CAPITAL LETTER O WITH CIRCUMFLEX</td></tr> +<tr><td>00D5</td><td>LATIN CAPITAL LETTER O WITH TILDE</td></tr> +<tr><td>00D6</td><td>LATIN CAPITAL LETTER O WITH DIAERESIS</td></tr> +<tr><td>00D8</td><td>LATIN CAPITAL LETTER O WITH STROKE</td></tr> +<tr><td>00D9</td><td>LATIN CAPITAL LETTER U WITH GRAVE</td></tr> +<tr><td>00DA</td><td>LATIN CAPITAL LETTER U WITH ACUTE</td></tr> +<tr><td>00DB</td><td>LATIN CAPITAL LETTER U WITH CIRCUMFLEX</td></tr> +<tr><td>00DC</td><td>LATIN CAPITAL LETTER U WITH DIAERESIS</td></tr> +<tr><td>00DD</td><td>LATIN CAPITAL LETTER Y WITH ACUTE</td></tr> +<tr><td>00DE</td><td>LATIN CAPITAL LETTER THORN</td></tr></table> + + +<constant>char-set:title-case</constant> + +In Unicode, a character is titlecase if it has the category Lt in the +character attribute database. There are very few of these characters; here +is the entire 31-character list as of Unicode 3.0: + + +<table> +<tr><td>01C5</td><td>LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON</td></tr> +<tr><td>01C8</td><td>LATIN CAPITAL LETTER L WITH SMALL LETTER J</td></tr> +<tr><td>01CB</td><td>LATIN CAPITAL LETTER N WITH SMALL LETTER J</td></tr> +<tr><td>01F2</td><td>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</td></tr> +<tr><td>1F88</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</td></tr> +<tr><td>1F89</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F8A</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F8B</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F8C</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F8D</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F8E</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI</td></tr> +<tr><td>1F8F</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI</td></tr> +<tr><td>1F98</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI</td></tr> +<tr><td>1F99</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F9A</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F9B</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F9C</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F9D</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1F9E</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI</td></tr> +<tr><td>1F9F</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI</td></tr> +<tr><td>1FA8</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI</td></tr> +<tr><td>1FA9</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1FAA</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1FAB</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1FAC</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1FAD</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI</td></tr> +<tr><td>1FAE</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI</td></tr> +<tr><td>1FAF</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI</td></tr> +<tr><td>1FBC</td><td>GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI</td></tr> +<tr><td>1FCC</td><td>GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI</td></tr> +<tr><td>1FFC</td><td>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</td></tr></table> + +There are no ASCII or Latin-1 titlecase characters. + + +<constant>char-set:letter</constant> + +In Unicode, a letter is any character with one of the letter categories +(Lu, Ll, Lt, Lm, Lo) in the Unicode character database. + +There are 52 ASCII letters + +abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ + +There are 117 Latin-1 letters. These are the 115 characters that are +members of the Latin-1 {{char-set:lower-case}} and {{char-set:upper-case}} +sets, plus + + +<table> +<tr><td>00AA</td><td>FEMININE ORDINAL INDICATOR</td></tr> +<tr><td>00BA</td><td>MASCULINE ORDINAL INDICATOR</td></tr></table> + +(These two letters are considered lower-case by Unicode, but not by SRFI 14.) + + +<constant>char-set:digit</constant> + +In Unicode, a character is a digit if it has the category Nd in the +character attribute database. In Latin-1 and ASCII, the only such +characters are 0123456789. In Unicode, there are other digit characters in +other code blocks, such as Gujarati digits and Tibetan digits. + + +<constant>char-set:hex-digit</constant> + +The only hex digits are 0123456789abcdefABCDEF. + + +<constant>char-set:letter+digit</constant> + +The union of {{char-set:letter}} and {{char-set:digit.}} + + +<constant>char-set:graphic</constant> + +A graphic character is one that would put ink on paper. The ASCII and +Latin-1 graphic characters are the members of + + +<table> +<tr><td>{{char-set:letter}}</td></tr> +<tr><td>{{char-set:digit}}</td></tr> +<tr><td>{{char-set:punctuation}}</td></tr> +<tr><td>{{char-set:symbol}}</td></tr></table> + + +<constant>char-set:printing</constant> + +A printing character is one that would occupy space when printed, ''i.e.'', +a graphic character or a space character. {{char-set:printing}} is the +union of {{char-set:whitespace}} and {{char-set:graphic.}} + + +<constant>char-set:whitespace</constant> + +In Unicode, a whitespace character is either + + +* a character with one of the space, line, or paragraph separator categories (Zs, Zl or Zp) of the Unicode character database. +* U+0009 Horizontal tabulation (\t control-I) +* U+000A Line feed (\n control-J) +* U+000B Vertical tabulation (\v control-K) +* U+000C Form feed (\f control-L) +* U+000D Carriage return (\r control-M) + +There are 24 whitespace characters in Unicode 3.0: + + +<table> +<tr><td>0009</td><td>HORIZONTAL TABULATION</td><td>\t control-I</td></tr> +<tr><td>000A</td><td>LINE FEED</td><td>\n control-J</td></tr> +<tr><td>000B</td><td>VERTICAL TABULATION</td><td>\v control-K</td></tr> +<tr><td>000C</td><td>FORM FEED</td><td>\f control-L</td></tr> +<tr><td>000D</td><td>CARRIAGE RETURN</td><td>\r control-M</td></tr> +<tr><td>0020</td><td>SPACE</td><td>Zs</td></tr> +<tr><td>00A0</td><td>NO-BREAK SPACE</td><td>Zs</td></tr> +<tr><td>1680</td><td>OGHAM SPACE MARK</td><td>Zs</td></tr> +<tr><td>2000</td><td>EN QUAD</td><td>Zs</td></tr> +<tr><td>2001</td><td>EM QUAD</td><td>Zs</td></tr> +<tr><td>2002</td><td>EN SPACE</td><td>Zs</td></tr> +<tr><td>2003</td><td>EM SPACE</td><td>Zs</td></tr> +<tr><td>2004</td><td>THREE-PER-EM SPACE</td><td>Zs</td></tr> +<tr><td>2005</td><td>FOUR-PER-EM SPACE</td><td>Zs</td></tr> +<tr><td>2006</td><td>SIX-PER-EM SPACE</td><td>Zs</td></tr> +<tr><td>2007</td><td>FIGURE SPACE</td><td>Zs</td></tr> +<tr><td>2008</td><td>PUNCTUATION SPACE</td><td>Zs</td></tr> +<tr><td>2009</td><td>THIN SPACE</td><td>Zs</td></tr> +<tr><td>200A</td><td>HAIR SPACE</td><td>Zs</td></tr> +<tr><td>200B</td><td>ZERO WIDTH SPACE</td><td>Zs</td></tr> +<tr><td>2028</td><td>LINE SEPARATOR</td><td>Zl</td></tr> +<tr><td>2029</td><td>PARAGRAPH SEPARATOR</td><td>Zp</td></tr> +<tr><td>202F</td><td>NARROW NO-BREAK SPACE</td><td>Zs</td></tr> +<tr><td>3000</td><td>IDEOGRAPHIC SPACE</td><td>Zs</td></tr></table> + +The ASCII whitespace characters are the first six characters in the +above list -- line feed, horizontal tabulation, vertical tabulation, form +feed, carriage return, and space. These are also exactly the characters +recognised by the Posix {{isspace()}} procedure. Latin-1 adds the no-break +space. + +<constant>char-set:iso-control</constant> + +The ISO control characters are the Unicode/Latin-1 characters in the ranges +[U+0000,U+001F] and [U+007F,U+009F]. + +ASCII restricts this set to the characters in the range [U+0000,U+001F] +plus the character U+007F. + +Note that Unicode defines other control characters which do not belong to +this set (hence the qualifying prefix "iso-" in the name). + +<constant>char-set:punctuation</constant> + +In Unicode, a punctuation character is any character that has one of the +punctuation categories in the Unicode character database (Pc, Pd, Ps, Pe, +Pi, Pf, or Po.) + +ASCII has 23 punctuation characters: + + + !"#%&'()*,-./:;?@[\]_{} + +Latin-1 adds six more: + + +<table> +<tr><td>00A1</td><td>INVERTED EXCLAMATION MARK</td></tr> +<tr><td>00AB</td><td>LEFT-POINTING DOUBLE ANGLE QUOTATION MARK</td></tr> +<tr><td>00AD</td><td>SOFT HYPHEN</td></tr> +<tr><td>00B7</td><td>MIDDLE DOT</td></tr> +<tr><td>00BB</td><td>RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK</td></tr> +<tr><td>00BF</td><td>INVERTED QUESTION MARK</td></tr></table> + +Note that the nine ASCII characters {{$+<=>^`|~}} are ''not'' punctuation. +They are "symbols." + + +<constant>char-set:symbol</constant> + +In Unicode, a symbol is any character that has one of the symbol categories +in the Unicode character database (Sm, Sc, Sk, or So). There are nine ASCII +symbol characters: + + + $+<=>^`|~ + +Latin-1 adds 18 more: + + +<table> +<tr><td>00A2</td><td>CENT SIGN</td></tr> +<tr><td>00A3</td><td>POUND SIGN</td></tr> +<tr><td>00A4</td><td>CURRENCY SIGN</td></tr> +<tr><td>00A5</td><td>YEN SIGN</td></tr> +<tr><td>00A6</td><td>BROKEN BAR</td></tr> +<tr><td>00A7</td><td>SECTION SIGN</td></tr> +<tr><td>00A8</td><td>DIAERESIS</td></tr> +<tr><td>00A9</td><td>COPYRIGHT SIGN</td></tr> +<tr><td>00AC</td><td>NOT SIGN</td></tr> +<tr><td>00AE</td><td>REGISTERED SIGN</td></tr> +<tr><td>00AF</td><td>MACRON</td></tr> +<tr><td>00B0</td><td>DEGREE SIGN</td></tr> +<tr><td>00B1</td><td>PLUS-MINUS SIGN</td></tr> +<tr><td>00B4</td><td>ACUTE ACCENT</td></tr> +<tr><td>00B6</td><td>PILCROW SIGN</td></tr> +<tr><td>00B8</td><td>CEDILLA</td></tr> +<tr><td>00D7</td><td>MULTIPLICATION SIGN</td></tr> +<tr><td>00F7</td><td>DIVISION SIGN</td></tr></table> + + +<constant>char-set:blank</constant> + +Blank chars are horizontal whitespace. In Unicode, a blank character is +either + + +* a character with the space separator category (Zs) in the Unicode character database. +* U+0009 Horizontal tabulation (\t control-I) + +There are eighteen blank characters in Unicode 3.0: + + +<table> +<tr><td>0009</td><td>HORIZONTAL TABULATION</td><td>\t control-I</td></tr> +<tr><td>0020</td><td>SPACE</td><td>Zs</td></tr> +<tr><td>00A0</td><td>NO-BREAK SPACE</td><td>Zs</td></tr> +<tr><td>1680</td><td>OGHAM SPACE MARK</td><td>Zs</td></tr> +<tr><td>2000</td><td>EN QUAD</td><td>Zs</td></tr> +<tr><td>2001</td><td>EM QUAD</td><td>Zs</td></tr> +<tr><td>2002</td><td>EN SPACE</td><td>Zs</td></tr> +<tr><td>2003</td><td>EM SPACE</td><td>Zs</td></tr> +<tr><td>2004</td><td>THREE-PER-EM SPACE</td><td>Zs</td></tr> +<tr><td>2005</td><td>FOUR-PER-EM SPACE</td><td>Zs</td></tr> +<tr><td>2006</td><td>SIX-PER-EM SPACE</td><td>Zs</td></tr> +<tr><td>2007</td><td>FIGURE SPACE</td><td>Zs</td></tr> +<tr><td>2008</td><td>PUNCTUATION SPACE</td><td>Zs</td></tr> +<tr><td>2009</td><td>THIN SPACE</td><td>Zs</td></tr> +<tr><td>200A</td><td>HAIR SPACE</td><td>Zs</td></tr> +<tr><td>200B</td><td>ZERO WIDTH SPACE</td><td>Zs</td></tr> +<tr><td>202F</td><td>NARROW NO-BREAK SPACE</td><td>Zs</td></tr> +<tr><td>3000</td><td>IDEOGRAPHIC SPACE</td><td>Zs</td></tr></table> + +The ASCII blank characters are the first two characters above -- horizontal +tab and space. Latin-1 adds the no-break space. --- Previous: [[Unit srfi-13]] diff --git a/manual/Unit srfi-18 b/manual/Unit srfi-18 index 1386f814..e47a3bfb 100644 --- a/manual/Unit srfi-18 +++ b/manual/Unit srfi-18 @@ -4,11 +4,24 @@ == Unit srfi-18 -A simple multithreading package. This threading package follows largely -the specification of SRFI-18. For more information see the documentation -for [[http://srfi.schemers.org/srfi-18/srfi-18.html|SRFI-18]]. +A simple multithreading package, largely following the specification +of [[http://srfi.schemers.org/srfi-18/srfi-18.html|SRFI-18]]. This +document contains the core of the SRFI-18 documentation as well as +information on Chicken deviations from the spec. -'''Notes:''' +SRFI-18 defines the following multithreading datatypes: + +* Thread +* Mutex +* Condition variable +* Time + +It also defines a mechanism to handle exceptions and some multithreading +exception datatypes. + +== Chicken implementation + +=== Notes * {{thread-start!}} accepts a thunk (a zero argument procedure) as argument, which is equivalent to {{(thread-start! (make-thread THUNK))}}. @@ -34,11 +47,9 @@ for [[http://srfi.schemers.org/srfi-18/srfi-18.html|SRFI-18]]. * When an error is triggered inside the execution context of a thread, the default exception-handler will simply terminate the thread (and store the error condition for later use). Pending {{dynamic-wind}} thunks will ''not'' be invoked. Use a custom exception handler for the thread in that case. -The following procedures are provided, in addition to the procedures defined in SRFI-18: +=== Procedures - - -=== thread-signal! +The following procedures are provided in addition to the procedures defined in SRFI-18. <procedure>(thread-signal! THREAD X)</procedure> @@ -46,48 +57,34 @@ This will cause {{THREAD}} to signal the condition {{X}} once it is scheduled for execution. After signalling the condition, the thread continues with its normal execution. -=== thread-quantum - <procedure>(thread-quantum THREAD)</procedure> Returns the quantum of {{THREAD}}, which is an exact integer specifying the approximate time-slice of the thread in milliseconds. -=== thread-quantum-set! - <procedure>(thread-quantum-set! THREAD QUANTUM)</procedure> Sets the quantum of {{THREAD}} to {{QUANTUM}}. -=== thread-suspend! - <procedure>(thread-suspend! THREAD)</procedure> Suspends the execution of {{THREAD}} until resumed. -=== thread-resume! - <procedure>(thread-resume! THREAD)</procedure> Readies the suspended thread {{THREAD}}. -=== thread-wait-for-i/o! - <procedure>(thread-wait-for-i/o! FD [MODE])</procedure> Suspends the current thread until input ({{MODE}} is {{#:input}}), output ({{MODE}} is {{#:output}}) or both ({{MODE}} is {{#:all}}) is available. {{FD}} should be a file-descriptor (not a port!) open for input or output, respectively. -=== time->milliseconds - <procedure>(time->milliseconds TIME)</procedure> Converts a time object (as created via {{current-time}}) into an exact integer representing the number of milliseconds since process startup. -=== milliseconds->time - <procedure>(milliseconds->time ms)</procedure> Converts into a time object an exact integer representing @@ -99,6 +96,940 @@ This procedure may be useful in combination with {{thread-sleep!}} when your com (thread-sleep! (milliseconds->time (+ ms (current-milliseconds))))) + +== SRFI-18 specification + +The thread system provides the following data types: + +* Thread (a virtual processor which shares object space with all other threads) +* Mutex (a mutual exclusion device, also known as a lock and binary semaphore) +* Condition variable (a set of blocked threads) +* Time (an absolute point on the time line) + +Some multithreading exception datatypes are also specified, and a general +mechanism for handling exceptions. + +=== Background information + +==== Threads + +A "running" thread is a thread that is currently executing. There can be +more than one running thread on a multiprocessor machine. A "runnable" +thread is a thread that is ready to execute or running. A thread is +"blocked" if it is waiting for a mutex to become unlocked, an I/O operation +to become possible, the end of a "sleep" period, etc. A "new" thread is a +thread that has not yet become runnable. A new thread becomes runnable when +it is started. A "terminated" thread is a thread that can no longer become +runnable (but "deadlocked" threads are not considered terminated). The +only valid transitions between the thread states are from new to runnable, +between runnable and blocked, and from any state to terminated: + + + unblock + start <------- + NEW -------> RUNNABLE -------> BLOCKED + \ | block / + \ v / + +-----> TERMINATED <----+ + + +Each thread has a "specific" field which can be used in an application +specific way to associate data with the thread (some thread systems call +this "thread local storage"). + +==== Mutexes + +A mutex can be in one of four states: locked (either owned or not owned) +and unlocked (either abandoned or not abandoned). An attempt to lock +a mutex only succeeds if the mutex is in an unlocked state, otherwise +the current thread must wait. A mutex in the locked/owned state has an +associated "owner" thread, which by convention is the thread that is +responsible for unlocking the mutex (this case is typical of critical +sections implemented as "lock mutex, perform operation, unlock mutex"). A +mutex in the locked/not-owned state is not linked to a particular thread. +A mutex becomes locked when a thread locks it using the {{mutex-lock!}} +primitive. A mutex becomes unlocked/abandoned when the owner of a +locked/owned mutex terminates. A mutex becomes unlocked/not-abandoned +when a thread unlocks it using the {{mutex-unlock!}} primitive. The mutex +primitives specified in this SRFI do not implement "recursive" mutex +semantics; an attempt to lock a mutex that is locked implies that the +current thread must wait even if the mutex is owned by the current thread +(this can lead to a deadlock if no other thread unlocks the mutex). + +Each mutex has a "specific" field which can be used in an application +specific way to associate data with the mutex. + + +==== Condition variables + +A condition variable represents a set of blocked threads. These blocked +threads are waiting for a certain condition to become true. When a thread +modifies some program state that might make the condition true, the thread +unblocks some number of threads (one or all depending on the primitive +used) so they can check the value of the condition. This allows complex +forms of interthread synchronization to be expressed more conveniently than +with mutexes alone. + +Each condition variable has a "specific" field which can be used in an +application specific way to associate data with the condition variable. + + +==== Fairness + +In various situations the scheduler must select one thread from a set of +threads (e.g. which thread to run when a running thread blocks or expires +its quantum, which thread to unblock when a mutex unlocks or a condition +variable is signaled). The constraints on the selection process determine +the scheduler's "fairness". Typically the selection depends on the order in +which threads become runnable or blocked and on some "priority" attached to +the threads. + +Because we do not wish to preclude extensions to this SRFI (such as for +real-time multithreading) that require specific fairness constraints, there +are no fairness constraints imposed by this SRFI. It is expected however +that implementations of Scheme that support this SRFI will document the +fairness constraints they provide. + + +==== Memory coherency and lack of atomicity + +Read and write operations on the store (such as reading and writing a +variable, an element of a vector or a string) are not required to be +atomic. It is an error for a thread to write a location in the store +while some other thread reads or writes that same location. It is the +responsibility of the application to avoid write/read and write/write races +through appropriate uses of the synchronization primitives. + +Concurrent reads and writes to ports are allowed. It is the responsibility +of the implementation to serialize accesses to a given port using the +appropriate synchronization primitives. + + +==== Dynamic environments, continuations and {{dynamic-wind}} + +The "dynamic environment" is a structure which allows the system to find +the value returned by {{current-input-port}}, {{current-output-port}}, +etc. The procedures {{with-input-from-file}}, {{with-output-to-file}}, +etc extend the dynamic environment to produce a new dynamic environment +which is in effect for the duration of the call to the thunk passed as the +last argument. Some Scheme systems generalize the dynamic environment by +providing procedures and special forms to define new "dynamic variables" +and bind them in the dynamic environment (e.g. {{make-parameter}} and +{{parameterize}}). + +Each thread has its own dynamic environment. When a thread's dynamic +environment is extended this does not affect the dynamic environment +of other threads. When a thread creates a continuation, the thread's +dynamic environment and the {{dynamic-wind}} stack are saved within +the continuation (an alternate but equivalent point of view is that the +{{dynamic-wind}} stack is part of the dynamic environment). When this +continuation is invoked the required {{dynamic-wind}} before and after +thunks are called and the saved dynamic environment is reinstated as the +dynamic environment of the current thread. During the call to each required +{{dynamic-wind}} before and after thunk, the dynamic environment and the +{{dynamic-wind}} stack in effect when the corresponding {{dynamic-wind}} +was executed are reinstated. Note that this specification clearly defines +the semantics of calling {{call-with-current-continuation}} or invoking a +continuation within a before or after thunk. The semantics are well defined +even when a continuation created by another thread is invoked. Below is an +example exercising the subtleties of this semantics. + + + (with-output-to-file + "foo" + (lambda () + (let ((k (call-with-current-continuation + (lambda (exit) + (with-output-to-file + "bar" + (lambda () + (dynamic-wind + (lambda () (write '(b1))) + (lambda () + (let ((x (call-with-current-continuation + (lambda (cont) (exit cont))))) + (write '(t1)) + x)) + (lambda () (write '(a1)))))))))) + (if k + (dynamic-wind + (lambda () (write '(b2))) + (lambda () + (with-output-to-file + "baz" + (lambda () + (write '(t2)) + ; go back inside (with-output-to-file "bar" ...) + (k #f)))) + (lambda () (write '(a2)))))))) + +In an implementation of Scheme where {{with-output-to-file}} only closes +the port it opened when the thunk returns normally, then the following +actions will occur: {{(b1)(a1)}} is written to "bar", {{(b2)}} is written +to "foo", {{(t2)}} is written to "baz", {{(a2)}} is written to "foo", and +{{(b1)(t1)(a1)}} is written to "bar". + +When the scheduler stops the execution of a running thread T1 (whether +because it blocked, expired its quantum, was terminated, etc) and then +resumes the execution of a thread T2, there is in a sense a transfer of +control between T1's current continuation and the continuation of T2. This +transfer of control by the scheduler does not cause any {{dynamic-wind}} +before and after thunks to be called. It is only when a thread itself +transfers control to a continuation that {{dynamic-wind}} before and after +thunks are called. + + +==== Time objects and timeouts + +A time object represents a point on the time line. Its resolution is +implementation dependent (implementations are encouraged to implement at +least millisecond resolution so that precise timing is possible). Using +{{time->seconds}} and {{seconds->time}}, a time object can be converted +to and from a real number which corresponds to the number of seconds from +a reference point on the time line. The reference point is implementation +dependent and does not change for a given execution of the program (e.g. +the reference point could be the time at which the program started). + +All synchronization primitives which take a timeout parameter accept three +types of values as a timeout, with the following meaning: + + +* a time object represents an absolute point in time +* an exact or inexact real number represents a relative time in seconds from the moment the primitive was called +* {{#f}} means that there is no timeout + +When a timeout denotes the current time or a time in the past, the +synchronization primitive claims that the timeout has been reached only +after the other synchronization conditions have been checked. Moreover the +thread remains running (it does not enter the blocked state). For example, +{{(mutex-lock! m 0)}} will lock mutex {{m}} and return {{#t}} if {{m}} is +currently unlocked, otherwise {{#f}} is returned because the timeout is +reached. + + +==== Primitives and exceptions + +When one of the primitives defined in this SRFI raises an exception defined +in this SRFI, the exception handler is called with the same continuation +as the primitive (i.e. it is a tail call to the exception handler). This +requirement avoids having to use {{call-with-current-continuation}} to get +the same effect in some situations. + + +==== Primordial thread + +The execution of a program is initially under the control of a single +thread known as the "primordial thread". The primordial thread has an +unspecified name, specific field, dynamic environment, {{dynamic-wind}} +stack, and exception handler. All threads are terminated when the +primordial thread terminates (normally or not). + + +=== Procedures + +<procedure>(current-thread)</procedure><br> + +Returns the current thread. + + + (eq? (current-thread) (current-thread)) ==> #t + + +<procedure>(thread? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is a thread, otherwise returns {{#f}}. + + + (thread? (current-thread)) ==> #t + (thread? 'foo) ==> #f + +<procedure>(make-thread thunk [name])</procedure><br> + +Returns a new thread. This thread is not automatically made runnable +(the procedure {{thread-start!}} must be used for this). + +A thread has the following fields: name, specific, end-result, +end-exception, and a list of locked/owned mutexes it owns. The +thread's execution consists of a call to ''thunk'' with the "initial +continuation". This continuation causes the (then) current thread to +store the result in its end-result field, abandon all mutexes it owns, +and finally terminate. The {{dynamic-wind}} stack of the initial +continuation is empty. The optional {{name}} is an arbitrary Scheme +object which identifies the thread (useful for debugging); it defaults +to an unspecified value. The specific field is set to an unspecified +value. + +The thread inherits the dynamic environment from the current +thread. Moreover, in this dynamic environment the exception handler is +bound to the "initial exception handler" which is a unary procedure +which causes the (then) current thread to store in its end-exception +field an "uncaught exception" object whose "reason" is the argument of +the handler, abandon all mutexes it owns, and finally terminate. + + (make-thread (lambda () (write 'hello))) ==> ''a thread'' + +<procedure>(thread-name thread)</procedure><br> + +Returns the name of the {{thread}}. + + + (thread-name (make-thread (lambda () #f) 'foo)) ==> foo + +<procedure>(thread-specific thread)</procedure><br> + +Returns the content of the {{thread}}'s specific field. + +<procedure>(thread-specific-set! thread obj)</procedure><br> + +Stores {{obj}} into the {{thread}}'s specific field. +{{thread-specific-set!}} returns an unspecified value. + + + (thread-specific-set! (current-thread) "hello") ==> ''unspecified'' + + (thread-specific (current-thread)) ==> "hello" + +<procedure>(thread-start! thread)</procedure><br> + +Makes {{thread}} runnable. The {{thread}} must be a new thread. +{{thread-start!}} returns the {{thread}}. + + + (let ((t (thread-start! (make-thread (lambda () (write 'a)))))) + (write 'b) + (thread-join! t)) ==> ''unspecified'' + ''after writing'' ab ''or'' ba + +NOTE: It is useful to separate thread creation and thread activation to +avoid the race condition that would occur if the created thread tries to +examine a table in which the current thread stores the created thread. See +the last example of {{thread-terminate!}} which contains mutually recursive +threads. + +<procedure>(thread-yield!)</procedure><br> + +The current thread exits the running state as if its quantum had expired. +{{thread-yield!}} returns an unspecified value. + + + ; a busy loop that avoids being too wasteful of the CPU + + (let loop () + (if (mutex-lock! m 0) ; try to lock m but don't block + (begin + (display "locked mutex m") + (mutex-unlock! m)) + (begin + (do-something-else) + (thread-yield!) ; relinquish rest of quantum + (loop)))) + +<procedure>(thread-sleep! timeout)</procedure><br> + +The current thread waits until the timeout is reached. This blocks the +thread only if {{timeout}} represents a point in the future. It is an error +for {{timeout}} to be {{#f}}. {{thread-sleep!}} returns an unspecified +value. + + + ; a clock with a gradual drift: + + (let loop ((x 1)) + (thread-sleep! 1) + (write x) + (loop (+ x 1))) + + ; a clock with no drift: + + (let ((start (time->seconds (current-time))) + (let loop ((x 1)) + (thread-sleep! (seconds->time (+ x start))) + (write x) + (loop (+ x 1)))) + +<procedure>(thread-terminate! thread)</procedure><br> + +Causes an abnormal termination of the {{thread}}. If the {{thread}} +is not already terminated, all mutexes owned by the {{thread}} become +unlocked/abandoned and a "terminated thread exception" object is stored in +the {{thread}}'s end-exception field. If {{thread}} is the current thread, +{{thread-terminate!}} does not return. Otherwise {{thread-terminate!}} +returns an unspecified value; the termination of the {{thread}} will occur +before {{thread-terminate!}} returns. + + + (thread-terminate! (current-thread)) ==> ''does not return'' + + (define (amb thunk1 thunk2) + (let ((result #f) + (result-mutex (make-mutex)) + (done-mutex (make-mutex))) + (letrec ((child1 + (make-thread + (lambda () + (let ((x (thunk1))) + (mutex-lock! result-mutex #f #f) + (set! result x) + (thread-terminate! child2) + (mutex-unlock! done-mutex))))) + (child2 + (make-thread + (lambda () + (let ((x (thunk2))) + (mutex-lock! result-mutex #f #f) + (set! result x) + (thread-terminate! child1) + (mutex-unlock! done-mutex)))))) + (mutex-lock! done-mutex #f #f) + (thread-start! child1) + (thread-start! child2) + (mutex-lock! done-mutex #f #f) + result))) + + +NOTE: This operation must be used carefully because it terminates a +thread abruptly and it is impossible for that thread to perform any kind +of cleanup. This may be a problem if the thread is in the middle of a +critical section where some structure has been put in an inconsistent +state. However, another thread attempting to enter this critical +section will raise an "abandoned mutex exception" because the mutex is +unlocked/abandoned. This helps avoid observing an inconsistent state. Clean +termination can be obtained by polling, as shown in the example below. + + + (define (spawn thunk) + (let ((t (make-thread thunk))) + (thread-specific-set! t #t) + (thread-start! t) + t)) + + (define (stop! thread) + (thread-specific-set! thread #f) + (thread-join! thread)) + + (define (keep-going?) + (thread-specific (current-thread))) + + (define count! + (let ((m (make-mutex)) + (i 0)) + (lambda () + (mutex-lock! m) + (let ((x (+ i 1))) + (set! i x) + (mutex-unlock! m) + x)))) + + (define (increment-forever!) + (let loop () (count!) (if (keep-going?) (loop)))) + + (let ((t1 (spawn increment-forever!)) + (t2 (spawn increment-forever!))) + (thread-sleep! 1) + (stop! t1) + (stop! t2) + (count!)) ==> 377290 + +<procedure>(thread-join! thread [timeout [timeout-val]])</procedure><br> + +The current thread waits until the {{thread}} terminates (normally or +not) or until the timeout is reached if {{timeout}} is supplied. If the +timeout is reached, {{thread-join!}} returns {{timeout-val}} if it is +supplied, otherwise a "join timeout exception" is raised. If the {{thread}} +terminated normally, the content of the end-result field is returned, +otherwise the content of the end-exception field is raised. + + + (let ((t (thread-start! (make-thread (lambda () (expt 2 100)))))) + (do-something-else) + (thread-join! t)) ==> 1267650600228229401496703205376 + + (let ((t (thread-start! (make-thread (lambda () (raise 123)))))) + (do-something-else) + (with-exception-handler + (lambda (exc) + (if (uncaught-exception? exc) + (* 10 (uncaught-exception-reason exc)) + 99999)) + (lambda () + (+ 1 (thread-join! t))))) ==> 1231 + + (define thread-alive? + (let ((unique (list 'unique))) + (lambda (thread) + ; Note: this procedure raises an exception if + ; the thread terminated abnormally. + (eq? (thread-join! thread 0 unique) unique)))) + + (define (wait-for-termination! thread) + (let ((eh (current-exception-handler))) + (with-exception-handler + (lambda (exc) + (if (not (or (terminated-thread-exception? exc) + (uncaught-exception? exc))) + (eh exc))) ; unexpected exceptions are handled by eh + (lambda () + ; The following call to thread-join! will wait until the + ; thread terminates. If the thread terminated normally + ; thread-join! will return normally. If the thread + ; terminated abnormally then one of these two exceptions + ; is raised by thread-join!: + ; - terminated thread exception + ; - uncaught exception + (thread-join! thread) + #f)))) ; ignore result of thread-join! + +<procedure>(mutex? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is a mutex, otherwise returns {{#f}}. + + + (mutex? (make-mutex)) ==> #t + (mutex? 'foo) ==> #f + +<procedure>(make-mutex [name])</procedure><br> + +Returns a new mutex in the unlocked/not-abandoned state. The optional +{{name}} is an arbitrary Scheme object which identifies the mutex (useful +for debugging); it defaults to an unspecified value. The mutex's specific +field is set to an unspecified value. + + + (make-mutex) ==> ''an unlocked/not-abandoned mutex'' + (make-mutex 'foo) ==> ''an unlocked/not-abandoned mutex named'' foo + + +<procedure>(mutex-name mutex)</procedure><br> + +Returns the name of the {{mutex}}. + + + (mutex-name (make-mutex 'foo)) ==> foo + + +<procedure>(mutex-specific mutex)</procedure><br> + +Returns the content of the {{mutex}}'s specific field. + +<procedure>(mutex-specific-set! mutex obj)</procedure><br> + +Stores {{obj}} into the {{mutex}}'s specific field. {{mutex-specific-set!}} +returns an unspecified value. + + + (define m (make-mutex)) + (mutex-specific-set! m "hello") ==> ''unspecified'' + + (mutex-specific m) ==> "hello" + + (define (mutex-lock-recursively! mutex) + (if (eq? (mutex-state mutex) (current-thread)) + (let ((n (mutex-specific mutex))) + (mutex-specific-set! mutex (+ n 1))) + (begin + (mutex-lock! mutex) + (mutex-specific-set! mutex 0)))) + + (define (mutex-unlock-recursively! mutex) + (let ((n (mutex-specific mutex))) + (if (= n 0) + (mutex-unlock! mutex) + (mutex-specific-set! mutex (- n 1))))) + +<procedure>(mutex-state mutex)</procedure><br> + +Returns information about the state of the {{mutex}}. The possible results +are: + + +* '''thread T''': the {{mutex}} is in the locked/owned state and thread T is the owner of the {{mutex}} +* '''symbol {{not-owned}}''': the {{mutex}} is in the locked/not-owned state +* '''symbol {{abandoned}}''': the {{mutex}} is in the unlocked/abandoned state +* '''symbol {{not-abandoned}}''': the {{mutex}} is in the unlocked/not-abandoned state + + + (mutex-state (make-mutex)) ==> not-abandoned + + (define (thread-alive? thread) + (let ((mutex (make-mutex))) + (mutex-lock! mutex #f thread) + (let ((state (mutex-state mutex))) + (mutex-unlock! mutex) ; avoid space leak + (eq? state thread)))) + +<procedure>(mutex-lock! mutex [timeout [thread]])</procedure><br> + +If the {{mutex}} is currently locked, the current thread waits until the +{{mutex}} is unlocked, or until the timeout is reached if {{timeout}} +is supplied. If the timeout is reached, {{mutex-lock!}} returns {{#f}}. +Otherwise, the state of the {{mutex}} is changed as follows: + + +* if {{thread}} is {{#f}} the {{mutex}} becomes locked/not-owned, +* otherwise, let T be {{thread}} (or the current thread if {{thread}} is not supplied), +** if T is terminated the {{mutex}} becomes unlocked/abandoned, +** otherwise {{mutex}} becomes locked/owned with T as the owner. + +After changing the state of the {{mutex}}, an "abandoned mutex exception" +is raised if the {{mutex}} was unlocked/abandoned before the state change, +otherwise {{mutex-lock!}} returns {{#t}}. It is not an error if the +{{mutex}} is owned by the current thread (but the current thread will have +to wait). + + + ; an implementation of a mailbox object of depth one; this + ; implementation does not behave well in the presence of forced + ; thread terminations using thread-terminate! (deadlock can occur + ; if a thread is terminated in the middle of a put! or get! operation) + + (define (make-empty-mailbox) + (let ((put-mutex (make-mutex)) ; allow put! operation + (get-mutex (make-mutex)) + (cell #f)) + + (define (put! obj) + (mutex-lock! put-mutex #f #f) ; prevent put! operation + (set! cell obj) + (mutex-unlock! get-mutex)) ; allow get! operation + + (define (get!) + (mutex-lock! get-mutex #f #f) ; wait until object in mailbox + (let ((result cell)) + (set! cell #f) ; prevent space leaks + (mutex-unlock! put-mutex) ; allow put! operation + result)) + + (mutex-lock! get-mutex #f #f) ; prevent get! operation + + (lambda (msg) + (case msg + ((put!) put!) + ((get!) get!) + (else (error "unknown message")))))) + + (define (mailbox-put! m obj) ((m 'put!) obj)) + (define (mailbox-get! m) ((m 'get!))) + + ; an alternate implementation of thread-sleep! + + (define (sleep! timeout) + (let ((m (make-mutex))) + (mutex-lock! m #f #f) + (mutex-lock! m timeout #f))) + + ; a procedure that waits for one of two mutexes to unlock + + (define (lock-one-of! mutex1 mutex2) + ; this procedure assumes that neither mutex1 or mutex2 + ; are owned by the current thread + (let ((ct (current-thread)) + (done-mutex (make-mutex))) + (mutex-lock! done-mutex #f #f) + (let ((t1 (thread-start! + (make-thread + (lambda () + (mutex-lock! mutex1 #f ct) + (mutex-unlock! done-mutex))))) + (t2 (thread-start! + (make-thread + (lambda () + (mutex-lock! mutex2 #f ct) + (mutex-unlock! done-mutex)))))) + (mutex-lock! done-mutex #f #f) + (thread-terminate! t1) + (thread-terminate! t2) + (if (eq? (mutex-state mutex1) ct) + (begin + (if (eq? (mutex-state mutex2) ct) + (mutex-unlock! mutex2)) ; don't lock both + mutex1) + mutex2)))) + +<procedure>(mutex-unlock! mutex [condition-variable [timeout]])</procedure><br> + +Unlocks the {{mutex}} by making it unlocked/not-abandoned. It is not an +error to unlock an unlocked mutex and a mutex that is owned by any thread. +If {{condition-variable}} is supplied, the current thread is blocked +and added to the {{condition-variable}} before unlocking {{mutex}}; the +thread can unblock at any time but no later than when an appropriate call +to {{condition-variable-signal!}} or {{condition-variable-broadcast!}} +is performed (see below), and no later than the timeout (if {{timeout}} +is supplied). If there are threads waiting to lock this {{mutex}}, +the scheduler selects a thread, the mutex becomes locked/owned or +locked/not-owned, and the thread is unblocked. {{mutex-unlock!}} returns +{{#f}} when the timeout is reached, otherwise it returns {{#t}}. + +NOTE: The reason the thread can unblock at any time (when +{{condition-variable}} is supplied) is to allow extending this SRFI with +primitives that force a specific blocked thread to become runnable. For +example a primitive to interrupt a thread so that it performs a certain +operation, whether the thread is blocked or not, may be useful to handle +the case where the scheduler has detected a serious problem (such as a +deadlock) and it must unblock one of the threads (such as the primordial +thread) so that it can perform some appropriate action. After a thread +blocked on a condition-variable has handled such an interrupt it would be +wrong for the scheduler to return the thread to the blocked state, because +any calls to {{condition-variable-broadcast!}} during the interrupt will +have gone unnoticed. It is necessary for the thread to remain runnable and +return from the call to {{mutex-unlock!}} with a result of {{#t}}. + +NOTE: {{mutex-unlock!}} is related to the "wait" operation on condition +variables available in other thread systems. The main difference is that +"wait" automatically locks {{mutex}} just after the thread is unblocked. +This operation is not performed by {{mutex-unlock!}} and so must be +done by an explicit call to {{mutex-lock!}}. This has the advantages +that a different timeout and exception handler can be specified on the +{{mutex-lock!}} and {{mutex-unlock!}} and the location of all the mutex +operations is clearly apparent. A typical use with a condition variable is: + + + (let loop () + (mutex-lock! m) + (if (condition-is-true?) + (begin + (do-something-when-condition-is-true) + (mutex-unlock! m)) + (begin + (mutex-unlock! m cv) + (loop)))) + +<procedure>(condition-variable? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is a condition variable, otherwise returns +{{#f}}. + + + (condition-variable? (make-condition-variable)) ==> #t + (condition-variable? 'foo) ==> #f + +<procedure>(make-condition-variable [name])</procedure><br> + +Returns a new empty condition variable. The optional {{name}} is an +arbitrary Scheme object which identifies the condition variable (useful for +debugging); it defaults to an unspecified value. The condition variable's +specific field is set to an unspecified value. + + + (make-condition-variable) ==> ''an empty condition variable'' + +<procedure>(condition-variable-name condition-variable)</procedure><br> + +Returns the name of the {{condition-variable}}. + + + (condition-variable-name (make-condition-variable 'foo)) ==> foo + +<procedure>(condition-variable-specific condition-variable)</procedure><br> + +Returns the content of the {{condition-variable}}'s specific field. + +<procedure>(condition-variable-specific-set! condition-variable obj)</procedure><br> + +Stores {{obj}} into the {{condition-variable}}'s specific field. +{{condition-variable-specific-set!}} returns an unspecified value. + + + (define cv (make-condition-variable)) + (condition-variable-specific-set! cv "hello") ==> ''unspecified'' + + (condition-variable-specific cv) ==> "hello" + +<procedure>(condition-variable-signal! condition-variable)</procedure><br> + +If there are threads blocked on the {{condition-variable}}, the scheduler +selects a thread and unblocks it. {{condition-variable-signal!}} returns an +unspecified value. + + + ; an implementation of a mailbox object of depth one; this + ; implementation behaves gracefully when threads are forcibly + ; terminated using thread-terminate! (the "abandoned mutex" + ; exception will be raised when a put! or get! operation is attempted + ; after a thread is terminated in the middle of a put! or get! + ; operation) + + (define (make-empty-mailbox) + (let ((mutex (make-mutex)) + (put-condvar (make-condition-variable)) + (get-condvar (make-condition-variable)) + (full? #f) + (cell #f)) + + (define (put! obj) + (mutex-lock! mutex) + (if full? + (begin + (mutex-unlock! mutex put-condvar) + (put! obj)) + (begin + (set! cell obj) + (set! full? #t) + (condition-variable-signal! get-condvar) + (mutex-unlock! mutex)))) + + (define (get!) + (mutex-lock! mutex) + (if (not full?) + (begin + (mutex-unlock! mutex get-condvar) + (get!)) + (let ((result cell)) + (set! cell #f) ; avoid space leaks + (set! full? #f) + (condition-variable-signal! put-condvar) + (mutex-unlock! mutex)))) + + (lambda (msg) + (case msg + ((put!) put!) + ((get!) get!) + (else (error "unknown message")))))) + + (define (mailbox-put! m obj) ((m 'put!) obj)) + (define (mailbox-get! m) ((m 'get!))) + +<procedure>(condition-variable-broadcast! condition-variable)</procedure><br> + +Unblocks all the threads blocked on the {{condition-variable}}. +{{condition-variable-broadcast!}} returns an unspecified value. + + + (define (make-semaphore n) + (vector n (make-mutex) (make-condition-variable))) + + (define (semaphore-wait! sema) + (mutex-lock! (vector-ref sema 1)) + (let ((n (vector-ref sema 0))) + (if (> n 0) + (begin + (vector-set! sema 0 (- n 1)) + (mutex-unlock! (vector-ref sema 1))) + (begin + (mutex-unlock! (vector-ref sema 1) (vector-ref sema 2)) + (semaphore-wait! sema)))) + + (define (semaphore-signal-by! sema increment) + (mutex-lock! (vector-ref sema 1)) + (let ((n (+ (vector-ref sema 0) increment))) + (vector-set! sema 0 n) + (if (> n 0) + (condition-variable-broadcast! (vector-ref sema 2))) + (mutex-unlock! (vector-ref sema 1)))) + +<procedure>(current-time)</procedure><br> + +Returns the time object corresponding to the current time. + + + (current-time) ==> ''a time object'' + +<procedure>(time? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is a time object, otherwise returns {{#f}}. + + + (time? (current-time)) ==> #t + (time? 123) ==> #f + +<procedure>(time->seconds time)</procedure><br> + +Converts the time object {{time}} into an exact or inexact real number +representing the number of seconds elapsed since some implementation +dependent reference point. + + + (time->seconds (current-time)) ==> 955039784.928075 + +<procedure>(seconds->time x)</procedure><br> + +Converts into a time object the exact or inexact real number {{x}} +representing the number of seconds elapsed since some implementation +dependent reference point. + + + (seconds->time (+ 10 (time->seconds (current-time))) + ==> ''a time object representing 10 seconds in the future'' + + +<procedure>(current-exception-handler)</procedure><br> + +Returns the current exception handler. + + + (current-exception-handler) ==> ''a procedure'' + +<procedure>(with-exception-handler handler thunk)</procedure><br> + +Returns the result(s) of calling {{thunk}} with no arguments. The +{{handler}}, which must be a procedure, is installed as the current +exception handler in the dynamic environment in effect during the call to +{{thunk}}. + + + (with-exception-handler + list + current-exception-handler) ==> ''the procedure'' list + +<procedure>(raise obj)</procedure><br> + +Calls the current exception handler with {{obj}} as the single argument. +{{obj}} may be any Scheme object. + + + (define (f n) + (if (< n 0) (raise "negative arg") (sqrt n)))) + + (define (g) + (call-with-current-continuation + (lambda (return) + (with-exception-handler + (lambda (exc) + (return + (if (string? exc) + (string-append "error: " exc) + "unknown error"))) + (lambda () + (write (f 4.)) + (write (f -1.)) + (write (f 9.))))))) + + (g) ==> ''writes'' 2. ''and returns'' "error: negative arg" + + +<procedure>(join-timeout-exception? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is a "join timeout exception" object, otherwise +returns {{#f}}. A join timeout exception is raised when {{thread-join!}} is +called, the timeout is reached and no {{timeout-val}} is supplied. + +<procedure>(abandoned-mutex-exception? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is an "abandoned mutex exception" object, +otherwise returns {{#f}}. An abandoned mutex exception is raised when the +current thread locks a mutex that was owned by a thread which terminated +(see {{mutex-lock!}}). + +<procedure>(terminated-thread-exception? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is a "terminated thread exception" object, +otherwise returns {{#f}}. A terminated thread exception is raised when +{{thread-join!}} is called and the target thread has terminated as a result +of a call to {{thread-terminate!}}. + +<procedure>(uncaught-exception? obj)</procedure><br> + +Returns {{#t}} if {{obj}} is an "uncaught exception" object, otherwise +returns {{#f}}. An uncaught exception is raised when {{thread-join!}} is +called and the target thread has terminated because it raised an exception +that called the initial exception handler of that thread. + +<procedure>(uncaught-exception-reason exc)</procedure><br> + +{{exc}} must be an "uncaught exception" object. +{{uncaught-exception-reason}} returns the object which was passed to the +initial exception handler of that thread. + + --- Previous: [[Unit srfi-14]] diff --git a/manual/Unit srfi-4 b/manual/Unit srfi-4 index ee1ab668..cbd167fe 100644 --- a/manual/Unit srfi-4 +++ b/manual/Unit srfi-4 @@ -1,45 +1,19 @@ [[tags: manual]] -[[toc:]] == Unit srfi-4 -Homogeneous numeric vectors, see the documentation for [[http://srfi.schemers.org/srfi-4/srfi-4.html|SRFI-4]] -64-bit integer vectors ({{u64vector}} and {{s64vector}} are not supported. - -The basic constructor procedures for number vectors are extended to allow allocating the storage in non garbage -collected memory: - -=== make-XXXvector +Homogeneous numeric vector datatypes. Also see the [[http://srfi.schemers.org/srfi-4/srfi-4.html|original SRFI-4 document]]. -<procedure>(make-XXXvector SIZE [INIT NONGC FINALIZE])</procedure> - -Creates a SRFI-4 homogenous number vector of length {{SIZE}}. If {{INIT}} is given, it specifies the initial -value for each slot in the vector. The optional arguments {{NONGC}} and {{FINALIZE}} define whether the -vector should be allocated in a memory area not subject to garbage collection and whether the associated storage -should be automatically freed (using finalization) when there are no references from Scheme variables and data. -{{NONGC}} defaults to {{#f}} (the vector will be located in normal garbage collected memory) and -{{FINALIZE}} defaults to {{#t}}. Note that the {{FINALIZE}} argument is only used when {{NONGC}} -is true. +[[toc:]] +== Chicken implementation -Additionally, the following procedures are provided: +* Procedures for blob conversion, subvectors and vector I/O are provided. +* SRFI-17 setters for {{XXXvector-ref}} are defined. +* Constructors allow allocating the storage in non garbage collected memory. +* 64-bit integer vectors ({{u64vector}} and {{s64vector}}) are not supported. -=== u8vector->blob -=== s8vector->blob -=== u16vector->blob -=== s16vector->blob -=== u32vector->blob -=== s32vector->blob -=== f32vector->blob -=== f64vector->blob -=== u8vector->blob/shared -=== s8vector->blob/shared -=== u16vector->blob/shared -=== s16vector->blob/shared -=== u32vector->blob/shared -=== s32vector->blob/shared -=== f32vector->blob/shared -=== f64vector->blob/shared +=== Blob conversions <procedure>(u8vector->blob U8VECTOR)</procedure><br> <procedure>(s8vector->blob S8VECTOR)</procedure><br> @@ -56,31 +30,13 @@ Additionally, the following procedures are provided: <procedure>(u32vector->blob/shared U32VECTOR)</procedure><br> <procedure>(s32vector->blob/shared S32VECTOR)</procedure><br> <procedure>(f32vector->blob/shared F32VECTOR)</procedure><br> -<procedure>(f64vector->blob/shared F64VECTOR)</procedure> +<procedure>(f64vector->blob/shared F64VECTOR)</procedure><br> Each of these procedures return the contents of the given vector as a 'packed' blob. The byte order in that vector is platform-dependent (for example little-endian on an '''Intel''' processor). The {{/shared}} variants return a blob that shares memory with the contents of the vector. - -=== blob->u8vector -=== blob->s8vector -=== blob->u16vector -=== blob->s16vector -=== blob->u32vector -=== blob->s32vector -=== blob->f32vector -=== blob->f64vector -=== blob->u8vector/shared -=== blob->s8vector/shared -=== blob->u16vector/shared -=== blob->s16vector/shared -=== blob->u32vector/shared -=== blob->s32vector/shared -=== blob->f32vector/shared -=== blob->f64vector/shared - <procedure>(blob->u8vector BLOB)</procedure><br> <procedure>(blob->s8vector BLOB)</procedure><br> <procedure>(blob->u16vector BLOB)</procedure><br> @@ -96,22 +52,14 @@ variants return a blob that shares memory with the contents of the vector. <procedure>(blob->u32vector/shared BLOB)</procedure><br> <procedure>(blob->s32vector/shared BLOB)</procedure><br> <procedure>(blob->f32vector/shared BLOB)</procedure><br> -<procedure>(blob->f64vector/shared BLOB)</procedure> +<procedure>(blob->f64vector/shared BLOB)</procedure><br> Each of these procedures return a vector where the argument {{BLOB}} is taken as a 'packed' representation of the contents of the vector. The {{/shared}} variants return a vector that shares memory with the contents of the blob. - -=== subu8vector -=== subu16vector -=== subu32vector -=== subs8vector -=== subs16vector -=== subs32vector -=== subf32vector -=== subf64vector +=== Subvectors <procedure>(subu8vector U8VECTOR FROM TO)</procedure><br> <procedure>(subu16vector U16VECTOR FROM TO)</procedure><br> @@ -120,42 +68,242 @@ shares memory with the contents of the blob. <procedure>(subs16vector S16VECTOR FROM TO)</procedure><br> <procedure>(subs32vector S32VECTOR FROM TO)</procedure><br> <procedure>(subf32vector F32VECTOR FROM TO)</procedure><br> -<procedure>(subf64vector F64VECTOR FROM TO)</procedure> +<procedure>(subf64vector F64VECTOR FROM TO)</procedure><br> Creates a number vector of the same type as the argument vector with the elements at the positions {{FROM}} up to but not including {{TO}}. -SRFI-17 Setters for {{XXXvector-ref}} are defined. - - -=== read-u8vector +=== Vector I/O <procedure>(read-u8vector LENGTH [PORT])</procedure> Reads {{LENGTH}} bytes from the {{PORT}} and returns a fresh {{u8vector}} or less if end-of-file is encountered. {{PORT}} defaults to the value of {{(current-input-port)}}. -If {{LENGTH}} is {{#f}}, the vector will be filled completely until end-of-file is reached. - -=== read-u8vector! +If {{LENGTH}} is {{#f}}, the vector will be filled completely until end-of-file is reached. <procedure>(read-u8vector! LENGTH U8VECTOR [PORT [START]])</procedure> Reads {{LENGTH}} bytes from the {{PORT}} writing the read input into {{U8VECTOR}} beginning at {{START}} (or 0 if not given). {{PORT}} defaults to the value of {{(current-input-port)}}. + If {{LENGTH}} is {{#f}}, the vector will be filled completely until end-of-file is reached. This procedure returns the number of bytes read. - -=== write-u8vector - <procedure>(write-u8vector U8VECTOR [PORT [START [END]]])</procedure> Writes the bytes {{U8VECTOR}} between the indices {{START}} (inclusive) and {{END}} (exclusive) to {{PORT}}. + {{PORT}} defaults to the value of {{(current-output-port)}}. +== SRFI-4 specification + +SRFI-4 describes a set of datatypes for vectors whose elements are +of the same numeric type (signed or unsigned exact integer or inexact +real of a given precision). These datatypes support operations analogous +to the Scheme vector type, but they are distinct datatypes. An external +representation is specified which must be supported by the {{read}} and +{{write}} procedures and by the program parser (i.e. programs can contain +references to literal homogeneous vectors). + +=== Datatypes + +There are 8 datatypes of exact integer homogeneous vectors (which will be +called integer vectors): + +<table> +<tr><th>Datatype</th><th>Type of elements</th></tr> +<tr><td>{{s8vector}}</td><td>signed exact integer in the range -(2^7) to (2^7)-1</td></tr> +<tr><td>{{u8vector}}</td><td>unsigned exact integer in the range 0 to (2^8)-1</td></tr> +<tr><td>{{s16vector}}</td><td>signed exact integer in the range -(2^15) to (2^15)-1</td></tr> +<tr><td>{{u16vector}}</td><td>unsigned exact integer in the range 0 to (2^16)-1</td></tr> +<tr><td>{{s32vector}}</td><td>signed exact integer in the range -(2^31) to (2^31)-1</td></tr> +<tr><td>{{u32vector}}</td><td>unsigned exact integer in the range 0 to (2^32)-1</td></tr> +<tr><td>{{s64vector}}</td><td>signed exact integer in the range -(2^63) to (2^63)-1</td></tr> +<tr><td>{{u64vector}}</td><td>unsigned exact integer in the range 0 to (2^64)-1</td></tr></table> + +There are 2 datatypes of inexact real homogeneous vectors (which will be +called float vectors): + +<table> +<tr><th>Datatype</th><th>Type of elements</th></tr> +<tr><td>{{f32vector}}</td><td>inexact real</td></tr> +<tr><td>{{f64vector}}</td><td>inexact real</td></tr></table> + +The only difference between the two float vector types is that +{{f64vector}}s preserve at least as much precision as {{f32vector}}s. + +Each homogeneous vector datatype has an external representation which +is supported by the {{read}} and {{write}} procedures and by the program +parser. Each datatype also has a set of associated predefined procedures +analogous to those available for Scheme's heterogeneous vectors. + +=== External representation + +<read>#u8</read><br> +<read>#u16</read><br> +<read>#u32</read><br> +<read>#s8</read><br> +<read>#s16</read><br> +<read>#s32</read><br> +<read>#f32</read><br> +<read>#f64</read><br> + +The external representation of instances of the datatype {{XXXvector}} +is {{#XXX( ...elements... )}}. + +For example, + + #u8(0 #e1e2 #xff)}} ; a {{u8vector}} of length 3 containing 0, 100, 255 + #f64(-1.5) ; a {{f64vector}} of length 1 containing -1.5. + +This external representation is also available in program source code. For example, + + (set! x '#u8(1 2 3)) + +will set {{x}} to the object {{#u8(1 2 3)}}. Literal homogeneous vectors must be quoted just like heterogeneous vectors must be. Homogeneous vectors can appear in quasiquotations but must not contain {{unquote}} or {{unquote-splicing}} forms. ''I.e.'', + + `(,x #u8(1 2)) ; legal + `#u8(1 ,x 2) ; illegal + +=== Predicates + +<procedure>(u8vector? OBJ)</procedure><br> +<procedure>(s8vector? OBJ)</procedure><br> +<procedure>(u16vector? OBJ)</procedure><br> +<procedure>(s16vector? OBJ)</procedure><br> +<procedure>(u32vector? OBJ)</procedure><br> +<procedure>(s32vector? OBJ)</procedure><br> +<procedure>(f32vector? OBJ)</procedure><br> +<procedure>(f64vector? OBJ)</procedure><br> + +Return {{#t}} if {{obj}} is an object of the specified type or {{#f}} if not. + +=== Constructors + +<procedure>(make-u8vector N [U8VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-s8vector N [S8VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-u16vector N [U16VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-s16vector N [S16VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-u32vector N [U32VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-s32vector N [S32VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-f32vector N [F32VALUE NONGC FINALIZE])</procedure><br> +<procedure>(make-f64vector N [F64VALUE NONGC FINALIZE])</procedure><br> + +Return a newly-allocated SRFI-4 homogeneous number vector of length N. + +If the optional fill VALUE is specified, it specifies the initial +value for each slot in the vector. If not, the content of the vector +is unspecified but individual elements of the vector are guaranteed to +be in the range of values permitted for that type of vector. + +The type of the fill value must be compatible with the elements of the +vector datatype. It is an error if otherwise -- for example, if an +inexact integer is passed to {{make-u8vector}}. + +On Chicken, these procedures have been extended to allow allocating +the storage in non-garbage collected memory, as follows: + +The optional arguments {{NONGC}} and {{FINALIZE}} define whether the +vector should be allocated in a memory area not subject to garbage +collection and whether the associated storage should be automatically +freed (using finalization) when there are no references from Scheme +variables and data. {{NONGC}} defaults to {{#f}} (the vector will be +located in normal garbage collected memory) and {{FINALIZE}} defaults +to {{#t}}. Note that the {{FINALIZE}} argument is only used when +{{NONGC}} is true. + +<procedure>(u8vector U8VALUE ...)</procedure><br> +<procedure>(s8vector S8VALUE ...)</procedure><br> +<procedure>(u16vector U16VALUE ...)</procedure><br> +<procedure>(s16vector S16VALUE ...)</procedure><br> +<procedure>(u32vector U32VALUE ...)</procedure><br> +<procedure>(s32vector S32VALUE ...)</procedure><br> +<procedure>(f32vector F32VALUE ...)</procedure><br> +<procedure>(f64vector F64VALUE ...)</procedure><br> + +Return a newly-allocated SRFI-4 homogeneous number vector of the specified +type, composed of the arguments. + +=== Length + +<procedure>(u8vector-length U8VECTOR)</procedure><br> +<procedure>(s8vector-length S8VECTOR)</procedure><br> +<procedure>(u16vector-length U16VECTOR)</procedure><br> +<procedure>(s16vector-length S16VECTOR)</procedure><br> +<procedure>(u32vector-length U32VECTOR)</procedure><br> +<procedure>(s32vector-length S32VECTOR)</procedure><br> +<procedure>(f32vector-length F32VECTOR)</procedure><br> +<procedure>(f64vector-length F64VECTOR)</procedure><br> + +Returns the length of the SRFI-4 homogeneous number VECTOR. + +=== Getters + +<procedure>(u8vector-ref U8VECTOR I)</procedure><br> +<procedure>(s8vector-ref S8VECTOR i)</procedure><br> +<procedure>(u16vector-ref U16VECTOR I)</procedure><br> +<procedure>(s16vector-ref S16VECTOR I)</procedure><br> +<procedure>(u32vector-ref U32VECTOR I)</procedure><br> +<procedure>(s32vector-ref S32VECTOR I)</procedure><br> +<procedure>(f32vector-ref F32VECTOR I)</procedure><br> +<procedure>(f64vector-ref F64VECTOR I)</procedure><br> + +Return the value of the ''i''th element of the SRFI-4 homogeneous +number vector, where {{I}} is a nonnegative exact integer less +than the length of the vector. + +=== Setters + +<procedure>(u8vector-set! U8VECTOR I U8VALUE)</procedure><br> +<procedure>(s8vector-set! S8VECTOR I S8VALUE)</procedure><br> +<procedure>(u16vector-set! U16VECTOR I U16VALUE)</procedure><br> +<procedure>(s16vector-set! S16VECTOR I S16VALUE)</procedure><br> +<procedure>(u32vector-set! U32VECTOR I U32VALUE)</procedure><br> +<procedure>(s32vector-set! S32VECTOR I S32VALUE)</procedure><br> +<procedure>(f32vector-set! F32VECTOR I F32VALUE)</procedure><br> +<procedure>(f64vector-set! F64VECTOR I F64VALUE)</procedure><br> + +Set the {{i}}th element of the SRFI-4 homogeneous number VECTOR to +VALUE. {{I}} is a nonnegative exact integer less than the length of +the vector and VALUE must be the same type as the elements of the +vector datatype. + +Additionally, SRFI-17 setters are defined on all {{xxxvector-ref}} +procedures. For example, to set the {{i}}th element of SRFI-4 +{{u8vector}} to {{u8value}}: + + (set! (u8vector-ref u8vector i) u8value) + +=== Conversions + +<procedure>(u8vector->list U8VECTOR)</procedure><br> +<procedure>(s8vector->list S8VECTOR)</procedure><br> +<procedure>(u16vector->list U16VECTOR)</procedure><br> +<procedure>(s16vector->list S16VECTOR)</procedure><br> +<procedure>(u32vector->list U32VECTOR)</procedure><br> +<procedure>(s32vector->list S32VECTOR)</procedure><br> +<procedure>(f32vector->list F32VECTOR)</procedure><br> +<procedure>(f64vector->list F64VECTOR)</procedure><br> + +Return a list consisting of the elements of SRFI-4 homogeneous number +VECTOR. + +<procedure>(list->u8vector U8LIST)</procedure><br> +<procedure>(list->s8vector S8LIST)</procedure><br> +<procedure>(list->u16vector U16LIST)</procedure><br> +<procedure>(list->s16vector S16LIST)</procedure><br> +<procedure>(list->u32vector U32LIST)</procedure><br> +<procedure>(list->s32vector S32LIST)</procedure><br> +<procedure>(list->f32vector F32LIST)</procedure><br> +<procedure>(list->f64vector F64LIST)</procedure><br> + +Return a newly-allocated SRFI-4 homogeneous number VECTOR consisting +of the elements of LIST. Each element of LIST must be compatible +with the datatype of VECTOR. + --- Previous: [[Unit srfi-1]]Trap