~ chicken-core (chicken-5) e18379d79abf0b76d88be5fbd45187b6ff500c15
commit e18379d79abf0b76d88be5fbd45187b6ff500c15
Author: LemonBoy <thatlemon@gmail.com>
AuthorDate: Thu Nov 9 13:29:08 2017 +0100
Commit: Evan Hanson <evhan@foldling.org>
CommitDate: Sun Nov 12 09:40:38 2017 +1300
Fix an error in unicode-range->utf8-pattern

The sequence generated for a utf8 character class contained an
unintended trailing '(), causing the code to fail when
`sre-length-ranges' is called.

Reported by Chunyang Xu at CHICKEN-users.

Signed-off-by: Peter Bex <peter@more-magic.net>
Signed-off-by: Evan Hanson <evhan@foldling.org>
diff --git a/NEWS b/NEWS
index 212f40b2..3b36ebde 100644
--- a/NEWS
+++ b/NEWS
@@ -19,6 +19,8 @@
on s8vectors (thanks to Kristian Lein-Mathisen).
- Large literals no longer crash with "invalid encoded numeric literal"
on mingw-64 (#1344, thanks to Lemonboy).
+ - Unit irregex: Fix bug that prevented multibyte UTF-8 character sets
+ from being matched correctly (Thanks to Lemonboy and Chunyang Xu).
- Runtime system:
- The profiler no longer uses malloc from a signal handler which may
diff --git a/irregex-core.scm b/irregex-core.scm
index 7ac043d3..ba6d1f72 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -1407,12 +1407,11 @@
(unicode-range-up-to hi-ls)))
(let lp ((lo-ls lo-ls) (hi-ls hi-ls))
(cond
- ((null? lo-ls)
- '())
((= (car lo-ls) (car hi-ls))
(sre-sequence
- (list (integer->char (car lo-ls))
- (lp (cdr lo-ls) (cdr hi-ls)))))
+ (cons (integer->char (car lo-ls))
+ (if (null? (cdr lo-ls)) '()
+ (cons (lp (cdr lo-ls) (cdr hi-ls)) '())))))
((= (+ (car lo-ls) 1) (car hi-ls))
(sre-alternate (list (unicode-range-up-from lo-ls)
(unicode-range-up-to hi-ls))))
diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm
index 1a460549..9a5402c4 100644
--- a/tests/test-irregex.scm
+++ b/tests/test-irregex.scm
@@ -538,5 +538,7 @@
(test-assert (not (irregex-search "(?u:<[^あ-ん語]*>)" "<ひらがな>")))
(test-assert (not (irregex-search "(?u:<[^あ-ん語]*>)" "<語>")))
+(test-assert (not (irregex-search (irregex "[一二]" 'utf8 #t) "三四")))
+
(test-end)
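
The effect of the change to the equal-byte branch can be seen in isolation with a small sketch. Everything below is a simplified stand-in rather than code from irregex-core.scm: `sketch-sequence` only approximates what `sre-sequence` is assumed to do, `old-prefix` and `new-prefix` are hypothetical names, and only the case where the low and high byte lists are identical is modelled. ASCII byte values are used for readable output; real multibyte UTF-8 sequences use bytes of 128 and above.

```scheme
;; Assumed approximation of sre-sequence: collapse trivial sequences,
;; otherwise tag the list with 'seq.
(define (sketch-sequence ls)
  (cond ((null? ls) 'epsilon)
        ((null? (cdr ls)) (car ls))
        (else (cons 'seq ls))))

;; Shape of the code before the fix: the recursion bottoms out in '(),
;; and that '() is then embedded as the last element of the sequence.
(define (old-prefix bytes)
  (if (null? bytes)
      '()
      (sketch-sequence
       (list (integer->char (car bytes))
             (old-prefix (cdr bytes))))))

;; Shape of the code after the fix: the recursive result is only
;; appended when there are bytes left, so no '() ends up in the pattern.
;; Like the patched loop, this expects a non-empty list.
(define (new-prefix bytes)
  (sketch-sequence
   (cons (integer->char (car bytes))
         (if (null? (cdr bytes))
             '()
             (cons (new-prefix (cdr bytes)) '())))))

(write (old-prefix '(97 98)))   ; (seq #\a (seq #\b ()))
(newline)
(write (new-prefix '(97 98)))   ; (seq #\a #\b)
(newline)
```

The stray `()` in the first result is the kind of element the commit message describes as tripping up `sre-length-ranges`; the second result contains only characters and nested sequences.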