~ chicken-core (chicken-5) 358334198d5ee507812c686b330fe884bda79848
commit 358334198d5ee507812c686b330fe884bda79848
Author: LemonBoy <thatlemon@gmail.com>
AuthorDate: Thu Nov 9 13:29:08 2017 +0100
Commit: Evan Hanson <evhan@foldling.org>
CommitDate: Sun Nov 12 09:40:31 2017 +1300
Fix an error in unicode-range->utf8-pattern
The sequence generated for a utf8 character class contained an
unintended trailing '(), causing the code to fail when
`sre-length-ranges' is called.
Reported by Chunyang Xu at CHICKEN-users.
Signed-off-by: Peter Bex <peter@more-magic.net>
Signed-off-by: Evan Hanson <evhan@foldling.org>
diff --git a/NEWS b/NEWS
index 7a495951..10d677b9 100644
--- a/NEWS
+++ b/NEWS
@@ -138,6 +138,8 @@
on s8vectors (thanks to Kristian Lein-Mathisen).
- Large literals no longer crash with "invalid encoded numeric literal"
on mingw-64 (#1344, thanks to Lemonboy).
+ - Unit irregex: Fix bug that prevented multibyte UTF-8 character sets
+ from being matched correctly (Thanks to Lemonboy and Chunyang Xu).
- Runtime system:
- The profiler no longer uses malloc from a signal handler which may
diff --git a/irregex-core.scm b/irregex-core.scm
index c83aff9b..bef8336e 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -1402,12 +1402,11 @@
(unicode-range-up-to hi-ls)))
(let lp ((lo-ls lo-ls) (hi-ls hi-ls))
(cond
- ((null? lo-ls)
- '())
((= (car lo-ls) (car hi-ls))
(sre-sequence
- (list (integer->char (car lo-ls))
- (lp (cdr lo-ls) (cdr hi-ls)))))
+ (cons (integer->char (car lo-ls))
+ (if (null? (cdr lo-ls)) '()
+ (cons (lp (cdr lo-ls) (cdr hi-ls)) '())))))
((= (+ (car lo-ls) 1) (car hi-ls))
(sre-alternate (list (unicode-range-up-from lo-ls)
(unicode-range-up-to hi-ls))))
diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm
index 3981131c..8626b82c 100644
--- a/tests/test-irregex.scm
+++ b/tests/test-irregex.scm
@@ -539,6 +539,8 @@
(test-assert (not (irregex-search "(?u:<[^あ-ん語]*>)" "<ひらがな>")))
(test-assert (not (irregex-search "(?u:<[^あ-ん語]*>)" "<語>")))
+(test-assert (not (irregex-search (irregex "[一二]" 'utf8 #t) "三四")))
+
(test-end)
(test-exit)
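
The effect of the irregex-core.scm hunk can be illustrated with a minimal sketch. It is not the code from irregex itself: `old-walk` and `new-walk` are hypothetical names that reproduce only the equal-byte branch of the loop, the `range` placeholder stands in for the branches not shown, and the `sre-sequence` defined here is a simplified stand-in assumed to behave like the real helper for these inputs. The byte list is the UTF-8 encoding of 一 (U+4E00), used as both the low and high bound, as happens for a single-character member of a character set.

```scheme
;; Simplified stand-in for irregex's sre-sequence (assumption, not the
;; library source): collapse trivial sequences, otherwise tag with 'seq.
(define (sre-sequence ls)
  (cond ((null? ls) 'epsilon)
        ((null? (cdr ls)) (car ls))
        (else (cons 'seq ls))))

;; Shape of the loop before the patch: the recursion bottoms out in '(),
;; and that '() is then embedded as an element of the innermost sequence.
(define (old-walk lo-ls hi-ls)
  (cond ((null? lo-ls) '())
        ((= (car lo-ls) (car hi-ls))
         (sre-sequence
          (list (integer->char (car lo-ls))
                (old-walk (cdr lo-ls) (cdr hi-ls)))))
        ;; Placeholder for the remaining branches of the real loop.
        (else (list 'range lo-ls hi-ls))))

;; Shape of the loop after the patch: the recursive result is appended
;; only when bytes remain, so no trailing '() reaches sre-sequence.
(define (new-walk lo-ls hi-ls)
  (cond ((= (car lo-ls) (car hi-ls))
         (sre-sequence
          (cons (integer->char (car lo-ls))
                (if (null? (cdr lo-ls))
                    '()
                    (cons (new-walk (cdr lo-ls) (cdr hi-ls)) '())))))
        ;; Placeholder for the remaining branches of the real loop.
        (else (list 'range lo-ls hi-ls))))

;; UTF-8 bytes of U+4E00 as both bounds of the range.
(define bytes '(228 184 128))

(define old-result (old-walk bytes bytes))
(define new-result (new-walk bytes bytes))

(write old-result) (newline)
(write new-result) (newline)
```

In this sketch the old shape produces a nested sequence whose innermost form ends with `()`, while the new shape ends with the final byte character itself. Under the stated assumptions, that stray `()` is the element the commit message describes as breaking `sre-length-ranges`.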