~ chicken-core (chicken-5) e2602c4712f0a4f2f444f8dc384572b17eaf0011


commit e2602c4712f0a4f2f444f8dc384572b17eaf0011
Author:     Peter Bex <peter@more-magic.net>
AuthorDate: Mon Jul 5 11:38:43 2021 +0200
Commit:     felix <felix@call-with-current-continuation.org>
CommitDate: Mon Jul 5 14:38:11 2021 +0200

    Update irregex to upstream 960fa22b, fixing a group matching issue
    
    When a Kleene star is used around an alternative containing
    submatches, the DFA compilation would in some circumstances emit
    reordering commands that corrupted the captures, returning faulty
    matches.
    
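    This went wrong because each reordering command reads from one memory
    slot and writes to another, and the commands were executed one at a
    time, so a command could read a slot that an earlier command had
    already overwritten.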
    
    For example, the following set of reordering commands has no "correct"
    order in which they can be executed:
    
    p[0] <- p[1]
    p[1] <- p[0]
    
    After executing both of them in either order, both slots contain the
    same value instead of being swapped as intended.  This is fixed by
    first capturing the old value of every source slot in a closure, and
    only then executing the writes.
    
    Fixes upstream issue #27
    
    Signed-off-by: felix <felix@call-with-current-continuation.org>
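
The clobbering problem from the commit message can be reproduced outside
irregex. What follows is a minimal sketch, not the irregex code: `memory` is
simplified to a flat vector of values, each command is a hypothetical
`(dst . src)` pair rather than a vector of tag and slot indices, and the
procedure names `run-naive!` and `run-two-pass!` are invented for
illustration.

```scheme
;; Illustrative sketch only; names and the command layout are assumptions.

;; Naive: read and write one command at a time.
(define (run-naive! memory cmds)
  (for-each (lambda (c)
              (vector-set! memory (car c) (vector-ref memory (cdr c))))
            cmds))

;; Two passes: the map reads every source value and captures it in a
;; closure; the for-each then performs the writes.
(define (run-two-pass! memory cmds)
  (for-each (lambda (execute!) (execute!))
            (map (lambda (c)
                   (let ((dst (car c))
                         (value-from (vector-ref memory (cdr c))))
                     (lambda () (vector-set! memory dst value-from))))
                 cmds)))

;; p[0] <- p[1] and p[1] <- p[0]
(define swap-cmds '((0 . 1) (1 . 0)))

(define a (vector 'x 'y))
(run-naive! a swap-cmds)      ; a is now #(y y): the old p[0] is lost

(define b (vector 'x 'y))
(run-two-pass! b swap-cmds)   ; b is now #(y x): the slots are swapped
```

Because all reads finish before the first write, the result does not depend
on the order in which the commands are listed.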

diff --git a/NEWS b/NEWS
index 46af9bd1..53a40f0f 100644
--- a/NEWS
+++ b/NEWS
@@ -10,9 +10,11 @@
     of irregex-replace/all with positive lookbehind so all matches are
     replaced instead of only the first (reported by Kay Rhodes), and
     a regression regarding replacing empty matches which was introduced
-    by the fixes in 0.9.7 (reported by Sandra Snan).  Finally, the
+    by the fixes in 0.9.7 (reported by Sandra Snan).  Also, the
     http-url shorthand now allows any top-level domain and the old
     "top-level-domain" now also supports "edu" (fixed by Sandra Snan).
+    Finally, a problem was fixed with capturing groups inside a kleene
+    star, which could sometimes return incorrect parts of the match.
   - current-milliseconds has been deprecated in favor of the name
     current-process-milliseconds, to avoid confusion due to naming
     of current-milliseconds versus current-seconds, which do something
diff --git a/irregex-core.scm b/irregex-core.scm
index 8f672333..a8e7c97f 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -2235,12 +2235,18 @@
                                         (chunk&position (cons src (+ i 1))))
                                     (vector-set! slot (car s) chunk&position)))
                                 (cdr cmds))
-                      (for-each (lambda (c)
-                                  (let* ((tag (vector-ref c 0))
-                                         (ss (vector-ref memory (vector-ref c 1)))
-                                         (ds (vector-ref memory (vector-ref c 2))))
-                                    (vector-set! ds tag (vector-ref ss tag))))
-                                (car cmds)))))
+		      ;; Reassigning commands may be in an order which
+                      ;; causes memory cells to be clobbered before
+                      ;; they're read out.  Make 2 passes to maintain
+                      ;; old values by copying them into a closure.
+                      (for-each (lambda (execute!) (execute!))
+                                (map (lambda (c)
+                                       (let* ((tag (vector-ref c 0))
+                                              (ss (vector-ref memory (vector-ref c 1)))
+                                              (ds (vector-ref memory (vector-ref c 2)))
+                                              (value-from (vector-ref ss tag)))
+                                         (lambda () (vector-set! ds tag value-from))))
+                                     (car cmds))))))
                   (if new-finalizer
                       (lp2 (+ i 1) next src (+ i 1) new-finalizer)
                       (lp2 (+ i 1) next res-src res-index #f))))
diff --git a/tests/re-tests.txt b/tests/re-tests.txt
index 7a56edb7..39a747e6 100644
--- a/tests/re-tests.txt
+++ b/tests/re-tests.txt
@@ -171,3 +171,4 @@ multiple words	multiple words, yeah	y	&	multiple words
 (a([^a])*)*	abcaBC	y	&-\1-\2	abcaBC-aBC-C
 ([Aa]b).*\1	abxyzab	y	&-\1	abxyzab-ab
 a([\/\\]*)b	a//\\b	y	&-\1	a//\\b-//\\
+(?:[[:alnum:]]|(@[[:alnum:]]))*	oeh@2tu@2n342	y	\1	@2
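
The regression test line added above can also be tried by hand. This is a
sketch assuming CHICKEN 5, where irregex is exposed as the `(chicken irregex)`
module; the variable name `m` is only for illustration, and whether the test
driver itself uses `irregex-search` is an assumption.

```scheme
;; Assumes CHICKEN 5 with the (chicken irregex) module.
(import (chicken irregex))

;; Same pattern and subject string as the added test line.
(define m
  (irregex-search "(?:[[:alnum:]]|(@[[:alnum:]]))*" "oeh@2tu@2n342"))

;; Submatch 1 is the group (@[[:alnum:]]); the test line expects "@2".
(display (irregex-match-substring m 1))
(newline)
```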