
[JSC] Implement Variable Count Parentheses in YarrJIT

6646c49

// JSTests/stress/regexp-greedy-nested-quantifier-backtrack.js
test(/((a+){2,3}){2,3}$/, "aaaaaa", "aaaaaa");
test(/((a+){2,4}){2,3}$/, "aaaaaa", "aaaaaa");
test(/(((a+){2}){2}){1,2}$/, "aaaa", "aaaa");
test(/((a+){2,3}){2,3}$/, "aaa", null);  // must correctly fail
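
The excerpt calls a `test` helper that is not shown. A minimal sketch of what such a helper might look like, assuming it compares the full matched substring (or `null` on failure) against the expected value; the actual helper in the test file may differ:

```javascript
// Hypothetical stand-in for the test file's `test` helper (not shown in the excerpt).
// Assumption: the third argument is the expected full match, or null for no match.
function test(regexp, input, expected) {
  const match = regexp.exec(input);
  const actual = match === null ? null : match[0];
  if (actual !== expected)
    throw new Error(`${regexp} on "${input}": expected ${expected}, got ${actual}`);
}

// Two of the cases from the excerpt; both pass silently on a conforming engine.
test(/((a+){2,3}){2,3}$/, "aaaaaa", "aaaaaa");
test(/((a+){2,3}){2,3}$/, "aaa", null); // needs at least four a's, so no match
```

The failing case is the interesting one: with both quantifiers at their minimum of 2, the pattern needs at least four `a` characters, so a three-character input has to exhaust every backtracking path and then report no match.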

YarrJIT compiles ECMAScript regular expressions directly to native machine code. It uses ParenContext objects to save and restore capture group state during backtracking. Previously, variable-count groups with a non-zero minimum (e.g., {3,5}) triggered a JIT compile failure, which silently routed execution to the slower interpreter.

This commit extends YarrJIT to JIT-compile {m,n} parentheses quantifiers with m > 0. It adds a count-enforcement backtracking path: when the iteration count drops below the minimum during backtracking, the engine re-enters the latest iteration's content at End.contentBacktrackEntryLabel to try alternative branches rather than failing immediately. A zero-length-match guard falls back to the interpreter. A parallel correctness fix is applied to YarrInterpreter.cpp.
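
The count-enforcement rule can be illustrated with a small model. The sketch below is not YarrJIT code: `matchCounted` and `iterate` are hypothetical names, and it handles only the fixed pattern shape `/(a+){min,max}$/`. It shows the matching semantics the JIT has to preserve: when an attempt leaves the iteration count below the minimum, the matcher goes back into the most recent iteration and tries a shorter alternative instead of giving up.

```javascript
// Hypothetical model of /(a+){min,max}$/ semantics; not the JIT's implementation.
function matchCounted(input, min, max) {
  // Try to finish the match at `pos` after `count` completed iterations.
  function iterate(pos, count) {
    if (count < max) {
      // Greedy: measure the longest run of 'a' this iteration could take.
      let run = 0;
      while (pos + run < input.length && input[pos + run] === "a") run++;
      // Each shorter length is an alternative that backtracking re-enters.
      // Requiring len >= 1 also rejects zero-length iterations.
      for (let len = run; len >= 1; len--) {
        if (iterate(pos + len, count + 1)) return true;
      }
    }
    // No further iteration worked: stopping here is allowed only if the
    // minimum count is met and the `$` anchor holds.
    return count >= min && pos === input.length;
  }
  // Like an unanchored regex search, try each start position in turn.
  for (let start = 0; start <= input.length; start++) {
    if (iterate(start, 0)) return input.slice(start);
  }
  return null;
}
```

For `matchCounted("aaa", 2, 3)`, the first iteration greedily takes all three characters, which leaves the count at 1, below the minimum of 2. The model then re-enters that iteration with a shorter match so a second iteration can succeed, and the call returns `"aaa"`. With input `"a"` no split reaches two iterations, so the result is `null`.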

New JIT-generated native code replaces a safe interpreter fallback for a non-trivial backtracking state machine. Historically, this kind of code has been one of the most productive places to find correctness bugs, and occasionally memory-safety issues, in regex engines.


New JIT backtracking state machine with multi-level quantifier nesting — edge cases in count enforcement and capture group save/restore are worth auditing.
