[JSC] Move RegExp.prototype[Symbol.match] to C++

e922a2c

JSTests/stress/regexp-prototype-symbol-match-watchpoint-invalidation.js

+function match(re, str) { return re[Symbol.match](str); }
+...
+(function() {
+ // Override exec after specialization
+ custom.exec = function(...) { ... };
+ shouldThrow(() => match(re, "hello"), ...);
+})();
+...
+(function() {
+ // Override RegExp.prototype.exec globally
+ RegExp.prototype.exec = function() { throw new Error("custom exec"); };
+ shouldThrow(() => match(re, "hello"), ...);
+})();

RegExp.prototype[Symbol.match] is the entry point for all String.prototype.match calls and direct re[Symbol.match](str) calls. Its spec algorithm is deceptively complex. A non-global regexp needs only a single RegExpExec call, but for a global regexp the algorithm must reset lastIndex to 0, call RegExpExec repeatedly until it returns null, and after every empty match coerce lastIndex with ToLength and advance it with AdvanceStringIndex (which, when the u or v flag is set, must step over surrogate pairs as a unit). Throughout, it must honour any observable side effects from overridden exec, flags, or lastIndex accessors. JSC previously implemented this in JS builtins; this commit ports it to native C++.
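
For readers who want the spec steps in front of them, here is a minimal JavaScript sketch of the algorithm described above. It is an illustrative model of the ECMAScript steps, not the JS builtin that was removed and not a transcription of the new C++ code. The names symbolMatch, regExpExec, toLength and advanceStringIndex are chosen for this sketch only.

```javascript
"use strict";

// AdvanceStringIndex(S, index, fullUnicode): step over a surrogate pair
// as a unit only when the regexp is in Unicode mode.
function advanceStringIndex(str, index, fullUnicode) {
  if (!fullUnicode || index + 1 >= str.length) return index + 1;
  const codePoint = str.codePointAt(index);
  return index + (codePoint > 0xffff ? 2 : 1);
}

// ToLength(value): ToNumber, truncate, clamp to [0, 2^53 - 1].
function toLength(value) {
  const n = Math.trunc(+value) || 0; // NaN and -0 become 0
  return Math.min(Math.max(n, 0), Number.MAX_SAFE_INTEGER);
}

// RegExpExec(R, S): use a callable `exec` found on the object,
// otherwise fall back to the built-in exec.
function regExpExec(rx, str) {
  const exec = rx.exec;
  if (typeof exec === "function") {
    const result = exec.call(rx, str);
    const isObject = typeof result === "object" || typeof result === "function";
    if (result !== null && !isObject) {
      throw new TypeError("exec must return an object or null");
    }
    return result;
  }
  return RegExp.prototype.exec.call(rx, str);
}

// Model of RegExp.prototype[Symbol.match].
function symbolMatch(rx, input) {
  const str = `${input}`;          // ToString(input)
  const flags = `${rx.flags}`;     // observable read of `flags`
  if (!flags.includes("g")) return regExpExec(rx, str);

  const fullUnicode = flags.includes("u") || flags.includes("v");
  rx.lastIndex = 0;                // observable write
  const matches = [];
  for (;;) {
    const result = regExpExec(rx, str);
    if (result === null) return matches.length === 0 ? null : matches;
    const matchStr = `${result[0]}`;
    matches.push(matchStr);
    if (matchStr === "") {
      // Only empty matches need a manual advance; otherwise exec moved lastIndex.
      const thisIndex = toLength(rx.lastIndex);
      rx.lastIndex = advanceStringIndex(str, thisIndex, fullUnicode);
    }
  }
}
```

Calling symbolMatch(/a/g, "banana") should give the same array as "banana".match(/a/g). Every property read and write in the sketch is a point where user code can observe or interfere, which is what the native port has to preserve.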

The patch removes the hasObservableSideEffectsForRegExpMatch and matchSlow JS helpers and adds a new addRegExpMatchPrimordialChecks function in the DFG. The C++ implementation adds a dedicated advanceStringIndex helper and a regExpMatchSlow fallback path for observable-side-effect scenarios. The new primordial-check system in the DFG compiler gates the fast C++ path: when all relevant built-ins are unmodified the engine takes a fast C++ exec loop, otherwise it falls back to regExpMatchSlow.
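
The fast/slow split can be pictured with a small JavaScript model. This is a simplified sketch under stated assumptions: isPrimordialRegExp and chooseMatchPath are hypothetical names, the model checks only exec, flags and the prototype, and it is not the condition that addRegExpMatchPrimordialChecks actually tests. An engine would more plausibly rely on structure checks and watchpoints than on property reads like these.

```javascript
"use strict";

// Capture the original built-ins before any user code can replace them.
const originalExec = RegExp.prototype.exec;
const originalFlagsGetter =
  Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;

// Hypothetical stand-in for a primordial check: true only when nothing
// user-visible could run during a match.
function isPrimordialRegExp(re) {
  const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const flagsDesc = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags");
  return (
    Object.getPrototypeOf(re) === RegExp.prototype && // not a subclass
    !hasOwn(re, "exec") &&                            // no per-instance exec
    !hasOwn(re, "flags") &&                           // no per-instance flags
    RegExp.prototype.exec === originalExec &&         // prototype exec intact
    flagsDesc !== undefined &&
    flagsDesc.get === originalFlagsGetter             // flags getter intact
  );
}

// Which of the two paths the model would take for a given regexp.
function chooseMatchPath(re) {
  return isPrimordialRegExp(re) ? "fast" : "slow";
}
```

In this model a fresh /a/g takes the fast path, while the same regexp with an own exec property, or an instance of a RegExp subclass, takes the slow one. The security-relevant question is the reverse case: any modification the real gate fails to notice ends up on the fast path.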

Before:                              After:
  str.match(re)                        str.match(re)
    └─► String.prototype.match (C++)     └─► String.prototype.match (C++)
          └─► RegExp.prototype                 └─► RegExp.prototype
              [Symbol.match]  ← JS builtin         [Symbol.match]  ← C++ host
                ├─ hasObservableSideEffects        ├─ addRegExpMatchPrimordialChecks
                ├─ matchSlow()                     │     ├─ intact → fast C++ exec loop
                └─ exec() loop                     │     └─ modified → regExpMatchSlow()
                                                   └─ advanceStringIndex() ← new C++

Beyond the 5–25% performance gains, this moves complex spec-mandated logic — lastIndex coercion, empty-match advancement, Unicode surrogate handling, and observable side-effect detection — from interpreter-level JS builtins into C++, where implementation divergences from the spec are harder to spot and historically have produced exploitable bugs. The correctness of the new primordial checks now determines whether spec-mandated observable side effects are actually observed.
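
The observability requirement can be checked from script. The sketch below uses a hypothetical helper named traceMatch that shadows exec on a single regexp instance and records the lastIndex seen on each call. It relies only on standard ECMAScript behaviour, so any conforming engine should produce the same trace before and after a change like this one.

```javascript
"use strict";

// Record how a global match drives `exec` on a regexp whose `exec`
// has been overridden for that instance only.
function traceMatch(source, input) {
  const log = [];
  const re = new RegExp(source, "g");
  // An own property shadows RegExp.prototype.exec for this object.
  re.exec = function (str) {
    log.push(`exec@${this.lastIndex}`);
    return RegExp.prototype.exec.call(this, str);
  };
  const result = input.match(re);
  return { result, log };
}

const { result, log } = traceMatch("a", "banana");
console.log(result); // the three "a" matches
console.log(log);    // exec@0, exec@2, exec@4, exec@6
```

The final exec@6 entry is the call that returns null and ends the loop. If an engine took its fast path here, the log would stay empty, which is the kind of divergence the watchpoint-invalidation test in this commit is meant to catch.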

The new primordial-check gate and Unicode advancement path each have edge cases worth security investigation.