
[2] YARR RegularExpression heap overflow on duplicate named captures

Severity: High | Component: JSC YARR | e536815

Rated High because the diff shows the caller sizing its offset vector from the stale (numSubpatterns + 1) * 2 formula while the interpreter writes up to the larger m_offsetsSize slot count. The result is a deterministic out-of-bounds write of attacker-influenced unsigned indices on any UnicodeSets pattern with duplicate named captures. The four committed tests confirm that the out-of-bounds write is reachable through the public RegularExpression API.

RegularExpression's offsets vector allocation size is incorrect: the size formula changed when duplicate named captures were added, but the computation in RegularExpression was not updated to match. This patch fixes it.

Source/JavaScriptCore/yarr/RegularExpression.cpp

 int RegularExpression::match(StringView str, unsigned startFrom, int* matchLength) const
 {
     if (!d->m_regExpByteCode)
         return -1;
 
     if (str.isNull())
         return -1;
 
-    int offsetVectorSize = (d->m_numSubpatterns + 1) * 2;
+    int offsetVectorSize = d->m_regExpByteCode->m_offsetsSize;
     unsigned* offsetVector;
     Vector<unsigned, 32> nonReturnedOvector;
 
     nonReturnedOvector.grow(offsetVectorSize);
     offsetVector = nonReturnedOvector.mutableSpan().data();
     ...
     result = interpret(d->m_regExpByteCode.get(), str, startFrom, offsetVector);

Tools/TestWebKitAPI/Tests/JavaScriptCore/RegularExpression.cpp

+TEST(JavaScriptCore_RegularExpression, DuplicateNamedCaptureGroupSimple)
+{
+    JSC::initialize();
+    RegularExpression re("(?<a>x)|(?<a>y)"_s, { JSC::Yarr::Flags::UnicodeSets });
+    EXPECT_TRUE(re.isValid());
+    int matchLength = 0;
+    EXPECT_EQ(0, re.match("x"_s, 0, &matchLength));
+    ...
+}

RegularExpression::match previously derived offsetVectorSize locally as (d->m_numSubpatterns + 1) * 2 and grew nonReturnedOvector to that count before calling interpret(). The patch replaces that formula with the bytecode's authoritative d->m_regExpByteCode->m_offsetsSize. Four new TestWebKitAPI tests (DuplicateNamedCaptureGroupSimple, Multiple, NoMatch, SearchRev) construct a RegularExpression with patterns like (?<a>x)|(?<a>y) under Flags::UnicodeSets and exercise the previously-corrupting match/searchRev paths.

The caller-side allocation-size formula was left out of sync with the callee-side contract after the feature extension that added duplicate named capture groups.

YARR is JavaScriptCore's regex engine. RegularExpression (in yarr/RegularExpression.cpp) is a thin C++ wrapper used by non-JS WebKit code that needs regex matching — text search, find-in-page, content-extension matching, and similar internal needs — distinct from the JavaScript RegExp object's own match path. It compiles a pattern to a BytecodePattern and runs interpret() against it.

The offsets vector is an array of unsigned indices used by the YARR interpreter to record where each capture group (and the whole match) starts and ends in the input string. Classically its size is (numSubpatterns + 1) * 2: one start/end pair of indices per subpattern, plus one more pair for the overall match.
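As a worked instance of the classic formula, the sketch below computes the slot count and the start/end slot of a given group. The helper names are illustrative and are not YARR APIs, and the "start at 2 * group, end at 2 * group + 1" indexing is an assumption about the conventional layout; only the (numSubpatterns + 1) * 2 total comes from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a YARR API): classic offsets-vector size.
// One start/end pair for the whole match plus one pair per subpattern.
constexpr std::size_t classicOffsetsSize(std::size_t numSubpatterns)
{
    return (numSubpatterns + 1) * 2;
}

// ASSUMPTION: group 0 is the whole match and each group owns two adjacent slots.
constexpr std::size_t startSlot(std::size_t group) { return group * 2; }
constexpr std::size_t endSlot(std::size_t group) { return group * 2 + 1; }

// Sanity check: the last group's end slot is the last slot of the vector.
inline void checkLayout(std::size_t numSubpatterns)
{
    assert(endSlot(numSubpatterns) + 1 == classicOffsetsSize(numSubpatterns));
}
```

For (?<a>x)|(?<a>y), which has two capturing groups, classicOffsetsSize(2) is 6 and the second group's end index sits in slot 5. Any slot index of 6 or more is outside a vector sized this way.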

ES2025 added duplicate named capture groups: a pattern like (?<a>x)|(?<a>y) may reuse a group name as long as the duplicates sit in different alternation branches. The committed tests compile such patterns with Flags::UnicodeSets, WebKit's name for the v (Unicode sets) flag. To support this, the YARR bytecode compiler stores the authoritative offsets-vector size in BytecodePattern::m_offsetsSize.

Vector<unsigned, 32> is a WTF dynamic array with 32 elements of inline storage — small offset vectors live inside the Vector struct itself, larger ones spill to the heap.
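To make the inline-versus-heap split concrete, here is a minimal model of where a vector grown to a given slot count would keep its elements. It is not WTF code; the only value taken from the source is the inline capacity of 32, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class Storage { Inline, Heap };

// Inline capacity taken from the Vector<unsigned, 32> declaration in match().
constexpr std::size_t kInlineCapacity = 32;

// Hypothetical model: where a vector grown to slotCount elements stores them.
constexpr Storage storageFor(std::size_t slotCount)
{
    return slotCount <= kInlineCapacity ? Storage::Inline : Storage::Heap;
}

// Largest subpattern count whose classic (n + 1) * 2 layout still fits inline.
constexpr std::size_t maxInlineSubpatterns()
{
    return kInlineCapacity / 2 - 1;
}

// Sanity check: one more subpattern than the maximum spills to the heap.
inline void checkModel()
{
    assert(storageFor((maxInlineSubpatterns() + 1) * 2) == Storage::Inline);
    assert(storageFor((maxInlineSubpatterns() + 2) * 2) == Storage::Heap);
}
```

Under this model, a pattern with up to 15 capture groups keeps its classic offsets vector inline. Because nonReturnedOvector is a local variable in match(), that inline buffer lives in the function's stack frame, which is why the same under-allocation can corrupt either the stack or the heap depending on pattern size.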

The bug is a caller/callee size-contract disagreement. The YARR interpreter writes to the offsets array using the bytecode's expected layout — every slot up to m_offsetsSize — but RegularExpression::match only allocated the smaller (m_numSubpatterns + 1) * 2 slots. When the bytecode compiler started reserving additional slots so each duplicate same-named capture group could be tracked independently, m_offsetsSize became the source of truth and the local formula in the caller became stale. The mismatch produces a heap (or inline-buffer stack) out-of-bounds write of unsigned values whenever the bytecode steps reach the extra duplicate-group slots.
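The disagreement can be sketched as two size computations over the same pattern. This is a simplified model, not YARR code: SizeContract and its functions are hypothetical, and treating m_offsetsSize as the classic size plus some number of extra slots is an assumption, since the source only says that additional slots are reserved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two sides of the size contract (not a YARR type).
struct SizeContract {
    std::size_t numSubpatterns;
    // ASSUMPTION: extra slots reserved for duplicate named groups. The real
    // count is whatever the bytecode compiler stores in m_offsetsSize.
    std::size_t extraSlots;
};

// What the pre-patch caller allocated: the stale local formula.
constexpr std::size_t callerAllocated(const SizeContract& c)
{
    return (c.numSubpatterns + 1) * 2;
}

// What the interpreter may write: modeled as classic size plus extra slots.
constexpr std::size_t calleeExpects(const SizeContract& c)
{
    return callerAllocated(c) + c.extraSlots;
}

// A write to slotIndex is safe only if it falls inside the allocation.
constexpr bool isWriteInBounds(std::size_t slotIndex, std::size_t allocated)
{
    return slotIndex < allocated;
}

// Sanity check: with no extra slots the two sides agree.
inline void checkNoExtraSlots(std::size_t numSubpatterns)
{
    SizeContract c { numSubpatterns, 0 };
    assert(callerAllocated(c) == calleeExpects(c));
}
```

With two subpatterns and one assumed extra slot, the caller allocates 6 slots while the callee expects 7, so a write to slot 6 is out of bounds for the old allocation and in bounds once the caller sizes the vector from the callee's own count, which is what the patch does.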
