[2] YARR RegularExpression heap overflow on duplicate named captures
Severity: High | Component: JSC YARR | e536815
Rated High because the diff shows that the caller recomputed the offset-vector size from a stale (numSubpatterns + 1) * 2 formula while the interpreter writes the larger m_offsetsSize slot count. The result is a deterministic out-of-bounds write of unsigned, attacker-influenced index values on any UnicodeSets pattern with duplicate named captures. The four committed tests confirm that the out-of-bounds write is reachable through the public RegularExpression API.
RegularExpression's offsets vector allocation size is incorrect: the formula was updated when named captures were added, but RegularExpression's computation was not updated correctly. This patch fixes it.
Source/JavaScriptCore/yarr/RegularExpression.cpp
int RegularExpression::match(StringView str, unsigned startFrom, int* matchLength) const
{
    if (!d->m_regExpByteCode)
        return -1;
    if (str.isNull())
        return -1;
-   int offsetVectorSize = (d->m_numSubpatterns + 1) * 2;
+   int offsetVectorSize = d->m_regExpByteCode->m_offsetsSize;
    unsigned* offsetVector;
    Vector<unsigned, 32> nonReturnedOvector;
    nonReturnedOvector.grow(offsetVectorSize);
    offsetVector = nonReturnedOvector.mutableSpan().data();
    ...
    result = interpret(d->m_regExpByteCode.get(), str, startFrom, offsetVector);
Tools/TestWebKitAPI/Tests/JavaScriptCore/RegularExpression.cpp
+TEST(JavaScriptCore_RegularExpression, DuplicateNamedCaptureGroupSimple)
+{
+    JSC::initialize();
+    RegularExpression re("(?<a>x)|(?<a>y)"_s, { JSC::Yarr::Flags::UnicodeSets });
+    EXPECT_TRUE(re.isValid());
+    int matchLength = 0;
+    EXPECT_EQ(0, re.match("x"_s, 0, &matchLength));
+    ...
+}
Patch Details
RegularExpression::match previously derived offsetVectorSize locally as (d->m_numSubpatterns + 1) * 2 and grew nonReturnedOvector to that count before calling interpret(). The patch replaces that formula with the bytecode's authoritative d->m_regExpByteCode->m_offsetsSize. Four new TestWebKitAPI tests (DuplicateNamedCaptureGroupSimple, Multiple, NoMatch, SearchRev) construct a RegularExpression with patterns like (?<a>x)|(?<a>y) under Flags::UnicodeSets and exercise the previously-corrupting match/searchRev paths.
Root cause: the caller-side allocation size formula was left out of sync with the callee-side contract after the feature extension that added duplicate named capture groups.
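One way to keep such a contract from drifting again is to size the buffer in a single place and have the callee reject anything smaller. The following is a minimal sketch of that idea, not WebKit code: MockBytecode, makeOffsetVector, and checkOffsetVector are hypothetical names, and the actual patch only changes the size expression in RegularExpression::match.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the compiled pattern. Only the slot count that
// the interpreter expects is modeled.
struct MockBytecode {
    std::size_t offsetsSize;
};

// Single place that sizes the buffer, so callers never recompute the
// formula themselves.
inline std::vector<unsigned> makeOffsetVector(const MockBytecode& bytecode)
{
    return std::vector<unsigned>(bytecode.offsetsSize, 0u);
}

// Callee-side guard: reject a too-small buffer instead of writing past it.
inline void checkOffsetVector(const MockBytecode& bytecode, const std::vector<unsigned>& offsets)
{
    if (offsets.size() < bytecode.offsetsSize)
        throw std::length_error("offset vector smaller than bytecode offsetsSize");
}
```

A caller would obtain its buffer from makeOffsetVector and pass it through checkOffsetVector before matching. A buffer sized with a stale local formula then fails loudly instead of corrupting memory.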
Background
YARR is JavaScriptCore's regex engine. RegularExpression (in yarr/RegularExpression.cpp) is a thin C++ wrapper used by non-JS WebKit code that needs regex matching — text search, find-in-page, content-extension matching, and similar internal needs — distinct from the JavaScript RegExp object's own match path. It compiles a pattern to a BytecodePattern and runs interpret() against it.
The offsets vector is an array of unsigned indices used by the YARR interpreter to record where each capture group (and the whole match) starts and ends in the input string. Classically its size is (numSubpatterns + 1) * 2: one (start, end) pair of indices per subpattern, plus one pair for the overall match.
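The classic layout can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, and the interleaved start/end slot order (group 0 being the overall match) is an assumption about the layout rather than something the diff shows.

```cpp
#include <cassert>
#include <cstddef>

// Classic size: one (start, end) pair per subpattern plus one pair for the
// overall match.
constexpr std::size_t classicOffsetVectorSize(std::size_t numSubpatterns)
{
    return (numSubpatterns + 1) * 2;
}

// Assumed interleaved layout: slot 2*i holds the start index of group i and
// slot 2*i + 1 holds its end index, where group 0 is the overall match.
constexpr std::size_t startSlot(std::size_t group) { return group * 2; }
constexpr std::size_t endSlot(std::size_t group) { return group * 2 + 1; }
```

For a pattern with two capture groups, the classic size is 6, and the last group's end index lands in slot 5, the final slot of the vector.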
ES2025 standardized duplicate named capture groups: a pattern like (?<a>x)|(?<a>y) lets the same group name appear in different alternation branches. The committed tests construct such patterns with Flags::UnicodeSets, WebKit's name for the v (Unicode sets) flag. To support this, the YARR bytecode compiler stores the authoritative offsets-vector size in BytecodePattern::m_offsetsSize.
Vector<unsigned, 32> is a WTF dynamic array with 32 elements of inline storage — small offset vectors live inside the Vector struct itself, larger ones spill to the heap.
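To make the inline-versus-heap boundary concrete, the arithmetic below works out which classic offset-vector sizes fit in 32 inline elements. The helpers are hypothetical and use only the standard library. They model the capacity check and are not a WTF API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when a requested element count fits in the
// inline buffer of a Vector<unsigned, 32>-style container.
constexpr bool fitsInline(std::size_t requested, std::size_t inlineCapacity = 32)
{
    return requested <= inlineCapacity;
}

// Largest subpattern count whose classic (n + 1) * 2 offset vector still
// fits inline. Precondition: inlineCapacity >= 2.
constexpr std::size_t maxInlineSubpatterns(std::size_t inlineCapacity = 32)
{
    return inlineCapacity / 2 - 1;
}
```

Under the classic formula, a pattern with up to 15 capture groups needs at most 32 slots and stays inline, while 16 groups need 34 slots and spill to the heap.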
Analysis
The bug is a caller/callee size-contract disagreement. The YARR interpreter writes to the offsets array using the bytecode's expected layout — every slot up to m_offsetsSize — but RegularExpression::match only allocated the smaller (m_numSubpatterns + 1) * 2 slots. When the bytecode compiler started reserving additional slots so each duplicate same-named capture group could be tracked independently, m_offsetsSize became the source of truth and the local formula in the caller became stale. The mismatch produces a heap (or inline-buffer stack) out-of-bounds write of unsigned values whenever the bytecode steps reach the extra duplicate-group slots.
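The disagreement can be reproduced in a small model that counts the writes that would land out of bounds instead of performing them. Everything here is a hypothetical stand-in: MockBytecode and mockInterpret are not YARR types, and the offsetsSize value used in the usage note is an illustrative number, not the real slot count for any pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the compiled pattern. Only the two fields that
// matter for the size contract are modeled.
struct MockBytecode {
    unsigned numSubpatterns;
    unsigned offsetsSize; // slot count the interpreter will write
};

// Stale caller-side formula from before the patch.
inline std::size_t staleSize(const MockBytecode& bytecode)
{
    return (static_cast<std::size_t>(bytecode.numSubpatterns) + 1) * 2;
}

// Patched caller-side size: ask the bytecode.
inline std::size_t fixedSize(const MockBytecode& bytecode)
{
    return bytecode.offsetsSize;
}

// Mock interpreter: touches every slot in [0, offsetsSize). Instead of
// actually writing out of bounds, it counts the writes that would land past
// the end of the caller's buffer.
inline std::size_t mockInterpret(const MockBytecode& bytecode, std::vector<unsigned>& offsets)
{
    std::size_t outOfBounds = 0;
    for (std::size_t slot = 0; slot < bytecode.offsetsSize; ++slot) {
        if (slot < offsets.size())
            offsets[slot] = 0;
        else
            ++outOfBounds;
    }
    return outOfBounds;
}
```

With a MockBytecode of two subpatterns and an assumed offsetsSize of 7, a buffer sized by staleSize holds 6 slots and mockInterpret reports one out-of-bounds write. A buffer sized by fixedSize reports none.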
Audit directions
Multiple reusable audit patterns covering YARR's size-contract surface and inline-vector OOB pitfalls, with concrete grep targets for variant discovery.