[2] Distinguish named and numbered backreferences in Yarr
Severity: High | Component: JSC Yarr regex engine | 12ef604
Rated High because the observable effect is an out-of-bounds access on the heap-allocated backtrack stack, reachable from web content via a crafted regex. The primitive (a controlled OOB read/write with attacker-influenced offset magnitude) is projected with confidence 0.82 from the frame-size mismatch described in the commit message and from the gating fix shown in the diff.
The duplicate named capture group feature allows capture group names to repeat as long as each occurrence sits in a different disjunct. When a backreference was explicitly numbered (e.g. \1), the engine incorrectly resolved it through the duplicate named group indirection table. This misresolution can be recursive. The commit message explains that the recursion manifests as an out-of-bounds access during backtracking: the engine can match a group of size N, match a disjunct of size M, then backtrack expecting to undo size N but end up rewinding size M.
The fix distinguishes numbered and named backreferences by splitting the PatternTerm::Type enum and gating the duplicate named group resolution on the new isNamed parameter.
Source/JavaScriptCore/yarr/YarrInterpreter.cpp
- void atomBackReference(unsigned subpatternId, MatchDirection matchDirection, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityMaxCount, QuantifierType quantityType, OptionSet<Flags> flags)
+ void atomBackReference(bool isNamed, unsigned subpatternId, MatchDirection matchDirection, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityMaxCount, QuantifierType quantityType, OptionSet<Flags> flags)
 {
     ASSERT(subpatternId);
     m_bodyDisjunction->terms.append(ByteTerm::BackReference(subpatternId, matchDirection, inputPosition, flags));
-    if (m_pattern.hasDuplicateNamedCaptureGroups()) {
+    if (isNamed && m_pattern.hasDuplicateNamedCaptureGroups()) {
         auto duplicateNamedGroupId = m_pattern.m_duplicateNamedGroupForSubpatternId[subpatternId];
         if (duplicateNamedGroupId)
             m_bodyDisjunction->terms.last().atom.parenIds.duplicateNamedGroupId = duplicateNamedGroupId;
     }
Source/JavaScriptCore/yarr/YarrPattern.h
- BackReference,
- ForwardReference,
+ NumberedBackReference,
+ NamedBackReference,
+ NumberedForwardReference,
+ NamedForwardReference,
JSTests/stress/regexp-duplicate-named-captures.js
+// Test named capture groups backtracking doesn't confuse numbered backreferences
+testRegExp(/(?<l>x)|((?<l>(|)\1?.){2})x/, "caffeeeee", null);
Patch Details
The fix splits PatternTerm::Type::BackReference and ForwardReference into four variants: NumberedBackReference, NamedBackReference, NumberedForwardReference, and NamedForwardReference. The critical change is in ByteCompiler::atomBackReference() and YarrGenerator::generateBackReference(), where the duplicate named capture group resolution logic — looking up m_duplicateNamedGroupForSubpatternId — is now gated on the new isNamed parameter. The same distinction is propagated through YarrPatternConstructor, with conversion functions (convertToNumberedBackreference / convertToNamedBackreference) ensuring forward references resolve to the correct typed variant.
Failure to distinguish named from numbered backreferences causes incorrect duplicate-group indirection, corrupting backtrack frame accounting.
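The gating described above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the JSC code: `TermType`, `ToyPattern`, `convertForwardReference`, `isNamed`, and `duplicateGroupIdFor` are hypothetical names that only borrow vocabulary from the patch, and the table is modelled as a plain map.

```cpp
#include <cassert>
#include <unordered_map>

// Toy model only: these types echo names from the patch but are NOT the JSC definitions.
enum class TermType {
    NumberedBackReference,
    NamedBackReference,
    NumberedForwardReference,
    NamedForwardReference,
};

// Hypothetical helper: a forward reference keeps its numbered/named flavour once resolved.
inline TermType convertForwardReference(TermType type)
{
    switch (type) {
    case TermType::NumberedForwardReference: return TermType::NumberedBackReference;
    case TermType::NamedForwardReference: return TermType::NamedBackReference;
    default: return type; // already a back reference
    }
}

inline bool isNamed(TermType type)
{
    return type == TermType::NamedBackReference || type == TermType::NamedForwardReference;
}

struct ToyPattern {
    // subpattern ID -> duplicate named group ID; a missing entry means "not a duplicate"
    std::unordered_map<unsigned, unsigned> duplicateNamedGroupForSubpatternId;
};

// Returns the duplicate group ID to attach to the emitted term, or 0 for none.
inline unsigned duplicateGroupIdFor(const ToyPattern& pattern, TermType type, unsigned subpatternId)
{
    assert(subpatternId); // mirrors the ASSERT in the diff: ID 0 is never a valid target here
    if (!isNamed(type))
        return 0; // the gate: numbered references skip the indirection entirely
    auto it = pattern.duplicateNamedGroupForSubpatternId.find(subpatternId);
    return it == pattern.duplicateNamedGroupForSubpatternId.end() ? 0 : it->second;
}
```

With group 1 registered as a duplicate, a `NumberedBackReference` to it yields 0 while a `NamedBackReference` yields the duplicate group ID, which is the behavioural difference the `isNamed` parameter introduces.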
Background
ES2025 allows multiple capture groups to share the same name, provided they appear in different alternatives (disjuncts) of the pattern. For example, /(?<k>a)|(?<k>b)/ is valid — at most one k can match at any time. When duplicate named groups exist, Yarr assigns a special duplicateNamedGroupId and builds an indirection table (m_duplicateNamedGroupForSubpatternId) that maps a subpattern ID to its duplicate group ID. At match time, named backreferences use this indirection to check all groups sharing the same name.
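To make the indirection table concrete, the following sketch builds one for a list of capture groups. It is an assumption-laden model: `ToyGroup` and `buildDuplicateTable` are hypothetical, and the ID numbering (consecutive from 1, with 0 meaning "not duplicated") is chosen only to agree with the `if (duplicateNamedGroupId)` check in the diff, not taken from Yarr's actual construction code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: one entry per capture group, in subpattern-ID order starting at 1.
// An empty name means the group is unnamed.
struct ToyGroup {
    std::string name;
};

// Returns a table indexed by subpattern ID; 0 means "this group's name is not duplicated".
inline std::vector<unsigned> buildDuplicateTable(const std::vector<ToyGroup>& groups)
{
    // First pass: count how often each name is used.
    std::map<std::string, unsigned> useCount;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        if (!groups[i].name.empty())
            ++useCount[groups[i].name];
    }

    // Second pass: give every name used more than once a shared nonzero ID.
    std::map<std::string, unsigned> idForName;
    unsigned nextId = 0;
    std::vector<unsigned> table(groups.size() + 1, 0); // slot 0 stands for the whole match
    for (std::size_t i = 0; i < groups.size(); ++i) {
        const std::string& name = groups[i].name;
        if (name.empty() || useCount[name] < 2)
            continue;
        std::map<std::string, unsigned>::iterator it = idForName.find(name);
        if (it == idForName.end())
            it = idForName.insert(std::make_pair(name, ++nextId)).first;
        table[i + 1] = it->second;
    }
    return table;
}
```

For the test pattern /(?<l>x)|((?<l>(|)\1?.){2})x/, the groups in order are `l`, unnamed, `l`, unnamed, so slots 1 and 3 share an ID. That is why a lookup keyed on subpattern ID 1 finds a duplicate group entry even when the reference was written as \1.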
Named and numbered backreferences have fundamentally different semantics. \k<name> is resolved by group name and must check all groups sharing that name. \1 is resolved by capture group index and always refers to exactly one group. Before this fix, both were compiled into the same PatternTerm::Type::BackReference, with named references pre-resolved to a numeric group index, so later stages could no longer tell which form the pattern had used.
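The semantic difference can be sketched as two separate lookups over the capture state. `ToyCapture`, `resolveNumbered`, and `resolveNamed` are hypothetical names for illustration, and the capture representation (start/end offsets, with -1 for a group that did not participate) is an assumption rather than Yarr's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy capture state for one group.
struct ToyCapture {
    std::string name; // empty for unnamed groups
    int start;        // -1 when the group did not participate in the match
    int end;
};

// \N: exactly one group, selected by its 1-based index.
inline const ToyCapture* resolveNumbered(const std::vector<ToyCapture>& captures, unsigned index)
{
    assert(index >= 1 && index <= captures.size());
    return &captures[index - 1];
}

// \k<name>: every group sharing the name is a candidate; at most one has participated.
inline const ToyCapture* resolveNamed(const std::vector<ToyCapture>& captures, const std::string& name)
{
    for (std::size_t i = 0; i < captures.size(); ++i) {
        if (captures[i].name == name && captures[i].start >= 0)
            return &captures[i];
    }
    return nullptr; // no group with this name participated; the reference matches empty
}
```

For /(?<k>a)|(?<k>b)/ matched against "b", `resolveNumbered(captures, 1)` returns the unmatched first group, while `resolveNamed(captures, "k")` returns the second one. Treating \1 as if it were \k<k> therefore changes which capture the engine consults.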
Yarr's interpreter maintains a backtrack stack where each disjunct or group match pushes a frame whose size depends on what was matched. During backtracking, the engine pops frames expecting sizes that correspond to the match path actually taken. The stack is heap-allocated via BumpPointerAllocator.
Analysis
Before the fix, Yarr compiled both named and numbered backreferences into identical BackReference terms. When a pattern contained duplicate named capture groups, atomBackReference unconditionally resolved any backreference's subpattern ID through the m_duplicateNamedGroupForSubpatternId table. In the pattern /(?<l>x)|((?<l>(|)\1?.){2})x/, the numbered backreference \1 inside the second disjunct was incorrectly resolved as if it referred to the duplicate named group l (spanning both disjuncts), rather than to just the first capture group.
The commit message explains the downstream consequence: this misresolution can be recursive, and the recursion corrupts backtrack frame accounting. The engine matches a group of size N, then matches a disjunct of size M, but during backtracking expects to rewind size N while actually rewinding size M, which results in an out-of-bounds access on the backtrack stack. The trigger pattern combines duplicate named groups with a numbered backreference inside a nested quantified group. It was likely discovered through regex fuzzing with duplicate named capture group grammar rules, since this is the kind of unusual structural combination that grammar-based fuzzers produce.
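The N-versus-M mismatch can be observed safely in a toy bump-style stack that tracks only a top index. This is a sketch, not Yarr's BumpPointerAllocator or its frame layout: `ToyBacktrackStack` and `frameDrift` are hypothetical, and the bounds checks exist so the model reports the condition an unchecked engine would turn into an out-of-bounds access.

```cpp
#include <cassert>
#include <cstddef>

// Toy stack: sizes are counted in abstract slots, not bytes.
class ToyBacktrackStack {
public:
    explicit ToyBacktrackStack(std::size_t capacity) : m_capacity(capacity), m_top(0) {}

    // Reserve `size` slots for a frame. Returns false if the stack is full.
    bool push(std::size_t size)
    {
        if (size > m_capacity - m_top)
            return false;
        m_top += size;
        return true;
    }

    // Rewind by `size` slots. Returns false where an unchecked engine would
    // step below the base of the stack.
    bool pop(std::size_t size)
    {
        if (size > m_top)
            return false;
        m_top -= size;
        return true;
    }

    std::size_t top() const { return m_top; }

private:
    std::size_t m_capacity;
    std::size_t m_top;
};

// Push a frame of `pushed` slots, rewind by `rewound`, and report how far the top
// ends up from where it started. Zero means the accounting balanced.
inline long frameDrift(ToyBacktrackStack& stack, std::size_t pushed, std::size_t rewound, bool& inBounds)
{
    const std::size_t before = stack.top();
    inBounds = stack.push(pushed) && stack.pop(rewound);
    return static_cast<long>(stack.top()) - static_cast<long>(before);
}
```

A balanced push and pop leaves a drift of zero. Pushing 4 and rewinding 2 leaves stale slots on the stack, and rewinding more than was pushed eventually requests a pop past the base, which is the point where the real engine would touch memory outside the allocation.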