
[2] Distinguish named and numbered backreferences in Yarr

Severity: High | Component: JSC Yarr regex engine | 12ef604

Rated High because the observable effect is an out-of-bounds access on the heap-allocated backtrack stack, reachable from web content via a crafted regex. The primitive (a controlled OOB read/write with an attacker-influenced offset magnitude) is projected with confidence 0.82 from the frame-size mismatch mechanism described in the commit message and the gating fix shown in the diff.

The duplicate named capture group feature allows capture group names to repeat as long as the groups sharing a name sit in different disjuncts. When a backreference was explicitly numbered (e.g. \1), the engine incorrectly resolved it through the duplicate named group indirection table. This misresolution can be recursive. The commit message explains that the recursion manifests as an out-of-bounds access during backtracking: the engine can match a group of size N, match a disjunct of size M, then backtrack expecting to undo size N but end up rewinding size M.

The fix distinguishes numbered and named backreferences by splitting the PatternTerm::Type enum and gating the duplicate named group resolution on the new isNamed parameter.

Source/JavaScriptCore/yarr/YarrInterpreter.cpp

- void atomBackReference(unsigned subpatternId, MatchDirection matchDirection, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityMaxCount, QuantifierType quantityType, OptionSet<Flags> flags)
+ void atomBackReference(bool isNamed, unsigned subpatternId, MatchDirection matchDirection, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityMaxCount, QuantifierType quantityType, OptionSet<Flags> flags)
  {
      ASSERT(subpatternId);

      m_bodyDisjunction->terms.append(ByteTerm::BackReference(subpatternId, matchDirection, inputPosition, flags));

-     if (m_pattern.hasDuplicateNamedCaptureGroups()) {
+     if (isNamed && m_pattern.hasDuplicateNamedCaptureGroups()) {
          auto duplicateNamedGroupId = m_pattern.m_duplicateNamedGroupForSubpatternId[subpatternId];
          if (duplicateNamedGroupId)
              m_bodyDisjunction->terms.last().atom.parenIds.duplicateNamedGroupId = duplicateNamedGroupId;
      }

Source/JavaScriptCore/yarr/YarrPattern.h

- BackReference,
- ForwardReference,
+ NumberedBackReference,
+ NamedBackReference,
+ NumberedForwardReference,
+ NamedForwardReference,

JSTests/stress/regexp-duplicate-named-captures.js

+// Test named capture groups backtracking doesn't confuse numbered backreferences
+testRegExp(/(?<l>x)|((?<l>(|)\1?.){2})x/, "caffeeeee", null);

The fix splits PatternTerm::Type::BackReference and ForwardReference into four variants: NumberedBackReference, NamedBackReference, NumberedForwardReference, and NamedForwardReference. The critical change is in ByteCompiler::atomBackReference() and YarrGenerator::generateBackReference(), where the duplicate named capture group resolution logic — looking up m_duplicateNamedGroupForSubpatternId — is now gated on the new isNamed parameter. The same distinction is propagated through YarrPatternConstructor, with conversion functions (convertToNumberedBackreference / convertToNamedBackreference) ensuring forward references resolve to the correct typed variant.
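The four-way split can be pictured with a small standalone model. This is only a sketch: `ToyTermType`, `isNamedReference`, `isForwardReference`, and `toBackReference` are hypothetical names chosen for illustration, and the real conversion functions in YarrPatternConstructor may be shaped quite differently. The point it tries to show is that once the named/numbered distinction is part of the type, resolving a forward reference cannot lose it.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the reference variants of PatternTerm::Type (illustrative only).
enum class ToyTermType : uint8_t {
    NumberedBackReference,
    NamedBackReference,
    NumberedForwardReference,
    NamedForwardReference,
};

// True for the two variants that come from \k<name> syntax.
constexpr bool isNamedReference(ToyTermType type)
{
    return type == ToyTermType::NamedBackReference
        || type == ToyTermType::NamedForwardReference;
}

// True for the two variants that refer to a group not yet seen by the parser.
constexpr bool isForwardReference(ToyTermType type)
{
    return type == ToyTermType::NumberedForwardReference
        || type == ToyTermType::NamedForwardReference;
}

// Hypothetical conversion: a forward reference becomes the back reference
// of the same flavour, so the named/numbered bit survives the conversion.
inline ToyTermType toBackReference(ToyTermType type)
{
    assert(isForwardReference(type));
    return isNamedReference(type) ? ToyTermType::NamedBackReference
                                  : ToyTermType::NumberedBackReference;
}
```

With a single `BackReference` type there is nothing for a later stage to test; with four variants, a predicate such as `isNamedReference` can supply the `isNamed` argument that the fix threads into the byte compiler and the JIT.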

Failure to distinguish named from numbered backreferences causes incorrect duplicate-group indirection, corrupting backtrack frame accounting.

ES2025 allows multiple capture groups to share the same name, provided they appear in different alternatives (disjuncts) of the pattern. For example, /(?<k>a)|(?<k>b)/ is valid — at most one k can match at any time. When duplicate named groups exist, Yarr assigns a special duplicateNamedGroupId and builds an indirection table (m_duplicateNamedGroupForSubpatternId) that maps a subpattern ID to its duplicate group ID. At match time, named backreferences use this indirection to check all groups sharing the same name.
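To make the indirection table concrete, here is a minimal sketch of how such a table could be built. It assumes a simplified input (one name per capture group, empty for unnamed groups) and uses 0 to mean "no duplicate group", which matches the `if (duplicateNamedGroupId)` check in the diff. The function names are hypothetical, and Yarr's actual construction code is not shown in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// names[i] is the name of capture group i + 1, or "" if the group is unnamed.
// The result is indexed by subpattern ID (slot 0 is unused). Groups that share
// a name get the same nonzero duplicate group ID; every other group gets 0.
std::vector<unsigned> buildDuplicateGroupTable(const std::vector<std::string>& names)
{
    std::unordered_map<std::string, unsigned> occurrences;
    for (const auto& name : names) {
        if (!name.empty())
            ++occurrences[name];
    }

    std::unordered_map<std::string, unsigned> idForName;
    std::vector<unsigned> table(names.size() + 1, 0);
    unsigned nextId = 1;
    for (std::size_t i = 0; i < names.size(); ++i) {
        const auto& name = names[i];
        if (name.empty() || occurrences[name] < 2)
            continue; // unnamed, or the name is unique
        auto [it, inserted] = idForName.try_emplace(name, nextId);
        if (inserted)
            ++nextId;
        table[i + 1] = it->second;
    }
    return table;
}

// Lookup in the style of the diff: subpattern ID in, duplicate group ID (or 0) out.
unsigned duplicateGroupFor(const std::vector<unsigned>& table, unsigned subpatternId)
{
    assert(subpatternId && subpatternId < table.size());
    return table[subpatternId];
}
```

For the test pattern in the commit, counting opening parentheses gives four capture groups, of which groups 1 and 3 carry the name `l`. Feeding `{"l", "", "l", ""}` to the sketch therefore marks subpattern IDs 1 and 3 with the same duplicate group ID and leaves 2 and 4 at 0.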

Named and numbered backreferences have fundamentally different semantics. \k<name> is resolved by group name and must check all groups sharing that name. \1 is resolved by capture group index and always refers to exactly one group. Before this fix, both were compiled into the same PatternTerm::Type::BackReference, with named references pre-resolved to the numeric index of their group, so later stages could no longer tell which syntax the pattern had used.
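The semantic difference can be sketched at match time as two separate lookups. The model below assumes captures are stored as optional strings indexed by group number (index 0 being the whole match); `Captures`, `resolveNumbered`, and `resolveNamed` are illustrative names, not Yarr APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// captures[i] holds the text captured by group i, or nullopt if the group
// did not participate in the match. Index 0 is the whole match.
using Captures = std::vector<std::optional<std::string>>;

// \N: exactly one group, selected by index.
std::optional<std::string> resolveNumbered(const Captures& captures, unsigned index)
{
    assert(index && index < captures.size());
    return captures[index];
}

// \k<name>: every group carrying the name is a candidate. Same-named groups
// live in different alternatives, so at most one of them has participated.
std::optional<std::string> resolveNamed(const Captures& captures,
                                        const std::vector<unsigned>& groupsWithName)
{
    for (unsigned index : groupsWithName) {
        assert(index && index < captures.size());
        if (captures[index])
            return captures[index];
    }
    return std::nullopt;
}
```

Take /(?<k>a)|(?<k>b)/ matched against "b": group 1 did not participate and group 2 captured "b". In this model `resolveNumbered(captures, 1)` yields nothing, while `resolveNamed(captures, {1, 2})` yields "b". Treating a numbered reference as a named one changes which groups are consulted.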

Yarr's interpreter maintains a backtrack stack where each disjunct or group match pushes a frame whose size depends on what was matched. During backtracking, the engine pops frames expecting sizes that correspond to the match path actually taken. The stack is heap-allocated via BumpPointerAllocator.
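The push/pop discipline described above can be illustrated with a deliberately tiny model. It does not reproduce Yarr's frame layout or BumpPointerAllocator; `ToyBacktrackStack` and its bounds checks are assumptions made for the sake of the example. What it shows is why the rewind size has to equal the size that was pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A fixed-capacity stack of slots addressed through a single "top" index.
class ToyBacktrackStack {
public:
    explicit ToyBacktrackStack(std::size_t capacity)
        : m_slots(capacity, 0)
    {
    }

    // Reserve `size` slots for a frame; refuses to run past the buffer end.
    bool push(std::size_t size)
    {
        if (size > m_slots.size() - m_top)
            return false;
        m_top += size;
        return true;
    }

    // Rewind by `size` slots; refuses to rewind below the base.
    bool pop(std::size_t size)
    {
        if (size > m_top)
            return false;
        m_top -= size;
        return true;
    }

    // Frame-relative access, counted down from the current top.
    unsigned& slotFromTop(std::size_t offset)
    {
        assert(offset < m_top);
        return m_slots[m_top - 1 - offset];
    }

    std::size_t top() const { return m_top; }

private:
    std::vector<unsigned> m_slots;
    std::size_t m_top { 0 };
};
```

If a frame of 2 slots is pushed and the matching pop asks for 5, the toy refuses the request. Code that performs the same subtraction without a check would leave its top pointer below the start of the buffer, and every later frame-relative access would land outside the allocation.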

Before the fix, Yarr compiled both named and numbered backreferences into identical BackReference terms. When a pattern contained duplicate named capture groups, atomBackReference unconditionally resolved any backreference's subpattern ID through the m_duplicateNamedGroupForSubpatternId table. In the pattern /(?<l>x)|((?<l>(|)\1?.){2})x/, the numbered backreference \1 inside the second disjunct was incorrectly resolved as if it referred to the duplicate named group l (spanning both disjuncts), rather than to just the first capture group.
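The effect of the gate can be reproduced in a toy compile step. The sketch below assumes the duplicate-group table for the test pattern is `{0, 1, 0, 1, 0}` (subpattern IDs 1 and 3 share the name `l`). `ToyBackReferenceTerm`, `compileBackReference`, and the `applyFix` switch are hypothetical and exist only to contrast the two behaviours.

```cpp
#include <cassert>
#include <vector>

struct ToyBackReferenceTerm {
    unsigned subpatternId;
    unsigned duplicateNamedGroupId; // 0 means no duplicate-name indirection
};

// Toy version of the decision made in atomBackReference. With applyFix set to
// false the table is consulted for every backreference (pre-fix behaviour);
// with applyFix set to true it is consulted only for named ones.
ToyBackReferenceTerm compileBackReference(bool isNamed, unsigned subpatternId,
                                          const std::vector<unsigned>& duplicateTable,
                                          bool applyFix)
{
    assert(subpatternId && subpatternId < duplicateTable.size());
    ToyBackReferenceTerm term { subpatternId, 0 };
    bool consultTable = applyFix ? isNamed : true;
    if (consultTable)
        term.duplicateNamedGroupId = duplicateTable[subpatternId];
    return term;
}
```

In this model, compiling \1 without the gate produces a term tagged with the duplicate group ID of `l`, so it is later treated like \k<l>. With the gate, the same \1 keeps a duplicate group ID of 0 and refers only to group 1, while a genuine \k<l> still receives the indirection.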

The commit message explains the downstream consequence: this misresolution can be recursive, and the recursion corrupts backtrack frame accounting. The engine matches a group of size N, then matches a disjunct of size M, but during backtracking expects to rewind size N while actually rewinding size M — resulting in out-of-bounds access on the backtrack stack. The trigger pattern — likely discovered through regex fuzzing with duplicate named capture group grammar rules — combines duplicate named groups with a numbered backreference in a nested quantified group, which is the kind of unusual structural combination that grammar-based fuzzers produce.
