
[12] RegExp::byteCodeCompileIfNecessary not thread-safe vs concurrent compiler thread

Severity: Medium | Component: JSC Yarr regex engine | dcf25ed

Rated Medium because the diff fixes a TOCTOU (time-of-check to time-of-use) race between the mutator thread and a JSC concurrent compiler thread on the std::unique_ptr<BytecodePattern> member of RegExp. If the two threads interleave on that member, the unsynchronized access is undefined behavior, but the trigger requires very narrow timing that, according to the commit message, is not reachable without artificial sleeps.

RegExp::matchConcurrently shouldn't cause any compilation; it should bail out if JIT code for the regexp doesn't already exist. However, a RegExp can have JIT code but no bytecode, in which case matchConcurrently can incorrectly, and racily, attempt to compile bytecode.

Source/JavaScriptCore/runtime/RegExp.cpp

 void RegExp::byteCodeCompileIfNecessary(VM* vm)
 {
+    Locker locker { cellLock() };
+
     if (m_regExpBytecode)
         return;

Source/JavaScriptCore/runtime/RegExpInlines.h

     if (result == static_cast<int>(Yarr::JSRegExpResult::JITCodeFailure)) {
-        // JIT'ed code couldn't handle expression, so punt back to the interpreter.
-        byteCodeCompileIfNecessary(&vm);
-        if (m_state == ParseError)
-            return throwError();
+        // Only the mutator may compile bytecode; the compiler thread must use the
+        // bytecode that already exists, and bails out if there is no bytecode.
+        if constexpr (matchFrom == Yarr::MatchFrom::VMThread) {
+            byteCodeCompileIfNecessary(&vm);
+            if (m_state == ParseError)
+                return throwError();
+        }
+        if (!m_regExpBytecode)
+            return -1;

The diff makes two changes. First, byteCodeCompileIfNecessary now takes the cell lock (Locker locker { cellLock() }) before checking and writing m_regExpBytecode. Second, the JIT-failure fallback path is restructured in both overloads of matchInline: the call to byteCodeCompileIfNecessary is now gated by if constexpr (matchFrom == Yarr::MatchFrom::VMThread), and an additional if (!m_regExpBytecode) return -1; bails the compiler thread out when no bytecode has been produced. The commit message states that the check inside matchInline intentionally does not take the cell lock, because the caller matchConcurrently already holds it.
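The gating described above can be sketched outside of JSC. This is a minimal model under stated assumptions, not the WebKit implementation: ToyRegExp, matchAfterJITFailure, compileBytecodeIfNecessary and the substring-search "interpreter" are all hypothetical names invented for illustration, and locking is omitted so that only the compile-time thread gating is visible.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not JSC's real types.
enum class MatchFrom { VMThread, CompilerThread };

struct BytecodePattern {
    std::string source;
};

constexpr int kNoMatch = -1; // mirrors the "return -1" bail-out in the diff

class ToyRegExp {
public:
    explicit ToyRegExp(std::string source)
        : m_source(std::move(source))
    {
    }

    // Models the fallback path taken after JIT code reports failure.
    template<MatchFrom matchFrom>
    int matchAfterJITFailure(const std::string& input)
    {
        // Only the VM (mutator) thread is allowed to create bytecode.
        // For CompilerThread this branch is discarded at compile time.
        if constexpr (matchFrom == MatchFrom::VMThread)
            compileBytecodeIfNecessary();

        // A compiler thread arrives here without compiling and bails out
        // if the mutator has not produced bytecode yet.
        if (!m_bytecode)
            return kNoMatch;
        return interpret(input);
    }

    bool hasBytecode() const { return static_cast<bool>(m_bytecode); }

private:
    void compileBytecodeIfNecessary()
    {
        if (m_bytecode)
            return;
        m_bytecode = std::make_unique<BytecodePattern>(BytecodePattern { m_source });
    }

    // Toy "interpreter": a plain substring search, not real regex semantics.
    int interpret(const std::string& input) const
    {
        auto position = input.find(m_bytecode->source);
        return position == std::string::npos ? kNoMatch : static_cast<int>(position);
    }

    std::string m_source;
    std::unique_ptr<BytecodePattern> m_bytecode;
};
```

In this model, calling matchAfterJITFailure<MatchFrom::CompilerThread> on a fresh object returns -1 and leaves the object unmodified, while the VMThread instantiation compiles on first use. Because the gate is an if constexpr on a template parameter, the compiler-thread instantiation contains no call to the compile function at all.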

TOCTOU race on a lazily-compiled cell member between the mutator thread and a JSC concurrent compiler thread that was supposed to be read-only.

JSC compiles a regular expression lazily into two possible forms: Yarr JIT machine code (m_regExpJITCode) for patterns the JIT can handle, and a Yarr bytecode pattern (m_regExpBytecode) interpreted by Yarr::interpret. At runtime, JIT-compiled regex code can return Yarr::JSRegExpResult::JITCodeFailure for inputs it cannot handle, at which point execution punts to the bytecode interpreter. RegExp::matchConcurrently is used by JSC's concurrent compiler threads (DFG/FTL): when the optimising compiler tries to constant-fold or speculate on a regex match during compilation, it may invoke matching from a thread other than the main VM thread. The convention is that concurrent compiler threads observe heap state without mutating it. cellLock() is a per-cell lock used to synchronise rare cases where mutator-side mutation must be visible to compiler-thread reads. MatchFrom is a template parameter that distinguishes the VMThread and CompilerThread calling contexts.

RegExp::matchConcurrently runs on a JSC concurrent compiler thread and is intended to read pre-existing JIT code only, not to mutate the RegExp cell. However, a RegExp can be in a state where JIT code exists but the bytecode does not. If the JIT returns JSRegExpResult::JITCodeFailure, the previous code unconditionally called byteCodeCompileIfNecessary on whichever thread was running — including the compiler thread. Meanwhile, the mutator thread executing a regular regex match on the same RegExp could be entering the same path. Two threads could thus check if (m_regExpBytecode) simultaneously, both see null, and both proceed to allocate and assign a new BytecodePattern to the std::unique_ptr member.
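The check-then-assign pattern described here, and the locking half of the fix, can be modelled in portable C++. This is a sketch under assumptions: LazyCell, compileIfNecessary and runConcurrentCompiles are hypothetical names, and std::mutex stands in for JSC's cellLock(). Without the lock_guard line, two threads could both observe a null pointer and both assign to the std::unique_ptr, which is a data race.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the lazily created bytecode object.
struct BytecodePattern {
    int id;
};

class LazyCell {
public:
    // Check and assignment happen under one lock, so at most one thread
    // ever observes a null pointer and performs the compile.
    void compileIfNecessary()
    {
        std::lock_guard<std::mutex> locker(m_lock); // stand-in for cellLock()
        if (m_bytecode)
            return;
        m_compileCount.fetch_add(1);
        m_bytecode = std::make_unique<BytecodePattern>(BytecodePattern { 1 });
    }

    int compileCount() const { return m_compileCount.load(); }

private:
    std::mutex m_lock;
    std::unique_ptr<BytecodePattern> m_bytecode;
    std::atomic<int> m_compileCount { 0 };
};

// Starts threadCount threads that all race to compile the same cell and
// returns how many compiles actually ran.
inline int runConcurrentCompiles(LazyCell& cell, int threadCount)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&cell] { cell.compileIfNecessary(); });
    for (auto& thread : threads)
        thread.join();
    return cell.compileCount();
}
```

With the lock in place, runConcurrentCompiles reports exactly one compile regardless of how many threads race. The actual fix goes further than this model: it also keeps the compiler thread from calling the compile function in the first place.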
