SIMD shuffle strength-reduction in JSC's B3 backend

8eb745f

Source/JavaScriptCore/jit/SIMDShuffle.h

+static std::optional<SIMDShuffleVector> composeShuffle(SIMDShuffleVector lhsMask, SIMDShuffleVector rhsMask)
+{
+ // Compose two shuffle permutations: result[i] = lhsMask[rhsMask[i]]
+ // If any index in rhsMask points into lhsMask out-of-range, bail.
+ SIMDShuffleVector result;
+ for (unsigned i = 0; i < 16; ++i) {
+ uint8_t idx = rhsMask[i];
+ if (idx >= 32)
+ return std::nullopt;
+ result[i] = lhsMask[idx];
+ }
+ return result;
+}
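The folding rule can be checked against a scalar model of two chained shuffles. In VectorSwizzle(VectorSwizzle(a, b, m1), c, m2), an outer index below 16 reads the inner result, which m1 maps back into a or b. An outer index of 16–31 reads c directly and has no entry in m1. The sketch below therefore composes only when every outer index is below 16. That guard, and the names `applyShuffle` and `composeMasks`, are assumptions of this illustration and are not copied from the commit's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical illustration; not JSC code.
using Vec16 = std::array<uint8_t, 16>;

// Reference i8x16.shuffle, assuming every index is already in [0, 32):
// indices 0..15 pick from a, 16..31 pick from b.
Vec16 applyShuffle(const Vec16& a, const Vec16& b, const Vec16& mask)
{
    Vec16 out{};
    for (unsigned i = 0; i < 16; ++i)
        out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
    return out;
}

// Fold applyShuffle(applyShuffle(a, b, inner), c, outer) into
// applyShuffle(a, b, result). An outer index in 16..31 selects from c, which
// a two-input shuffle of a and b cannot express, so the sketch declines.
std::optional<Vec16> composeMasks(const Vec16& inner, const Vec16& outer)
{
    Vec16 result{};
    for (unsigned i = 0; i < 16; ++i) {
        if (outer[i] >= 16)
            return std::nullopt;
        result[i] = inner[outer[i]]; // inner entries already index a:b (0..31)
    }
    return result;
}
```

Declining to fold is always safe, because the two original shuffles remain correct. Folding with a wrong index is not safe.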

Source/JavaScriptCore/b3/B3ReduceSIMDShuffle.cpp

+void reduceSIMDShuffle(Procedure& proc)
+{
+ // Analyze VectorSwizzle(VectorSwizzle(...)) chains and attempt
+ // to collapse them using composeShuffle, then re-run canonical
+ // pattern recognition on the composed mask.
+ ...
+}

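B3 (Bare Bones Backend) is JSC's high-level JIT IR. It sits between the Wasm/JS frontends and Air (Assembly Intermediate Representation), the assembly-level backend. WebAssembly's i8x16.shuffle builds a 16-byte result from two 128-bit inputs using 16 immediate lane indices. An index in 0–15 selects that byte of the first input, and an index in 16–31 selects byte (index − 16) of the second. B3 represents this as VectorSwizzle. The naive lowering emits ARM64's tbl (table vector lookup). It handles any mask, but the mask must be materialized in a vector register, and the two-input form needs both table operands in consecutive registers. On many cores it also has higher latency than dedicated permutes. ARM64 NEON provides such instructions for common permutation patterns:

- UZP1/UZP2 (deinterleave)
- ZIP1/ZIP2 (interleave)
- TRN1/TRN2 (transpose)
- EXT (extract a 16-byte window from the concatenated pair)
- REV16/REV32/REV64 (reverse elements within 16-, 32-, or 64-bit containers)

None of them needs a mask register, and they typically execute faster than a table lookup.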
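A scalar model makes the difference between the two operations concrete. In this sketch, `Vec16`, `wasmShuffle`, and `tbl2` are hypothetical names and are not JSC or ARM APIs. The shuffle takes its indices as compile-time immediates that Wasm validation has already bounded to 0–31. TBL reads its indices from a register at run time and writes 0 for any index past the end of its table. For valid masks the two agree, which is why tbl works as the universal fallback.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference models for illustration; not JSC APIs.
using Vec16 = std::array<uint8_t, 16>;

// Wasm i8x16.shuffle: each lane index is an immediate in [0, 32).
// Indices 0..15 select from a, 16..31 select from b.
Vec16 wasmShuffle(const Vec16& a, const Vec16& b, const Vec16& mask)
{
    Vec16 out{};
    for (unsigned i = 0; i < 16; ++i) {
        uint8_t idx = mask[i];
        assert(idx < 32); // validation rejects larger immediates
        out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
    return out;
}

// ARM64 TBL with a two-register table {t0, t1}: indices come from a vector
// register at run time, and any index >= 32 produces 0.
Vec16 tbl2(const Vec16& t0, const Vec16& t1, const Vec16& indices)
{
    Vec16 out{};
    for (unsigned i = 0; i < 16; ++i) {
        uint8_t idx = indices[i];
        if (idx < 16)
            out[i] = t0[idx];
        else if (idx < 32)
            out[i] = t1[idx - 16];
        else
            out[i] = 0;
    }
    return out;
}
```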

This commit adds a new B3ReduceSIMDShuffle phase. The phase inspects VectorSwizzle byte-index masks at compile time and, when an equivalent NEON instruction exists, substitutes it. Pattern matchers (tryMatchCanonicalBinary, tryMatchCanonicalUnary) compare the 16-byte mask against the byte patterns those instructions produce. The phase also handles VectorSwizzle chains. When one VectorSwizzle feeds another, composeShuffle folds the two index tables into one (result[i] = lhsMask[rhsMask[i]]), and pattern recognition is re-run on the composed mask. Separately, the commit introduces a lowering rule for ARM64 XAR (XOR and rotate, part of the SHA3 extension) for XOR-then-rotate patterns, gated on isARM64_SHA3().
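The commit's XAR rule itself is not shown here. The sketch below models XAR's per-lane semantics: on each 64-bit lane it computes (n XOR m) rotated right by an immediate. It also shows one plausible reason such a rule sits next to shuffle lowering. On a little-endian lane, a rotate by a multiple of 8 bits is just a byte permutation, so it can reach B3 as a VectorSwizzle. It is an assumption whether the commit matches that form, a shift-and-OR rotate, or both. `xarLane`, `rotrBytes`, and `rotateMask` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not JSC or ARM APIs.
using Vec16 = std::array<uint8_t, 16>;

// Scalar model of one 64-bit lane of ARM64 XAR Vd.2D, Vn.2D, Vm.2D, #imm:
// the lane is (n XOR m) rotated right by imm bits, imm in [0, 63].
uint64_t xarLane(uint64_t n, uint64_t m, unsigned imm)
{
    uint64_t t = n ^ m;
    imm &= 63;
    return imm ? (t >> imm) | (t << (64 - imm)) : t; // avoid a shift by 64
}

// Rotate a 64-bit lane right by 8*k bits by permuting its bytes. Byte j is
// bits [8j, 8j+8), matching Wasm's little-endian lane layout.
uint64_t rotrBytes(uint64_t t, unsigned k)
{
    uint64_t r = 0;
    for (unsigned j = 0; j < 8; ++j) {
        uint64_t byte = (t >> (8 * ((j + k) % 8))) & 0xff; // source byte (j+k) mod 8
        r |= byte << (8 * j);
    }
    return r;
}

// The i8x16.shuffle mask that rotates both 64-bit lanes right by 8*k bits.
// All indices stay below 16, so it is a unary shuffle of one vector.
Vec16 rotateMask(unsigned k)
{
    Vec16 mask{};
    for (unsigned i = 0; i < 16; ++i)
        mask[i] = static_cast<uint8_t>((i & ~7u) | ((i + k) & 7u));
    return mask;
}
```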

Wasm i8x16.shuffle
        │
        ▼
  B3 VectorSwizzle(a, b, mask[16])
        │
        ├─► B3ReduceSIMDShuffle (new phase):
        │     ├─ VectorSwizzle(VectorSwizzle(a,b,m1), c, m2)
        │     │      └─► composeShuffle(m1, m2) → single VectorSwizzle
        │     └─► tryMatchCanonical*(mask) → specialized opcode
        │           UZP1/2, ZIP1/2, TRN1/2, EXT, REV, DupElement
        │
        └─► B3LowerToAir → ARM64 NEON instruction
              tbl (generic fallback) or uzp1/zip1/ext/rev*/...
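The recognition step can be sketched as comparing the mask against generated templates. This is an illustration under assumptions and is not JSC's tryMatchCanonicalBinary/tryMatchCanonicalUnary. `expand`, `matchBinary`, `matchUnary`, and the returned strings are all hypothetical. Each permute is described by which element of the concatenation a:b feeds each result element, and that description is then expanded to bytes. `matchUnary` assumes the mask reads only the first operand (all indices below 16).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical sketch; these names are not JSC's matchers.
using Mask = std::array<uint8_t, 16>;

// Build the byte mask for a permute of `esize`-byte elements. src(k, n) gives
// the element of the 2n-element concatenation a:b (a = 0..n-1, b = n..2n-1)
// that feeds result element k.
template <typename F>
Mask expand(unsigned esize, F src)
{
    Mask m{};
    unsigned n = 16 / esize;
    for (unsigned k = 0; k < n; ++k)
        for (unsigned j = 0; j < esize; ++j)
            m[k * esize + j] = static_cast<uint8_t>(src(k, n) * esize + j);
    return m;
}

// Two-input patterns: UZP, ZIP, TRN at each element size, then EXT.
std::optional<std::string> matchBinary(const Mask& mask)
{
    for (unsigned esize : {1u, 2u, 4u, 8u}) {
        std::string sz = "." + std::to_string(esize * 8);
        auto half = [](unsigned k, unsigned n) { return k % 2 ? n : 0u; }; // odd k reads b
        if (mask == expand(esize, [](unsigned k, unsigned) { return 2 * k; }))
            return "uzp1" + sz;
        if (mask == expand(esize, [](unsigned k, unsigned) { return 2 * k + 1; }))
            return "uzp2" + sz;
        if (mask == expand(esize, [&](unsigned k, unsigned n) { return half(k, n) + k / 2; }))
            return "zip1" + sz;
        if (mask == expand(esize, [&](unsigned k, unsigned n) { return half(k, n) + n / 2 + k / 2; }))
            return "zip2" + sz;
        if (mask == expand(esize, [&](unsigned k, unsigned n) { return half(k, n) + (k & ~1u); }))
            return "trn1" + sz;
        if (mask == expand(esize, [&](unsigned k, unsigned n) { return half(k, n) + (k | 1u); }))
            return "trn2" + sz;
    }
    for (unsigned imm = 1; imm < 16; ++imm) { // EXT: bytes imm..imm+15 of a:b
        Mask ext{};
        for (unsigned i = 0; i < 16; ++i)
            ext[i] = static_cast<uint8_t>(i + imm);
        if (mask == ext)
            return "ext #" + std::to_string(imm);
    }
    return std::nullopt;
}

// One-input patterns: REV16/32/64 at each smaller element size, then DUP.
std::optional<std::string> matchUnary(const Mask& mask)
{
    for (unsigned container : {2u, 4u, 8u}) {
        for (unsigned esize = 1; esize < container; esize *= 2) {
            Mask rev{};
            for (unsigned i = 0; i < 16; ++i) {
                unsigned base = i - i % container, off = i % container;
                unsigned elem = off / esize, byte = off % esize;
                rev[i] = static_cast<uint8_t>(base + (container / esize - 1 - elem) * esize + byte);
            }
            if (mask == rev)
                return "rev" + std::to_string(container * 8) + "." + std::to_string(esize * 8);
        }
    }
    for (unsigned esize : {1u, 2u, 4u, 8u})
        for (unsigned lane = 0; lane < 16 / esize; ++lane)
            if (mask == expand(esize, [lane](unsigned, unsigned) { return lane; }))
                return "dup." + std::to_string(esize * 8) + "[" + std::to_string(lane) + "]";
    return std::nullopt;
}
```

At 64-bit element size, UZP1, ZIP1, and TRN1 all yield [a0, b0]. One mask can therefore have several valid encodings, and the matcher simply returns the first. The converse is what makes these tables sensitive. A template that is wrong by one index still compiles and still matches some mask, and every VectorSwizzle with that mask is then silently miscompiled.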

This adds substantial new JIT code-generation paths for WebAssembly SIMD on ARM64, replacing generic table lookups with specialized NEON instructions. Pattern-matching JIT transforms have historically been fertile ground for miscompilation bugs. Such bugs can produce incorrect results or security-relevant type confusion.

The new JIT pattern matching for SIMD shuffles introduces several correctness-sensitive transforms. Audit directions for the fold and recognition logic are included.
