SIMD shuffle strength-reduction in JSC's B3 backend

8eb745f

Source/JavaScriptCore/jit/SIMDShuffle.h

+static std::optional<SIMDShuffleVector> composeShuffle(SIMDShuffleVector lhsMask, SIMDShuffleVector rhsMask)
+{
+    // Compose two shuffle permutations: result[i] = lhsMask[rhsMask[i]]
+    // If any index in rhsMask points into lhsMask out-of-range, bail.
+    SIMDShuffleVector result;
+    for (unsigned i = 0; i < 16; ++i) {
+        uint8_t idx = rhsMask[i];
+        if (idx >= 32)
+            return std::nullopt;
+        result[i] = lhsMask[idx];
+    }
+    return result;
+}

Source/JavaScriptCore/b3/B3ReduceSIMDShuffle.cpp

+void reduceSIMDShuffle(Procedure& proc)
+{
+    // Analyze VectorSwizzle(VectorSwizzle(...)) chains and attempt
+    // to collapse them using composeShuffle, then re-run canonical
+    // pattern recognition on the composed mask.
+    ...
+}

B3 (Bare Bones Backend) is JSC's high-level JIT IR, sitting between the Wasm/JS frontend and the final Air (assembly-level) backend. WebAssembly's i8x16.shuffle selects 16 output bytes from two 128-bit input vectors using a 16-byte index mask; B3 represents this as VectorSwizzle. The naive lowering emits ARM64's tbl instruction (vector table lookup), which handles any mask but needs the index vector held in a register and is generally the more expensive option. ARM64 NEON has specialized instructions for common permutation patterns: UZP (deinterleave), ZIP (interleave), TRN (transpose), EXT (extract/rotate), and REV (reverse). Each is typically cheaper than a table lookup.
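To make the shuffle semantics concrete, here is a minimal scalar sketch of what i8x16.shuffle computes. It is not taken from the commit: Bytes16 and shuffleReference are illustrative names, and returning zero for an index of 32 or above is an assumption that mirrors tbl's out-of-range behaviour (Wasm validation rejects such masks, so that branch should never be reached for valid modules).

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar reference for i8x16.shuffle (illustrative, not JSC code).
// Each mask entry in [0, 31] selects one byte from the 32-byte
// concatenation of a (indices 0..15) and b (indices 16..31).
Bytes16 shuffleReference(const Bytes16& a, const Bytes16& b, const Bytes16& mask)
{
    Bytes16 out{};
    for (unsigned i = 0; i < 16; ++i) {
        uint8_t idx = mask[i];
        if (idx < 16)
            out[i] = a[idx];
        else if (idx < 32)
            out[i] = b[idx - 16];
        else
            out[i] = 0; // Assumption: out-of-range lanes read as zero, as with tbl.
    }
    return out;
}
```

Any specialized lowering has to agree with this reference for every mask it claims to match, which is the property an audit of the matchers would check.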

This commit adds a new B3ReduceSIMDShuffle phase that inspects VectorSwizzle byte-index patterns at compile time and substitutes the equivalent NEON instruction. Pattern matchers (tryMatchCanonicalBinary, tryMatchCanonicalUnary) test the 16-byte mask against known instruction semantics. The phase also handles VectorSwizzle chains: when a VectorSwizzle feeds another VectorSwizzle, composeShuffle algebraically folds the two permutation tables into one (result[i] = lhsMask[rhsMask[i]]) and re-runs pattern recognition on the composed result. A separate ARM64 SHA3 XAR lowering rule is introduced for XOR-rotate patterns, gated on isARM64_SHA3().
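As a rough illustration of what mask-based pattern recognition looks like, the sketch below checks a 16-byte mask against two byte-lane patterns: zip1 (interleave the low halves of both inputs) and ext (a contiguous 16-byte window of the concatenated inputs). The names isZip1Bytes and matchExtBytes, and the Mask16 alias, are hypothetical; the signatures of the commit's tryMatchCanonicalBinary and tryMatchCanonicalUnary are not shown in the excerpt and are not assumed here.

```cpp
#include <array>
#include <cstdint>
#include <optional>

using Mask16 = std::array<uint8_t, 16>;

// Hypothetical matcher: true when the mask is a0, b0, a1, b1, ..., a7, b7,
// i.e. the byte-lane result of zip1 on the two inputs.
bool isZip1Bytes(const Mask16& mask)
{
    for (unsigned i = 0; i < 8; ++i) {
        if (mask[2 * i] != i || mask[2 * i + 1] != 16 + i)
            return false;
    }
    return true;
}

// Hypothetical matcher: returns n when mask[i] == n + i for every lane,
// i.e. the result is bytes n..n+15 of the concatenated inputs (ext #n).
// Offsets 0 and 16 are plain copies of one input, so the sketch rejects them.
std::optional<uint8_t> matchExtBytes(const Mask16& mask)
{
    uint8_t n = mask[0];
    if (n < 1 || n > 15)
        return std::nullopt;
    for (unsigned i = 1; i < 16; ++i) {
        if (mask[i] != n + i)
            return std::nullopt;
    }
    return n;
}
```

A matcher of this shape is only sound if it compares all 16 lanes. Checking a prefix, or accepting a mask that is "close" to the pattern, would substitute an instruction that computes a different permutation.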

Wasm i8x16.shuffle
        │
        ▼
  B3 VectorSwizzle(a, b, mask[16])
        │
        ├─► B3ReduceSIMDShuffle (new phase):
        │     ├─ VectorSwizzle(VectorSwizzle(a,b,m1), c, m2)
        │     │      └─► composeShuffle(m1, m2) → single VectorSwizzle
        │     └─► tryMatchCanonical*(mask) → specialized opcode
        │           UZP1/2, ZIP1/2, TRN1/2, EXT, REV, DupElement
        │
        └─► B3LowerToAir → ARM64 NEON instruction
              tbl (generic fallback) or uzp1/zip1/ext/rev*/...
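The chain-folding branch of the diagram can be sketched as follows. This is a standalone illustration, not the commit's composeShuffle: Mask16 stands in for SIMDShuffleVector (whose definition is not shown in the excerpt) and composeInnerOnly is a hypothetical name. The sketch assumes the inner swizzle is the first operand of the outer one, and it bails as soon as an outer lane reads from the outer swizzle's second operand (index 16 or above). That is a conservative choice made for this sketch and says nothing about how the commit handles such lanes.

```cpp
#include <array>
#include <cstdint>
#include <optional>

using Mask16 = std::array<uint8_t, 16>;

// Fold outer(inner(a, b, innerMask), c, outerMask) into one mask over (a, b).
// Outer lane i reads lane outerMask[i] of the inner result, which in turn is
// byte innerMask[outerMask[i]] of the concatenation of a and b.
// Lanes that read from c (outer index >= 16) cannot be expressed over (a, b)
// alone, so the sketch gives up on them.
std::optional<Mask16> composeInnerOnly(const Mask16& innerMask, const Mask16& outerMask)
{
    Mask16 composed{};
    for (unsigned i = 0; i < 16; ++i) {
        uint8_t idx = outerMask[i];
        if (idx >= 16)
            return std::nullopt;
        composed[i] = innerMask[idx];
    }
    return composed;
}
```

For example, composing a byte reversal with itself yields the identity mask, which a later recognition step could then treat as a plain copy of the first input.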

Significance

This commit adds substantial new JIT code generation paths for WebAssembly SIMD on ARM64, replacing generic table lookups with specialized NEON instructions. Pattern-matching JIT transforms are historically rich ground for miscompilation bugs, which can yield incorrect computation or security-relevant type confusion.

Audit directions


New JIT pattern-matching for SIMD reductions introduces several correctness-sensitive transforms — audit directions for the fold and recognition logic are included.
