B3 32-Bit Unsigned Division-by-Constant Optimization

15c111a

Source/JavaScriptCore/b3/B3ReduceStrength.cpp

+                    if (magic.add) {
+                        if constexpr (isARM64() || isX86()) {
+                            ASSERT(!magic.preShift);
+                            uint64_t fullMagic = static_cast<uint64_t>(magic.magicMultiplier) | (1ULL << 32);
+                            uint64_t shiftedMagic = fullMagic << (31 - magic.shift);
+                            Value* ext = m_insertionSet.insert<Value>(m_index, ZExt32, m_value->origin(), dividend);
+                            Value* mulHigh = m_insertionSet.insert<Value>(
+                                m_index, UMulHigh, m_value->origin(), ext,
+                                m_insertionSet.insert<Const64Value>(m_index, m_value->origin(), static_cast<int64_t>(shiftedMagic)));
+                            replaceWithNew<Value>(Trunc, m_value->origin(), mulHigh);
+                            break;
+                        }
+                    }

This adds a 32-bit unsigned integer division-by-constant optimization in B3's strength reduction pass using the 33-bit magic multiplier technique from Mitsunari & Hoshino (2026). When the magic number requires a 33-bit constant (magic.add == true), the patch replaces a 5-operation sequence with a single UMulHigh64: x / d = Trunc(UMulHigh64(ZExt32(x), c << (31 - shift))).

Before (magic.add == true, 5 ops):          After (1 op):
  ZExt32(x)                                   ZExt32(x)
     │                                            │
  UMulHigh(x, magic)  ← 32-bit mul high       UMulHigh64(ZExt32(x), fullMagic << (31-shift))
     │                                            │
  sub (x - mulHigh)                            Trunc
     │
  Shr (sub, 1)
     │
  Add (shr + mulHigh)
     │
  Shr (result, shift)

Significance

Benchmarks show ~1.9x speedup on Wasm udiv-by-constant workloads, but any arithmetic error in the strength-reduction transform produces silently wrong division results rather than a crash.

The 31 - magic.shift expression: if magic.shift can legally be > 31 for any valid 32-bit divisor, this wraps and shiftedMagic becomes garbage. The ASSERT(!magic.preShift) only fires in debug builds — if a future divisor causes magic.add && magic.preShift > 0, release builds silently skip the preShift. static_cast<int64_t>(shiftedMagic) passes a uint64_t with a potentially-set high bit as a signed value into the IR constant — the UMulHigh semantics (signed vs. unsigned) need to be verified.