
HTMLMediaElement preservesPitch=true through AudioContext

1774dbe

Source/WebCore/Modules/webaudio/MediaElementAudioSourceNode.cpp

-void MediaElementAudioSourceNode::setPlaybackRate(double playbackRate)
-{
-    playbackRate = abs(playbackRate);
-    if (!playbackRate || playbackRate == m_playbackRate)
-        return;
-    Locker locker { m_processLock };
-    m_playbackRate = playbackRate;
-    updateResamplerIfNeeded();
-}

 void MediaElementAudioSourceNode::updateResamplerIfNeeded()
 {
-    // Account for both sample rate conversion and playback rate
-    double effectiveSampleRate = m_sourceSampleRate * m_playbackRate;
-    if (effectiveSampleRate != sampleRate()) {
-        double scaleFactor = effectiveSampleRate / sampleRate();
+    if (m_sourceSampleRate != sampleRate()) {
+        double scaleFactor = m_sourceSampleRate / sampleRate();
         m_multiChannelResampler = makeUnique<MultiChannelResampler>(scaleFactor, ...);
     } else {
         // Bypass resampling.
When an HTMLMediaElement is connected to an AudioContext via createMediaElementSource(), audio flows through a provider chain that crosses the GPU process boundary on Apple platforms. The GPU process runs AVPlayer and taps the audio stream via AVAudioMix, then ships raw samples over IPC to the WebContent process where the Web Audio graph consumes them. At non-1x playback rates, AVPlayer's varispeed mode causes the tap to emit samples at a proportionally different rate — effectively a sample-rate shift with a pitch side-effect. Previously, MediaElementAudioSourceNode tried to compensate by baking the playback rate into the resampler's scale factor, but this didn't work through the cross-process path.

This commit restructures the approach entirely. A new PitchShiftAudioUnit class wraps CoreAudio's AUTimePitch AudioUnit, inserted into the AudioSourceProviderAVFObjC provider layer in the GPU process. When preservesPitch=true, the pitch shift unit counteracts the varispeed pitch distortion after the tap but before samples cross IPC to the WebContent process. The MediaElementAudioSourceNode resampler is simplified — it no longer factors in playback rate at all, handling only sample-rate mismatches between the source and the AudioContext. playbackRate and preservesPitch state are now propagated as IPC messages from GPU→WebContent.

AVPlayer (GPU process)
  └─[varispeed tap]─► raw samples at N×sampleRate
                         │
             AudioSourceProviderAVFObjC
               ├─ preservesPitch=true  → PitchShiftAudioUnit (AUTimePitch)
               │                            └─► pitch-corrected samples at sampleRate
               └─ preservesPitch=false → MultiChannelSampleRateConverter
                                            └─► resampled (pitch-shifted) samples
                         │  [IPC: setPlaybackRate / setPreservesPitch]
                         ▼
             WebContent process
               └─ MediaElementAudioSourceNode
                    └─ simple sample-rate resampler (no rate factor)
                         └─► AudioContext graph

This fixes a long-standing behavioral gap where preservesPitch was silently ignored when audio was routed through createMediaElementSource(). More importantly, it introduces a new cross-process IPC surface, a new CoreAudio AudioUnit callback chain, and non-trivial audio pipeline restructuring — all high-value attack surface for audio-related vulnerabilities.

The new CoreAudio callback chain and cross-process state synchronization introduce several timing-sensitive edge cases worth auditing.