Advanced iOS Audio and Video Programming: Playing FFmpeg-Decoded Audio with Audio Unit
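<p>This article describes the steps for playing FFmpeg-decoded audio on iOS, using the Audio Session API of the Audio Toolbox framework and the interfaces of the Audio Unit framework. The C-based Audio Session API is marked deprecated on iOS 7 and later; use AVAudioSession instead, and the programming logic stays essentially the same. All test data comes from real iPhone 6 and iPhone 6 Plus devices.</p>
<h2>1<strong>. Programming flow for playing FFmpeg-decoded audio on iOS</strong></h2>
<p>Audio playback is programmed slightly differently from video rendering. Audio is pull-driven: the system calls back a function that we register, and inside that callback we copy the audio data to be played into the buffer the system passes in. Video is push-driven: we write pixel data into the screen's framebuffer ourselves. Playing FFmpeg-decoded audio (decoded from AAC, for example) on iOS therefore takes the following steps:</p>
<ol>
<li>Call AudioSessionInitialize to initialize the audio session object of the iOS app</li>
<li>Configure the Audio Session
<ul>
<li>Set properties
<ul>
<li>kAudioSessionCategory_MediaPlayback selects audio playback</li>
<li><em>kAudioSessionProperty_PreferredHardwareIOBufferDuration requests lower I/O latency; it usually does not need to be set.</em></li>
</ul>
</li>
<li><em>Register property-change listeners (an application of the observer pattern). This is not part of the minimum functionality and can be skipped.</em>
<ul>
<li>kAudioSessionProperty_AudioRouteChange</li>
<li>kAudioSessionProperty_CurrentHardwareOutputVolume</li>
</ul>
</li>
<li>Call AudioSessionSetActive to activate the audio session</li>
</ul>
</li>
<li>Configure the Audio Unit
<ul>
<li>Describe the output unit with AudioComponentDescription</li>
<li>Obtain the AudioComponent</li>
<li>Check the output stream format, AudioStreamBasicDescription</li>
<li>Fill in the render callback structure AURenderCallbackStruct and set the callback function; this is where PCM data is actually handed to the audio device</li>
</ul>
</li>
<li>Supply not-yet-played audio data from the render callback</li>
<li>Release resources</li>
<li>FFmpeg decoding flow</li>
<li>Audio resampling</li>
</ol>
<p>Each step is described in detail below.</p>
<h3><strong>1.1. Initializing the audio session with AudioSessionInitialize</strong></h3>
<pre> <code class="language-objectivec">AudioSessionInitialize(NULL, kCFRunLoopCommonModes, sessionInterruptionListener, (__bridge void *) (self));</code></pre>
<p>AudioSessionInitialize specifies the run loop and run loop mode in which the interruption listener callback runs, plus a user-defined value that is passed to that callback. Passing NULL for the run loop selects the main run loop. The documentation describes the callback as follows.</p>
<p>The interruption listener callback function. The application’s audio session object invokes the callback when the session is interrupted and (if the application is still running) when the interruption ends. Can be NULL. See AudioSessionInterruptionListener.</p>
<p>This function must be called before any other audio session service is used.</p>
<p>Your application must call this function before making any other Audio Session Services calls.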
You may activate and deactivate your audio session as needed (see AudioSessionSetActive), but should initialize it only once.</p>
<p>The AudioSessionInterruptionListener callback has the following signature:</p>
<pre> <code class="language-objectivec">// Invoked when an audio interruption in iOS begins or ends.
typedef void (*AudioSessionInterruptionListener)(
    void   *inClientData,
    UInt32  inInterruptionState
);</code></pre>
<ul>
<li>inClientData is the value specified in AudioSessionInitialize.</li>
<li>inInterruptionState indicates the interruption state (kAudioSessionBeginInterruption or kAudioSessionEndInterruption).</li>
</ul>
<p>After initialization, AudioSessionGetProperty can be used to query information about the audio session. For example, kAudioSessionProperty_AudioRouteDescription (before iOS 5, use kAudioSessionProperty_AudioRoute) returns the audio input and output route, such as microphone for input and speaker for output. The sample below queries kAudioSessionProperty_AudioRoute.</p>
<pre> <code class="language-objectivec">UInt32 propertySize = sizeof(CFStringRef);
CFStringRef route;
AudioSessionGetProperty(kAudioSessionProperty_AudioRoute, &propertySize, &route);
// Transfer ownership of the returned CFStringRef to ARC.
NSString *audioRoute = CFBridgingRelease(route);</code></pre>
<p>kAudioSessionProperty_AudioRouteDescription returns more information than kAudioSessionProperty_AudioRoute, as shown below.</p>
<ol>
<li>
<p>Output of kAudioSessionProperty_AudioRoute</p>
<pre> <code class="language-objectivec">AudioRoute: Speaker</code></pre>
</li>
<li>
<p>Output of kAudioSessionProperty_AudioRouteDescription</p>
<pre> <code class="language-objectivec">AudioRoute: {
  "RouteDetailedDescription_Inputs" = (
    {
      "RouteDetailedDescription_ChannelDescriptions" = (
        {
          "ChannelDescription_Name" = "iPhone Microphone";
        }
      );
      "RouteDetailedDescription_DataSources" = (
        {
          DataSourceID = 1835216945;
          DataSourceName = Bottom;
          MicrophoneOrientation = 1651799149;
          MicrophonePolarPattern = 1869442665;
          MicrophonePolarPatterns = ( 1869442665 );
          MicrophoneRegion = 1819244402;
        },
        {
          DataSourceID = 1835216946;
          DataSourceName = Front;
          MicrophoneOrientation = 1718775412;
          MicrophonePolarPattern = 1668441188;
          MicrophonePolarPatterns = ( 1869442665, 1668441188 );
          MicrophoneRegion = 1970303090;
        },
        {
          DataSourceID = 1835216947;
          DataSourceName = Back;
          MicrophoneOrientation = 1650549611;
          MicrophonePolarPattern = 1869442665;
          MicrophonePolarPatterns = ( 1869442665, 1935827812 );
          MicrophoneRegion = 1970303090;
        }
      );
      "RouteDetailedDescription_HiddenDataSources" = (
        {
          DataSourceID = 1634495520;
          DataSourceName = All;
        }
      );
      "RouteDetailedDescription_ID" = 344;
      "RouteDetailedDescription_IsHeadphones" = 0;
      "RouteDetailedDescription_Name" = "iPhone Microphone";
      "RouteDetailedDescription_NumberOfChannels" = 1;
      "RouteDetailedDescription_PortType" = MicrophoneBuiltIn;
      "RouteDetailedDescription_SelectedDataSource" = 1835216946;
      "RouteDetailedDescription_UID" = "Built-In Microphone";
    }
  );
  "RouteDetailedDescription_Outputs" = (
    {
      "RouteDetailedDescription_ChannelDescriptions" = (
        {
          "ChannelDescription_Label" = "-1";
          "ChannelDescription_Name" = Speaker;
        }
      );
      "RouteDetailedDescription_ID" = 345;
      "RouteDetailedDescription_IsHeadphones" = 0;
      "RouteDetailedDescription_Name" = Speaker;
      "RouteDetailedDescription_NumberOfChannels" = 1;
      "RouteDetailedDescription_PortType" = Speaker;
      "RouteDetailedDescription_UID" = Speaker;
    }
  );
}</code></pre>
</li>
</ol>
<h3><strong>1.2. Configuring the Audio Session</strong></h3>
<p>Audio session configuration consists of properties and property-change listeners; the listeners can be implemented as needed.</p>
<p><strong>1.2.1. Setting properties</strong></p>
<p>Setting the audio session category is required; the hardware I/O buffer can keep its default value.</p>
<p><strong>1.2.1.1. Audio playback</strong></p>
<p>To play music, set kAudioSessionProperty_AudioCategory to kAudioSessionCategory_MediaPlayback. Other values cover audio processing, recording, and so on. All enumeration values are listed below.</p>
<pre> <code class="language-objectivec">/*!
    @enum AudioSession audio categories states
    @abstract These are used with as values for the kAudioSessionProperty_AudioCategory property to indicate the audio category of the AudioSession.
    @constant kAudioSessionCategory_AmbientSound
        Use this category for background sounds such as rain, car engine noise, etc. Mixes with other music.
    @constant kAudioSessionCategory_SoloAmbientSound
        Use this category for background sounds. Other music will stop playing.
    @constant kAudioSessionCategory_MediaPlayback
        Use this category for music tracks.
    @constant kAudioSessionCategory_RecordAudio
        Use this category when recording audio.
    @constant kAudioSessionCategory_PlayAndRecord
        Use this category when recording and playing back audio.
    @constant kAudioSessionCategory_AudioProcessing
        Use this category when using a hardware codec or signal processor while not playing or recording audio.
*/
enum {
    kAudioSessionCategory_AmbientSound     = 'ambi',
    kAudioSessionCategory_SoloAmbientSound = 'solo',
    kAudioSessionCategory_MediaPlayback    = 'medi',
    kAudioSessionCategory_RecordAudio      = 'reca',
    kAudioSessionCategory_PlayAndRecord    = 'plar',
    kAudioSessionCategory_AudioProcessing  = 'proc'
};</code></pre>
<p><strong>1.2.1.2. Configuring the hardware I/O buffer</strong></p>
<p>Set this property only when a smaller I/O buffer is needed. A smaller buffer lowers audio latency but costs more CPU. The system may not adopt the requested value; query kAudioSessionProperty_CurrentHardwareIOBufferDuration to get the value actually in effect. The documentation says the following.</p>
<p>Your preferred hardware I/O buffer duration in seconds. Do not set this property unless you require lower I/O latency than is provided by default.</p>
<p>A read/write Float32 value.</p>
<p>The actual I/O buffer duration may be different from the value that you request, and can be obtained from the kAudioSessionProperty_CurrentHardwareIOBufferDuration property.</p>
<p>Set the buffer size, this will affect the number of samples that get rendered every time the audio callback is fired</p>
<p>A small number will get you lower latency audio, but will make your processor work harder</p>
<p>Reference code:</p>
<pre> <code class="language-objectivec">// Request an I/O buffer of about 23 ms; the system may pick a different value.
Float32 preferredBufferSize = 0.0232;
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration, sizeof(preferredBufferSize), &preferredBufferSize);</code></pre>
<p>0.0232 means about 23 milliseconds. At a sample rate of 44100 Hz with 1024 samples per packet, the duration of one packet in seconds is</p>
<pre> <code class="language-objectivec">1024 / 44100 ≈ 0.0232</code></pre>
<p>@暴走大牙 found in practice that when capturing audio on Android, 3 x 23 milliseconds works better.</p>
<p>@二流 offered another way to look at it:</p>
<pre> <code class="language-objectivec">44100 / 1024 ≈ 43 packets per second
1 / 43 ≈ 0.023</code></pre>
<p><strong>1.2.2. Registering property-change listeners</strong></p>
<p><strong>1.2.2.1. Audio route changes</strong></p>
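<p>Both listener registrations in section 1.2.2 pass the same sessionPropertyListener function, so that one callback has to tell the properties apart by ID. The following is a minimal sketch of such a dispatch in plain C, written so that it builds without the Audio Toolbox headers. The typedefs, the two kDemoProperty IDs, and the PlayerState struct are hypothetical stand-ins; only the parameter order is meant to mirror AudioSessionPropertyListener (client data, property ID, data size, data).</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types so the sketch builds without Audio Toolbox (assumptions). */
typedef uint32_t DemoPropertyID;
typedef float    DemoFloat32;

/* Hypothetical IDs standing in for kAudioSessionProperty_AudioRouteChange
 * and kAudioSessionProperty_CurrentHardwareOutputVolume. */
enum {
    kDemoProperty_RouteChange  = 1,
    kDemoProperty_OutputVolume = 2
};

/* Hypothetical per-player state, passed as the client data pointer. */
typedef struct {
    int         routeChangeCount; /* number of route changes seen so far */
    DemoFloat32 outputVolume;     /* last reported volume, 0.0 to 1.0    */
} PlayerState;

/* One listener for both properties; it dispatches on the property ID. */
static void sessionPropertyListener(void *inClientData, DemoPropertyID inID,
                                    uint32_t inDataSize, const void *inData) {
    PlayerState *state = (PlayerState *)inClientData;
    if (state == NULL) {
        return;
    }

    switch (inID) {
    case kDemoProperty_RouteChange:
        /* The real payload is a CFDictionaryRef; the sketch only counts. */
        state->routeChangeCount += 1;
        break;
    case kDemoProperty_OutputVolume:
        /* The volume arrives as a Float32; check the size before reading. */
        if (inData != NULL && inDataSize == sizeof(DemoFloat32)) {
            state->outputVolume = *(const DemoFloat32 *)inData;
        }
        break;
    default:
        /* Ignore properties that were never registered. */
        break;
    }
}
```

<p>In the real callback the stand-in IDs would be replaced by the two kAudioSessionProperty constants, and the route-change branch would read the reason and the previous route from the dictionary instead of counting.</p>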
<p>Registering a listener for kAudioSessionProperty_AudioRouteChange tells us when the audio route changes, for example when headphones are plugged in. The documentation says the following.</p>
<p>A CFDictionaryRef object containing the reason the audio route changed along with details on the previous and current audio route.</p>
<p>The dictionary contains the keys and corresponding values described in Audio Route Change Dictionary Keys.</p>
<p>The kAudioSessionProperty_AudioRouteChange dictionary is available to your app only by way of the AudioSessionPropertyListener callback function.</p>
<p>Sample code:</p>
<pre> <code class="language-objectivec">AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange, sessionPropertyListener, (__bridge void *) (self));</code></pre>
<p><strong>1.2.2.2. Output volume changes</strong></p>
<p>Registering a listener for kAudioSessionProperty_CurrentHardwareOutputVolume tells us when the output volume changes, for example when the user turns the volume up. The documentation says the following.</p>
<p>Indicates the current audio output volume as Float32 value between 0.0 and 1.0. Read-only. This value is available to your app by way of a property listener callback function. See AudioSessionAddPropertyListener.</p>
<p>Sample code:</p>
<pre> <code class="language-objectivec">AudioSessionAddPropertyListener(kAudioSessionProperty_CurrentHardwareOutputVolume, sessionPropertyListener, (__bridge void *) (self));</code></pre>
<p><strong>1.2.3. Activating the audio session</strong></p>
<p>Passing true to AudioSessionSetActive activates the audio session and passing false deactivates it. The session can be activated and deactivated repeatedly.</p>
<p>Activating your audio session may interrupt audio sessions belonging to other applications running in the background, depending on categories and priorities.
Deactivating your audio session allows other, interrupted audio sessions to resume.</p>
<p>When another active audio session does not allow mixing, attempting to activate your audio session may fail.</p>
<p>When active is true this call may fail if the currently active AudioSession has a higher priority.</p>
<pre> <code class="language-objectivec">AudioSessionSetActive(YES);</code></pre>
<h3><strong>1.3. Configuring the Audio Unit</strong></h3>
<p>The Audio Unit is what actually performs the audio output.</p>
<p><strong>1.3.1. Describing the output unit</strong></p>
<p>AudioComponentDescription uniquely identifies an audio component. It has these fields:</p>
<ul>
<li>componentType: OSType, a unique 4-byte code that identifies the general type of the audio component.</li>
<li>componentSubType: OSType, the specific type within that general type.</li>
<li>componentManufacturer: OSType, the vendor identifier. The system-provided audio units used in this article require kAudioUnitManufacturer_Apple.</li>
<li>componentFlags: UInt32, must be set to 0 unless a known specific value is requested.</li>
<li>componentFlagsMask: UInt32, must be set to 0 unless a known specific value is requested.</li>
</ul>
<p>componentType can be set to the following values:</p>
<ul>
<li>
<p>kAudioUnitType_Output</p>
<p>An output unit provides input, output, or both input and output simultaneously. It can be used as the head of an audio unit processing graph.</p>
</li>
<li>
<p>kAudioUnitType_MusicDevice</p>
<p>An instrument unit can be used as a software musical instrument, such as a sampler or synthesizer. It responds to MIDI (Musical Instrument Digital Interface) control signals and can create notes.</p>
</li>
<li>
<p>kAudioUnitType_MusicEffect</p>
<p>An effect unit that can respond to MIDI control messages, typically through a mapping of MIDI messages to parameters of the audio unit’s DSP algorithm.</p>
</li>
<li>
<p>kAudioUnitType_FormatConverter</p>
<p>A format converter unit can transform audio formats, such as performing sample rate conversion. A format converter is also appropriate for deferred rendering and for effects such as varispeed. A format converter unit can ask for as much or as little audio input as it needs to produce a given output, while still completing its rendering within the time represented by the output buffer.
For effect-like format converters, such as pitch shifters, it is common to provide both a realtime and an offline version. OS X, for example, includes Time-Pitch and Varispeed audio units in both realtime and offline versions.</p> </li> <li> <p>kAudioUnitType_Effect</p> <p>An effect unit repeatedly processes a number of audio input samples to produce the same number of audio output samples. Most commonly, an effect unit has a single input and a single output. Some effects take side-chain inputs as well. Effect units can be run offline, such as to process a file without playing it, but are expected to run in realtime.</p> </li> <li> <p>kAudioUnitType_Mixer</p> <p>A mixer unit takes a number of input channels and mixes them to provide one or more output channels. For example, the kAudioUnitSubType_StereoMixer audio unit in OS X takes multiple mono or stereo inputs and produce a single stereo output.</p> </li> <li> <p>kAudioUnitType_Panner</p> <p>A panner unit is a specialized effect unit that distributes one or more channels in a single input to one or more channels in a single output. Panner units must support a set of standard audio unit parameters that specify panning coordinates.</p> </li> <li> <p>kAudioUnitType_OfflineEffect</p> <p>An offline effect unit provides digital signal processing of a sort that cannot proceed in realtime. For example, level normalization requires examination of an entire sound, beginning to end, before the normalization factor can be calculated. As such, offline effect units also have a notion of a priming stage that can be performed before the actual rendering/processing phase is executed.</p> </li> <li> <p>kAudioUnitType_Generator</p> <p>A generator unit provides audio output but has no audio input. This audio unit type is appropriate for a tone generator. 
Unlike an instrument unit, a generator unit does not have a control input.</p>
</li>
</ul>
<p>componentSubType can be set to the following values:</p>
<ul>
<li>
<p>kAudioUnitSubType_GenericOutput</p>
<p>An audio unit that responds to start/stop calls and provides basic services for converting to and from linear PCM formats.</p>
</li>
<li>
<p>kAudioUnitSubType_RemoteIO</p>
<p>An audio unit that interfaces to the audio inputs and outputs of iPhone OS devices. Bus 0 provides output to hardware and bus 1 accepts input from hardware. Called an I/O audio unit or sometimes a Remote I/O audio unit.</p>
</li>
<li>
<p>kAudioUnitSubType_VoiceProcessingIO</p>
<p>An audio unit that interfaces to the audio inputs and outputs of iPhone OS devices and provides voice processing features. Bus 0 provides output to hardware and bus 1 accepts input from hardware. See the Voice-Processing I/O Audio Unit Properties enumeration for the identifiers for this audio unit’s properties.</p>
</li>
</ul>
<p>Sample code:</p>
<pre> <code class="language-objectivec">// Describe the Remote I/O output unit provided by Apple.
AudioComponentDescription description = {0};
description.componentType = kAudioUnitType_Output;
description.componentSubType = kAudioUnitSubType_RemoteIO;
description.componentManufacturer = kAudioUnitManufacturer_Apple;</code></pre>
<p><strong>1.3.2. Obtaining the component</strong></p>
<p>Next, use the AudioComponentDescription configured above to search the system's list of audio components for a match; only if one exists can the audio data we are about to supply be processed. On a match, AudioComponentFindNext returns an AudioComponent, from which an audio component instance is created.</p>
<pre> <code class="language-objectivec">// Passing NULL starts the search at the beginning of the component list.
AudioComponent component = AudioComponentFindNext(NULL, &description);
AudioComponentInstanceNew(component, &_audioUnit);</code></pre>
<p>AudioComponentFindNext is described as follows:</p>
<p>Finds the next component that matches a specified AudioComponentDescription structure after a specified audio component.</p>
<p><strong>1.3.3. Checking the output stream format</strong></p>
<p>The main purpose here is to set the sample rate in the AudioStreamBasicDescription to the device's current sampling rate.</p>
<pre> <code class="language-objectivec">AudioStreamBasicDescription _outputFormat;
UInt32 size = sizeof(AudioStreamBasicDescription);
// Read the format that bus 0 expects from us (input scope of the output bus).
AudioUnitGetProperty(_audioUnit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &_outputFormat, &size);
_outputFormat.mSampleRate = _samplingRate;
AudioUnitSetProperty(_audioUnit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &_outputFormat, size);
UInt32 _numBytesPerSample = _outputFormat.mBitsPerChannel / 8;
UInt32 _numOutputChannels = _outputFormat.mChannelsPerFrame;</code></pre>
<p>_samplingRate is the device's current sampling rate. It is queried as follows:</p>
<pre> <code class="language-objectivec">// _samplingRate is assumed to be a Float64, the type of this property.
UInt32 rateSize = sizeof(_samplingRate);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate, &rateSize, &_samplingRate);</code></pre>
<p>The fields of AudioStreamBasicDescription are documented as follows:</p>
<p>An audio data format specification for a stream of audio.</p>
<p>Fields</p>
<p><em>mSampleRate</em></p>
<p>The number of frames per second of the data in the stream, when the stream is played at normal speed. For compressed formats, this field indicates the number of frames per second of equivalent decompressed data.</p>
<p>The <em>mSampleRate</em> field must be nonzero, except when this structure is used in a listing of supported formats (see “kAudioStreamAnyRate”).</p>
<p><em>mFormatID</em></p>
<p>An identifier specifying the general audio data format in the stream. See “Audio Data Format Identifiers”. This value must be nonzero.</p>
<p><em>mFormatFlags</em></p>
<p>Format-specific flags to specify details of the format. Set to 0 to indicate no format flags. See “Audio Data Format Identifiers” for the flags that apply to each format.</p>
<p><em>mBytesPerPacket</em></p>
<p>The number of bytes in a packet of audio data. To indicate variable packet size, set this field to 0. For a format that uses variable packet size, specify the size of each packet using an AudioStreamPacketDescription structure.</p>
<p><em>mFramesPerPacket</em></p>
<p>The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC.
For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.</p> <p><em>mBytesPerFrame</em></p> <p>The number of bytes from the start of one frame to the start of the next frame in an audio buffer. Set this field to 0 for compressed formats.</p> <p>For an audio buffer containing interleaved data for n channels, with each sample of type AudioSampleType, calculate the value for this field as follows:</p> <pre> <code class="language-objectivec">mBytesPerFrame = n * sizeof (AudioSampleType);</code></pre> <p>For an audio buffer containing noninterleaved (monophonic) data, also using AudioSampleType samples, calculate the value for this field as follows:</p> <pre> <code class="language-objectivec">mBytesPerFrame = sizeof (AudioSampleType);</code></pre> <p><em>mChannelsPerFrame</em></p> <p>The number of channels in each frame of audio data. This value must be nonzero.</p> <p><em>mBitsPerChannel</em></p> <p>The number of bits for one audio sample. For example, for linear PCM audio using the kAudioFormatFlagsCanonical format flags, calculate the value for this field as follows:</p> <pre> <code class="language-objectivec">mBitsPerChannel = 8 * sizeof (AudioSampleType);</code></pre> <p>Set this field to 0 for compressed formats.</p> <p><em>mReserved</em></p> <p>Pads the structure out to force an even 8-byte alignment. Must be set to 0.</p> <p>You can configure an audio stream basic description (ASBD) to specify a linear PCM format or a constant bit rate (CBR) format that has channels of equal size. 
For variable bit rate (VBR) audio, and for CBR audio where the channels have unequal sizes, each packet must additionally be described by an AudioStreamPacketDescription structure.</p> <p>A field value of 0 indicates that the value is either unknown or not applicable to the format.</p> <p>Always initialize the fields of a new audio stream basic description structure to zero, as shown here:</p> <pre> <code class="language-objectivec">AudioStreamBasicDescription myAudioDataFormat = {0};</code></pre> <p>To determine the duration represented by one packet, use the mSampleRate field with the mFramesPerPacket field, as follows:</p> <pre> <code class="language-objectivec">duration = (1 / mSampleRate) * mFramesPerPacket</code></pre> <p>In Core Audio, the following definitions apply:</p> <ul> <li>An <em>audio stream</em> is a continuous series of data that represents a sound, such as a song.</li> <li>A <em>channel</em> is a discrete track of monophonic audio. A monophonic stream has one channel; a stereo stream has two channels.</li> <li>A <em>sample</em> is single numerical value for a single audio channel in an audio stream.</li> <li>A <em>frame</em> is a collection of time-coincident samples. For instance, a linear PCM stereo sound file has two samples per frame, one for the left channel and one for the right channel.</li> <li>A <em>packet</em> is a collection of one or more contiguous frames. A packet defines the smallest meaningful set of frames for a given audio data format, and is the smallest data unit for which time can be measured. In linear PCM audio, a packet holds a single frame. 
In compressed formats, it typically holds more; in some formats, the number of frames per packet varies.</li>
<li>The <em>sample rate</em> for a stream is the number of frames per second of uncompressed (or, for compressed formats, the equivalent in decompressed) audio.</li>
</ul>
<p><strong>1.3.4. Setting the render callback</strong></p>
<pre> <code class="language-objectivec">AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = renderCallback;
// Passed back to renderCallback as its inRefCon argument.
callbackStruct.inputProcRefCon = (__bridge void *) (self);
AudioUnitSetProperty(_audioUnit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &callbackStruct, sizeof(callbackStruct));</code></pre>
<h3><strong>1.4. Supplying not-yet-played audio data from the render callback</strong></h3>
<p>The callback hands us an AudioBufferList. First zero it out, so that silence is played if no data is copied in.</p>
<pre> <code class="language-objectivec">for (UInt32 iBuffer = 0; iBuffer < ioData->mNumberBuffers; ++iBuffer) {
    memset(ioData->mBuffers[iBuffer].mData, 0, ioData->mBuffers[iBuffer].mDataByteSize);
}</code></pre>
<p>Then copy the decoded audio packets, one after another, into the callback's AudioBufferList parameter.</p>
<h3><strong>1.5. Releasing resources</strong></h3>
<p>When audio output ends, stop the unit first and then uninitialize it, in the following order.</p>
<pre> <code class="language-objectivec">AudioOutputUnitStop(_audioUnit);
AudioUnitUninitialize(_audioUnit);
AudioComponentInstanceDispose(_audioUnit);
AudioSessionSetActive(NO);
AudioSessionRemovePropertyListenerWithUserData(kAudioSessionProperty_AudioRouteChange, sessionPropertyListener, (__bridge void *) (self));
AudioSessionRemovePropertyListenerWithUserData(kAudioSessionProperty_CurrentHardwareOutputVolume, sessionPropertyListener, (__bridge void *) (self));</code></pre>
<h3><strong>1.6. FFmpeg decoding flow</strong></h3>
<p>Taking FFmpeg 3.0 as an example: each loop iteration reads one packet. A single decode call may not consume the whole packet, so a complete implementation checks whether data remains and, if so, keeps decoding the rest of the current packet. Each decoded frame is then resampled. The sample below makes one decode call per packet for brevity:</p>
<pre> <code class="language-objectivec">av_register_all();
printf("%s Using FFmpeg: %s\n", __FUNCTION__, av_version_info());

AVFormatContext *context = avformat_alloc_context();
NSString *path = [[NSBundle mainBundle] pathForResource:@"Forrest_Gump_IMAX.mp4" ofType:nil];
const char *url =
    path.UTF8String;
avformat_open_input(&context, url, NULL, NULL);
avformat_find_stream_info(context, NULL);
av_dump_format(context, 0, url, 0);

// Find the first audio stream.
int audioStreamIndex = -1;
for (unsigned int i = 0; i < context->nb_streams; ++i) {
    if (AVMEDIA_TYPE_AUDIO == context->streams[i]->codec->codec_type) {
        audioStreamIndex = (int) i;
        break;
    }
}
if (-1 == audioStreamIndex) {
    printf("%s audio stream not found.\n", __FUNCTION__);
    exit(-1);
}

// Look up the decoder by the stream's own codec ID.
AVStream *audioStream = context->streams[audioStreamIndex];
AVCodec *audioCodec = avcodec_find_decoder(audioStream->codec->codec_id);
if (!audioCodec || avcodec_open2(audioStream->codec, audioCodec, NULL) < 0) {
    printf("%s cannot open audio decoder.\n", __FUNCTION__);
    exit(-1);
}

AVPacket packet, *pkt = &packet;
AVFrame *audioFrame = av_frame_alloc();
int gotFrame = 0;
while (0 == av_read_frame(context, pkt)) {
    if (audioStreamIndex == pkt->stream_index) {
        // One decode call per packet; a complete implementation keeps
        // decoding while the packet still has undecoded data left.
        avcodec_decode_audio4(audioStream->codec, audioFrame, &gotFrame, pkt);
        if (gotFrame) {
            // Resample the decoded frame here (see 1.7).
        }
    }
    av_packet_unref(pkt);
}</code></pre>
<h3><strong>1.7. Audio resampling</strong></h3>
<p>According to Dr. Lei Xiaohua's blog, the audio that avcodec_decode_audio4 decodes in FFmpeg 3.0 is single-precision floating point. Note that FFmpeg float samples nominally span [-1.0, 1.0] rather than [0, 1.0], and that the actual sample format depends on the decoder: in the tests below, the AAC track decodes to fltp while the MP3 file decodes to s16p. iOS can play Float audio data, but the format it expects differs from the PCM that FFmpeg produces, so the decoded audio has to be resampled.</p>
<pre> <code class="language-objectivec">const int bufSize = av_samples_get_buffer_size(NULL,
                                               _audioCodecCtx->channels,
                                               _audioFrame->nb_samples,
                                               _audioCodecCtx->sample_fmt,
                                               1);
const NSUInteger sizeOfS16 = 2;
const NSUInteger numChannels = _audioCodecCtx->channels;
// Assumes 2-byte samples, which overcounts for fltp input (discussed below).
int numFrames = bufSize / (sizeOfS16 * numChannels);
SInt16 *s16p = (SInt16 *) _audioFrame->data[0];

if (_swrContext) {
    // Grow the output buffer when it is missing or too small.
    if (!_swrBuffer || _swrBufferSize < (bufSize * 2)) {
        _swrBufferSize = bufSize * 2;
        _swrBuffer = realloc(_swrBuffer, _swrBufferSize);
    }
    Byte *outbuf[2] = {_swrBuffer, 0};
    numFrames = swr_convert(_swrContext, outbuf, numFrames * 2,
                            (const uint8_t **) _audioFrame->data, numFrames);
    if (numFrames < 0) {
        NSLog(@"fail resample audio");
        return nil;
    }
    s16p = _swrBuffer;
}

const NSUInteger numElements = numFrames * numChannels;
NSMutableData *data = [NSMutableData dataWithLength:numElements * sizeof(float)];
vDSP_vflt16(s16p, 1, data.mutableBytes, 1, numElements);
float scale = 1.0 / (float) INT16_MAX;
vDSP_vsmul(data.mutableBytes, 1, &scale, data.mutableBytes, 1, numElements);</code></pre>
<p>_swrContext is the resampling context. It converts the channel count, channel layout, sample format, and sample rate of the source audio to those of the output device; the conversion itself is done by swr_convert. However, the code above contains a resampling error. When playing fltp audio, the computed numFrames is larger than AVFrame.nb_samples: for AAC-encoded stereo audio, numFrames comes out as 2048 while nb_samples is only 1024. The cause is that bufSize is measured in the decoder's sample format (4 bytes per sample for fltp) but is then divided by the 2-byte size of an S16 sample. This makes playback sound distorted; the fix is given later in this article. _swrContext is initialized as follows.</p>
<pre> <code class="language-objectivec">_swrContext = swr_alloc_set_opts(NULL,
                                 av_get_default_channel_layout(hw.numOutputChannels),
                                 AV_SAMPLE_FMT_S16,
                                 hw.samplingRate,
                                 av_get_default_channel_layout(audioCodecCtx->channels),
                                 audioCodecCtx->sample_fmt,
                                 audioCodecCtx->sample_rate,
                                 0,
                                 NULL);</code></pre>
<p>According to the FFmpeg comments, the code below sets up a conversion from planar float sample format to interleaved signed 16-bit integer, downsampling from 48 kHz to 44.1 kHz and downmixing from 5.1 channels to stereo. After the options are set, the context has to be initialized with swr_init, and the conversion itself is still performed by calling swr_convert.</p>
<pre> <code class="language-objectivec">SwrContext *swr = swr_alloc();
av_opt_set_channel_layout(swr, "in_channel_layout",  AV_CH_LAYOUT_5POINT1, 0);
av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO,  0);
av_opt_set_int(swr, "in_sample_rate",  48000, 0);
av_opt_set_int(swr, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt",  AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16,  0);</code></pre>
<p>The equivalent code is</p>
<pre> <code class="language-objectivec">SwrContext *swr = swr_alloc_set_opts(NULL,                  // we're allocating a new context
                                     AV_CH_LAYOUT_STEREO,   // out_ch_layout
                                     AV_SAMPLE_FMT_S16,     // out_sample_fmt
                                     44100,                 // out_sample_rate
                                     AV_CH_LAYOUT_5POINT1,  // in_ch_layout
                                     AV_SAMPLE_FMT_FLTP,    // in_sample_fmt
                                     48000,                 // in_sample_rate
                                     0,                     // log_offset
                                     NULL);                 // log_ctx</code></pre>
<h3><strong>1.8. Using the Accelerate framework during resampling</strong></h3>
<p>After resampling, the code also calls a few vDSP_ functions, shown below. They belong to the Accelerate framework, which provides functions needed for audio, signal processing, image processing, and similar work.</p>
<pre> <code class="language-objectivec">vDSP_vflt16(s16p, 1, data.mutableBytes, 1, numElements);
float scale = 1.0 / (float) INT16_MAX;
vDSP_vsmul(data.mutableBytes, 1, &scale, data.mutableBytes, 1, numElements);</code></pre>
<p>vDSP_vflt16 converts a vector of 16-bit signed integers to single-precision floats, and vDSP_vsmul then multiplies every element by scale (1 / INT16_MAX) so that the values land in roughly [-1.0, 1.0]. <strong>Why 16-bit signed integers?</strong> In the code above, the resampler is configured to output AV_SAMPLE_FMT_S16. More generally, the author ties the choice to the value of AudioStreamBasicDescription.mBitsPerChannel: when AudioStreamBasicDescription.mBitsPerChannel is 16, call vDSP_vflt16; when it is 32, call vDSP_vflt32.</p>
<h2><strong>2. Problems observed at run time</strong></h2>
<p>Some tests were run against the implementation above.</p>
<h3><strong>2.1. MP3 plays correctly</strong></h3>
<p>Playback is fine. The file information is as follows.</p>
<pre> <code class="language-objectivec">Input #0, mp3, from '1A Hero s Sacrifice.mp3':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf55.19.100
  Duration: 00:07:05.09, start: 0.025057, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s</code></pre>
<p>The run log is as follows.</p>
<pre> <code class="language-objectivec">AudioRoute: Speaker
We've got 1 output channels
Current sampling rate: 44100.000000
Current output volume: 0.687500
Current output bytes per sample: 4
Current output num channels: 2
audio codec smr: 44100 fmt: 6 chn: 2 tb: 0.000000 resample
audio device smr: 44100 fmt: 4 chn: 2</code></pre>
<h3><strong>2.2. Audio from an MP4 file plays distorted</strong></h3>
<p>To keep things simple, only the audio track of the MP4 is played. It is clearly distorted. The file information is as follows.</p>
<pre> <code class="language-objectivec">Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Forrest_Gump_IMAX.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.19.100
  Duration: 00:00:31.21, start: 0.036281, bitrate: 878 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x352, 748 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler</code></pre>
<p>Run log.</p>
<pre> <code class="language-objectivec">AudioRoute: Speaker
We've got 1 output channels
Current sampling rate: 44100.000000
Current output volume: 0.687500
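Current output bytes per sample: 4
Current output num channels: 2
audio codec smr: 44100 fmt: 8 chn: 2 tb: 0.000023 resample
audio device smr: 44100 fmt: 4 chn: 2</code></pre>
<p>The only difference from the MP3 case is the codec information of the audio track (fmt: 8, that is fltp, instead of fmt: 6, that is s16p). The device-related information is unchanged. The small tests below are unlikely to be the cause of the distortion, but to be safe they were run on a device anyway:</p>
<ul>
<li>Replace kCFRunLoopDefaultMode with kCFRunLoopCommonModes, still on the main run loop and without touching the screen. The distortion is the same.</li>
<li>Use a non-main run loop with kCFRunLoopCommonModes. <em>The distortion is the same.</em></li>
</ul>
<p>With CPU contention ruled out, one possibility remains: <strong>the resampling computation is wrong</strong>. The fix below is applied to -[KxMovieDecoder handleAudioFrame]: swr_convert is given AVFrame.nb_samples as the input sample count instead of the computed numFrames. With this change fltp audio no longer plays distorted, and the S16P MP3 still plays without distortion.</p>
<pre> <code class="language-objectivec">- (KxAudioFrame *) handleAudioFrame {
    // ...
    numFrames = swr_convert(_swrContext,
                            outbuf,
                            _audioFrame->nb_samples * 2,
                            (const uint8_t **)_audioFrame->data,
                            _audioFrame->nb_samples);
    // ...
}</code></pre>
<p>Source: http://www.jianshu.com/p/0d5315bb81ee</p>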
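<p>To see where the doubled frame count came from, the arithmetic of the original handleAudioFrame code can be replayed in plain C. This is a minimal sketch, not FFmpeg code: buffer_size is a hypothetical stand-in for av_samples_get_buffer_size called with align = 1 (assumed here to return nb_samples x channels x bytes per sample), and the sample sizes of 4 bytes for fltp and 2 bytes for s16p are written out by hand.</p>

```c
/* Hypothetical stand-in for av_samples_get_buffer_size(NULL, channels,
 * nb_samples, sample_fmt, 1). With align = 1 the size is assumed to be
 * nb_samples * channels * bytes_per_sample. */
static int buffer_size(int channels, int nb_samples, int bytes_per_sample) {
    return nb_samples * channels * bytes_per_sample;
}

/* Frame count as the original code computes it: the buffer size is always
 * divided by the size of an S16 sample, whatever the decoder produced. */
static int frames_as_computed(int channels, int nb_samples, int bytes_per_sample) {
    const int size_of_s16 = 2;
    const int buf_size = buffer_size(channels, nb_samples, bytes_per_sample);
    return buf_size / (size_of_s16 * channels);
}

/* Frame count after the fix: take the decoder's own sample count. */
static int frames_after_fix(int nb_samples) {
    return nb_samples;
}
```

<p>For a stereo frame of 1024 samples, frames_as_computed(2, 1024, 4) gives 2048 for fltp, while frames_as_computed(2, 1024, 2) gives 1024 for s16p, which is consistent with the MP3 file playing correctly and the AAC track being distorted. frames_after_fix(1024) gives 1024 in both cases.</p>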