H.264 Video Codec Using VTDecompressionSession

iOS Swift 2023. 6. 7. 17:34

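In the previous post, we saw how NAL units arriving over the network as an Elementary Stream are converted into a CMSampleBuffer. That sample buffer holds H.264-compressed video, which can be decompressed directly with VTDecompressionSession. The decoded frames can then be rendered on screen with MetalKit. This time, we'll look at how to implement an H.264 decoder with VTDecompressionSession.

Creating a VTDecompressionSession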

func VTDecompressionSessionCreate(
    allocator: CFAllocator?,
    formatDescription videoFormatDescription: CMVideoFormatDescription,
    decoderSpecification videoDecoderSpecification: CFDictionary?,
    imageBufferAttributes destinationImageBufferAttributes: CFDictionary?,
    outputCallback: UnsafePointer<VTDecompressionOutputCallbackRecord>?,
    decompressionSessionOut: UnsafeMutablePointer<VTDecompressionSession?>
) -> OSStatus

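A VTDecompressionSession is created with the VTDecompressionSessionCreate function shown above. Pass a reference to a VTDecompressionSession variable as the last parameter, decompressionSessionOut; on success, the function creates the session and stores it there.

For allocator, pass kCFAllocatorDefault to use the default allocator (passing NULL also works). The second parameter, formatDescription, takes the CMVideoFormatDescription built when creating the CMSampleBuffer; this hands the decoder the H.264 parameter sets for the current stream. The third parameter, decoderSpecification, is a CFDictionary describing the decoder to use; here we pass nil so that VideoToolbox picks the decoder. imageBufferAttributes sets the format of the output image, and we'll choose kCVPixelFormatType_420YpCbCr8BiPlanarFullRange. Finally, register in outputCallback the callback to be invoked when a frame finishes decoding, and the setup is done.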

    func resetDecompressionSession() {
        // Tear down any existing session before building a new one.
        destroyDecompressionSession()
        guard let videoFormat = self.videoFromat else { return }
        // Request bi-planar 4:2:0 full-range output (NV12).
        let pixelFormat = kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
        //let pixelFormat = kCVPixelFormatType_32BGRA
        let attributes: [NSString: AnyObject] = [
            kCVPixelBufferPixelFormatTypeKey: NSNumber(value: pixelFormat),
            // An empty dictionary requests IOSurface-backed buffers with default properties.
            kCVPixelBufferIOSurfacePropertiesKey: [:] as AnyObject,
            kCVPixelBufferOpenGLESCompatibilityKey: NSNumber(booleanLiteral: true)
        ]

        // self is passed unretained, so the decoder must outlive the session.
        var outputCallback = VTDecompressionOutputCallbackRecord(
            decompressionOutputCallback: callback,
            decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque()
        )

        let status = VTDecompressionSessionCreate(allocator: kCFAllocatorDefault,
                                                  formatDescription: videoFormat,
                                                  decoderSpecification: nil,
                                                  imageBufferAttributes: attributes as CFDictionary,
                                                  outputCallback: &outputCallback,
                                                  decompressionSessionOut: &decompressionSession)
        if status != noErr {
            logger.error("Failed to create decompression session: \(status)")
            return
        }

        // Hint that decoding should keep up with real-time playback.
        if let session = decompressionSession {
            VTSessionSetProperty(session, key: kVTDecompressionPropertyKey_RealTime, value: kCFBooleanTrue)
        }
    }

The function above recreates the session when the H.264 parameter sets change in the live video stream.
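destroyDecompressionSession() is called in the code above but is not shown in this post. Below is a minimal sketch of what such a teardown could look like, assuming the session is kept in an optional property. The DecoderSessionHolder type is hypothetical and exists only to make the snippet self-contained; the author's actual implementation may differ.

```swift
import VideoToolbox

/// Hypothetical holder type; the post keeps the session on its own decoder class.
final class DecoderSessionHolder {
    var decompressionSession: VTDecompressionSession?

    /// Tears down the current session, if any, so a new one can be created.
    func destroyDecompressionSession() {
        guard let session = decompressionSession else { return }
        // Wait for frames that were submitted asynchronously,
        // so their output callbacks have run before the session goes away.
        _ = VTDecompressionSessionWaitForAsynchronousFrames(session)
        // Invalidate the session; it must not be used for decoding after this.
        VTDecompressionSessionInvalidate(session)
        decompressionSession = nil
    }
}
```

Waiting before invalidating is a design choice in this sketch: it gives each pending frame's callback a chance to run, which matters when the callback is responsible for releasing a per-frame object.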

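Once the session exists, the flow is: build a CMSampleBuffer from the video received over the network, then submit that CMSampleBuffer for decoding. When decoding of the buffer finishes, the callback registered above receives the YCbCr 4:2:0 image.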

  
    public func decode(inputImage: VideoPacket) {
        if inputImage.data.isEmpty {
            //logger.error("image is empty")
            return
        }
        // Raw pointer to the Annex B bytes.
        // Caution: a pointer obtained inside withUnsafeBytes is only guaranteed to be valid within the closure.
        guard let annexBBuffer = inputImage.data.withUnsafeBytes({ $0.bindMemory(to: UInt8.self).baseAddress })  else { return }
        // Create a video format description; this succeeds only when the packet carries SPS and PPS.
        if let inputFormat = createVideoForamtDescription(buffer: annexBBuffer, count: inputImage.data.count) {
            // If the format differs from the previous one, reset the session.
            if !CMFormatDescriptionEqual(inputFormat, otherFormatDescription: videoFromat) {
                //logger.info("format: \(inputFormat)")
                videoFromat = inputFormat
                resetDecompressionSession()
            }
        }
        
        if videoFromat == nil {
            logger.error("missing video format. keyframe is required")
            return
        }
        let decodeParam = DecodeParams(rotation: inputImage.rotation)
        guard let session = decompressionSession else { return }
        var sampleBuffer: CMSampleBuffer? = nil
        // Convert the Annex B buffer into a CMSampleBuffer (timestamp: milliseconds to seconds).
        let presentaionTimestamp = Double(inputImage.presentationTimestamp) / 1000.0
        if !h264ElementraryStreamToCMSampleBuffer(buffer: inputImage.data,
                                             video_format: videoFromat!,
                                             presentationTime: presentaionTimestamp,
                                             out_sample_buffer: &sampleBuffer,
                                             memory_pool: memoryPool) {
            logger.error("fail to create H264AnnexBBufferToCMSampleBuffer")
            return
        }
        
        let frameFlags: VTDecodeFrameFlags = [
            ._EnableAsynchronousDecompression,
            ._EnableTemporalProcessing
        ]
        // Retain the per-frame parameters; the output callback takes ownership back.
        let frameRefcon = Unmanaged.passRetained(decodeParam)
        var infoFlags = VTDecodeInfoFlags(rawValue: 0)
        let status = VTDecompressionSessionDecodeFrame(session,
                                                       sampleBuffer: sampleBuffer!,
                                                       flags: frameFlags,
                                                       frameRefcon: frameRefcon.toOpaque(),
                                                       infoFlagsOut: &infoFlags)
        if status != noErr {
            logger.error("VTDecompressionSessionDecodeFrame fail: \(status)")
            _ = frameRefcon.takeRetainedValue()
            resetDecompressionSession()
        }
        /*
        if #available(iOS 13.0, *) {
            logger.info("\(videoFromat!.dimensions.width) x \(videoFromat!.dimensions.height)")
        }
        */
    }

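The function above takes a VideoPacket, built by extracting the elementary stream from RTP packets, creates a CMSampleBuffer from it, and requests decoding. If you already have a CMSampleBuffer, change the code to accept one directly. The call that actually decompresses the frame is VTDecompressionSessionDecodeFrame. It takes the session created above, the sampleBuffer, and a bit field of flags that control decoding; these VTDecodeFrameFlags are where asynchronous or synchronous processing is selected. frameRefcon carries per-frame values (in this code, the rotation stored in DecodeParams), which are handed back to the callback when decoding completes so they can be used there. The VTDecodeInfoFlags output parameter reports information about the decode operation, such as whether it ran asynchronously.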
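The frameRefcon hand-off relies on Unmanaged to carry a Swift object through a raw pointer and get it back later. The following sketch isolates just that round trip, without VideoToolbox. The DecodeParams class here is a stand-in (only the rotation property comes from the post, and its Int type is an assumption), and makeFrameRefcon and consumeFrameRefcon are hypothetical helper names.

```swift
/// Stand-in for the post's DecodeParams; the Int type of `rotation` is an assumption.
final class DecodeParams {
    let rotation: Int
    init(rotation: Int) { self.rotation = rotation }
}

/// Hypothetical helper: converts the object to a raw pointer and adds one retain,
/// so the object stays alive while only the C-style pointer refers to it.
func makeFrameRefcon(_ params: DecodeParams) -> UnsafeMutableRawPointer {
    return Unmanaged.passRetained(params).toOpaque()
}

/// Hypothetical helper: recovers the object and balances the earlier retain.
/// Call this exactly once per pointer, as the output callback does.
func consumeFrameRefcon(_ refcon: UnsafeMutableRawPointer) -> DecodeParams {
    return Unmanaged<DecodeParams>.fromOpaque(refcon).takeRetainedValue()
}

let refcon = makeFrameRefcon(DecodeParams(rotation: 90))
let restored = consumeFrameRefcon(refcon)
print(restored.rotation)
```

The pairing matters: passRetained without a matching takeRetainedValue leaks the object, while taking the retained value twice over-releases it. The session's own refcon, by contrast, uses passUnretained and takeUnretainedValue because the decoder's lifetime is managed elsewhere.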

VTDecompressionOutputCallback

This is the callback registered when the session is created; when decoding completes, it receives the decompressed image.

private var callback: VTDecompressionOutputCallback = {(decompressionOutputRefCon: UnsafeMutableRawPointer?, param: UnsafeMutableRawPointer?, status: OSStatus, infoFlags: VTDecodeInfoFlags, imageBuffer: CVBuffer?, presentationTimeStamp: CMTime, duration: CMTime) in
        let decoder = Unmanaged<H264VideoDecoder>.fromOpaque(decompressionOutputRefCon!).takeUnretainedValue()
        let decodeParams = Unmanaged<DecodeParams>.fromOpaque(param!).takeRetainedValue()
        
        if status != noErr {
            //decoder.logger.error("Failed to decode frame. status: \(status)")
            return
        }
        //print("decode: \(presentationTimeStamp.seconds), duration: \(duration.seconds) infoFlags: \(infoFlags)")
        decoder.didOutputForSession(status, infoFlags: infoFlags, imageBuffer: imageBuffer, presentationTimeStamp: presentationTimeStamp, duration: duration, rotation: decodeParams.rotation)
    }

The callback calls decoder.didOutputForSession to hand over the decoded CVBuffer. This is actually a CVPixelBuffer holding the uncompressed raster image. Decoding is now complete. With the decompressed image buffer and its timestamp, we can render the image on screen with MetalKit, or convert the decoded output to BGRA and composite video with MetalKit.
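One common way to get the decoded NV12 buffer into Metal is to wrap its two planes as textures through a CVMetalTextureCache. The sketch below shows that idea under some assumptions: NV12TextureMapper is a hypothetical helper that is not part of the post, and the pixel buffer is assumed to be IOSurface-backed and usable from Metal (adding kCVPixelBufferMetalCompatibilityKey to the session's image buffer attributes is the usual way to request that).

```swift
import CoreVideo
import Metal

/// Hypothetical helper that exposes the two planes of an NV12 pixel buffer as Metal textures.
final class NV12TextureMapper {
    private var textureCache: CVMetalTextureCache?

    init?(device: MTLDevice) {
        let status = CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)
        guard status == kCVReturnSuccess, textureCache != nil else { return nil }
    }

    /// Returns the luma (Y) and chroma (CbCr) textures, or nil if the buffer cannot be mapped.
    func makeTextures(from pixelBuffer: CVPixelBuffer) -> (luma: MTLTexture, chroma: MTLTexture)? {
        guard CVPixelBufferGetPlaneCount(pixelBuffer) == 2,
              let luma = makeTexture(pixelBuffer, plane: 0, format: .r8Unorm),    // one byte per Y sample
              let chroma = makeTexture(pixelBuffer, plane: 1, format: .rg8Unorm)  // interleaved Cb, Cr
        else { return nil }
        return (luma, chroma)
    }

    private func makeTexture(_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat) -> MTLTexture? {
        guard let cache = textureCache else { return nil }
        // Each plane has its own size; the chroma plane is half the luma size in 4:2:0.
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
        var cvTexture: CVMetalTexture?
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, nil, format, width, height, plane, &cvTexture)
        guard status == kCVReturnSuccess, let wrapped = cvTexture else { return nil }
        return CVMetalTextureGetTexture(wrapped)
    }
}
```

A fragment shader would then sample both textures and apply a YCbCr-to-RGB matrix. This sketch returns only the MTLTexture objects for brevity; real rendering code commonly also keeps the CVMetalTexture wrappers or the pixel buffer alive until the GPU has finished reading them.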
