audio aac audio-processing handbrake dolby

How to determine if an audio track is a Dolby Pro Logic II mixdown

I'm trying to find out if there's a way to determine if an AAC-encoded audio track is encoded with Dolby Pro Logic II data. Is there a way of examining the file such that you can see this information? I have for example encoded a media file in Handbrake with (truncated to audio options) -E av_aac -B 320 --mixdown dpl2 and this is the audio track output that mediainfo shows:

Audio #1
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 2h 5mn
Bit rate mode                            : Variable
Bit rate                                 : 321 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 288 MiB (3%)
Title                                    : Stereo / Stereo
Language                                 : English
Encoded date                             : UTC 2017-04-11 22:21:41
Tagged date                              : UTC 2017-04-11 22:21:41

but I can't tell if there's anything in this output that would suggest that it's encoded with DPL2 data.

Solution

tl;dr: it's probably possible, and it may be easier if you're a programmer.

Because the information encoded is just a stereo analog pair, there is no guaranteed way of detecting a Dolby Pro Logic II (DPL2) signal therein, unless you specifically store your own metadata saying "this is a DPL2 file." But you can probably make a pretty good guess.

All of the old analog Dolby Surround formats, including DPL2, store surround information in two channels by inverting the phase of the surround or surrounds and then mixing them into the original left and right channels. Dolby Surround type decoders, including DPL2, attempt to recover this information by inverting the phase of one of the two channels and then looking for similarities in these signal pairs. This is either done trivially, as in Dolby Surround, or else these similarities are artificially biased to be pushed much further to the left or right, or the left or right surround, as in DPL2.
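To make the idea concrete, here is a minimal Python sketch of a matrix fold-down and a passive decode. It is deliberately simplified and is not Dolby's actual encoder: the function names `encode` and `passive_decode` and the equal 0.707 coefficients are my own assumptions for illustration, and real encoders are more elaborate. It only shows why a sum of the two channels exposes in-phase (center) content while a difference exposes out-of-phase (surround) content.

```python
import math

G = math.sqrt(0.5)  # roughly -3 dB as a linear gain


def encode(left, right, center, surround):
    """Fold four channels into a two-channel pair (simplified assumption).

    The surround is mixed inverted into one channel and non-inverted
    into the other; the center is mixed in phase into both.
    """
    lt = [l + G * c - G * s for l, c, s in zip(left, center, surround)]
    rt = [r + G * c + G * s for r, c, s in zip(right, center, surround)]
    return lt, rt


def passive_decode(lt, rt):
    """Recover center from the sum and surround from the difference."""
    center = [G * (a + b) for a, b in zip(lt, rt)]
    surround = [G * (b - a) for a, b in zip(lt, rt)]
    return center, surround
```

With silent left and right inputs, a signal fed only to `surround` comes back only in the decoded surround, and a signal fed only to `center` comes back only in the decoded center.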

So the trick is to detect whether important data is being stored in the surround channel(s). I'll sketch out for you a method that might work, and I'll try to express it without writing code, but it's up to you to implement and refine it to your liking.

1. Crop the first N seconds or so of program content into a stereo file, where N is between one and thirty. Call this file Input.
2. Mix down the Input stereo channels to a new mono file at -3 dB per channel. Call this file Center.
3. Split the left and right channels of Input into separate files. Call these Left and Right.
4. Invert the right channel. Call this file RightInvert.
5. Mix down the Left and RightInvert channels to a new mono file at -3 dB per channel. Call this file Surround.
6. Determine the RMS and peak dB of the Surround file.
7. If the RMS or peak dB of the Surround file is below "a tolerance", stop; the original file is either mono or center-panned and hence contains no surround information. You'll have to experiment with several DPL2 and non-DPL2 sources to see what these tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -30 dB or so.
8. Invert the Center file into a new file. Call this file CenterInvert.
9. Mix the CenterInvert file into the Surround file at 0 dB (both CenterInvert and Surround should be mono). Call this new file SurroundInvert.
10. Determine the RMS and peak dB of the SurroundInvert file.
11. If the RMS or peak dB of SurroundInvert is below "a tolerance", stop; your original source contains panned left or right front information, not surround information. As before, you'll have to find this tolerance by experiment; I'm guessing around -35 dB or so.
12. If you've gotten this far, your original Input probably contains surround information, and hence is probably a member of the Dolby Surround family of encodings.
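The steps above can also be sketched in Python rather than as a chain of sox commands. This is only a rough sketch under some assumptions: the samples are already decoded into two equal-length lists of floats in the range -1.0 to 1.0 (reading the AAC track is out of scope here), the name `looks_matrix_encoded` is hypothetical, and the default tolerances are just the guesses given above, not measured values.

```python
import math

MINUS_3DB = 10 ** (-3 / 20)  # linear gain equivalent to -3 dB


def to_db(value):
    """Convert a linear amplitude to dB relative to full scale (1.0)."""
    return 20 * math.log10(value) if value > 0 else float("-inf")


def rms_db(samples):
    if not samples:
        return float("-inf")
    return to_db(math.sqrt(sum(s * s for s in samples) / len(samples)))


def peak_db(samples):
    return to_db(max((abs(s) for s in samples), default=0.0))


def looks_matrix_encoded(left, right,
                         surround_tol_db=-30.0, invert_tol_db=-35.0):
    """Follow the amplitude-domain steps; True means 'probably surround'."""
    # Center: L + R at -3 dB each. Surround: L + inverted R at -3 dB each.
    center = [MINUS_3DB * (l + r) for l, r in zip(left, right)]
    surround = [MINUS_3DB * (l - r) for l, r in zip(left, right)]
    if rms_db(surround) < surround_tol_db or peak_db(surround) < surround_tol_db:
        return False  # mono or center-panned material
    # SurroundInvert: Surround mixed with the inverted Center at 0 dB.
    surround_invert = [s - c for s, c in zip(surround, center)]
    if (rms_db(surround_invert) < invert_tol_db
            or peak_db(surround_invert) < invert_tol_db):
        return False  # panned front material rather than surround
    return True
```

One thing the code makes visible: algebraically, SurroundInvert works out to a scaled, inverted copy of the right channel, so the second check as literally written only rejects material whose right channel is near silent. Treat it as a starting point to refine against known DPL2 and non-DPL2 files, not as a finished detector.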

I've written this algorithm out such that you can do each of these steps with a specific command in sox. If you want to be fancier, instead of doing the RMS/peak value step in sox, you could run an EBU R128 loudness meter (an ebur128 program) and check your levels in LUFS against a tolerance. If you want to be even fancier, after you create the Surround and Center files, you could filter out all frequencies higher than 7 kHz and do de-emphasis on them, just like a real DPL2 decoder would.
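As a rough illustration of the filtering step, here is a one-pole low-pass filter in plain Python. This is a stand-in of my own choosing, not the filter a real decoder uses: a single pole rolls off very gently, and the function name `one_pole_lowpass` and the 48 kHz default (taken from the mediainfo output above) are assumptions. It does not attempt the de-emphasis.

```python
import math


def one_pole_lowpass(samples, cutoff_hz=7000.0, sample_rate=48000):
    """Smooth a mono signal with a simple first-order low-pass filter."""
    # Smoothing coefficient derived from the cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in samples:
        # Move the output a fraction of the way toward the input.
        y += alpha * (x - y)
        out.append(y)
    return out
```

You would apply this to the Surround (and Center) samples before measuring their levels. For anything serious, a steeper filter from a proper DSP library would be a better choice.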

To keep this algorithm simple, I've sketched it out entirely in the amplitude domain. The Surround and SurroundInvert levels would probably be measured a lot more accurately in the frequency domain, if you know how to calculate the magnitude and angle of FFT bins and you use windows of 30 to 100 ms. But the cheapo version above should get you started.
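For the frequency-domain idea, a rough sketch might look like the following. It windows both channels, takes a transform of each frame, and measures what fraction of the spectral energy sits in bins where left and right are close to opposite in phase. Everything here is an assumption for illustration: the name `out_of_phase_ratio`, the 135 degree threshold, the Hann window, and the energy weighting are my own choices. The naive DFT is used only to keep the example free of dependencies; it is far too slow for real files, where you would use an FFT library.

```python
import cmath
import math


def dft(frame):
    """Naive DFT returning the bins from 0 Hz up to Nyquist."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n // 2 + 1)]


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def out_of_phase_ratio(left, right, frame_len=2048, min_angle_deg=135.0):
    """Fraction of energy in bins where L and R are nearly phase-opposed.

    A frame_len of 2048 is about 43 ms at 48 kHz, inside the suggested
    30 to 100 ms range. Frames overlap by half.
    """
    window = hann(frame_len)
    total = 0.0
    opposed = 0.0
    for start in range(0, len(left) - frame_len + 1, frame_len // 2):
        lf = dft([left[start + i] * window[i] for i in range(frame_len)])
        rf = dft([right[start + i] * window[i] for i in range(frame_len)])
        for a, b in zip(lf, rf):
            energy = abs(a) * abs(b)
            if energy == 0.0:
                continue
            # Angle of the cross-spectrum is the L/R phase difference.
            angle = abs(math.degrees(cmath.phase(a * b.conjugate())))
            total += energy
            if angle >= min_angle_deg:
                opposed += energy
    return opposed / total if total else 0.0
```

A ratio near 0 suggests ordinary stereo or mono material, and a noticeably higher ratio suggests out-of-phase content of the kind a surround matrix produces. Where to draw the line is again something you'd have to find by testing known files.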

One last caution. AAC is a modern psychoacoustic codec, which means that it likes to play games with stereo phasing and imaging to achieve its compression. So I consider it likely that the mere act of encapsulating DPL2 into an AAC stream will hose some of the imaging present in DPL2. To be candid, neither DPL2 nor AAC belongs anywhere in this pipeline. If you must store an analog stream originally encoded with DPL2, do it in a lossless format like WAV or FLAC, not AAC.

As of this writing, operational concepts behind Dolby Pro Logic (I) are here. These basic concepts still apply to DPL2; operational concepts for DPL2 are here.