Problems with 5.1 Audio Encodes

It has come to our attention that many of our trailers with 5.1 audio were improperly encoded: they only included the left, right, and center channels, omitting the rear-left, rear-right, and subwoofer channels. I’ve already reported the issue to the creator of mp4tools (the utility I’ve been using to re-encode the raw files).

Here’s what the 6 channels look like if you open the mp4tools-encoded file in Audacity:

[Image: bad encode]

In the meantime, I’ve been learning and experimenting with ffmpeg, and I’ve come across a set of options that seem to do the trick. I’ve put them in a bash function (rather than an alias, since it takes the input file as "$1") that runs the following:

ffmpeg -i "$1" -y -vcodec libx264 -crf 18.0 -preset veryslow -vf "scale=852:trunc(ow/a/2)*2" -acodec libfaac -ab 384k -ac 6 -f mp4 "${1%.*}-480p-HDTN.mp4"
ffmpeg -i "$1" -y -vcodec libx264 -crf 18.0 -preset veryslow -vf "scale=1280:trunc(ow/a/2)*2" -acodec libfaac -ab 384k -ac 6 -f mp4 "${1%.*}-720p-HDTN.mp4"
ffmpeg -i "$1" -y -vcodec libx264 -crf 18.0 -preset veryslow -vf "scale=1920:trunc(ow/a/2)*2" -acodec libfaac -ab 384k -ac 6 -f mp4 "${1%.*}-1080p-HDTN.mp4"
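The three commands differ only in the target width and the label in the output filename, so they can also be driven from a loop. The bash sketch below is one way to do that; the function name encode_trailer_all and the DRY_RUN switch (which prints each command instead of running it) are hypothetical additions, while the ffmpeg options are the ones above. It keeps -acodec libfaac as in the original commands; recent FFmpeg releases no longer include libfaac, so a modern build would need a different AAC encoder such as the built-in aac.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs the same three encodes as above from one loop.
# Set DRY_RUN=1 to print each ffmpeg command instead of running it.
encode_trailer_all() {
  local input="$1"
  local base="${input%.*}"        # input path without its extension
  local pair label width
  local -a cmd
  for pair in 480p:852 720p:1280 1080p:1920; do
    label="${pair%%:*}"           # part before the colon, e.g. 480p
    width="${pair#*:}"            # part after the colon, e.g. 852
    cmd=(ffmpeg -i "$input" -y -vcodec libx264 -crf 18.0 -preset veryslow
         -vf "scale=${width}:trunc(ow/a/2)*2"
         -acodec libfaac -ab 384k -ac 6 -f mp4 "${base}-${label}-HDTN.mp4")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"   # space-joined, for display only
    else
      "${cmd[@]}" || return       # stop at the first failed encode
    fi
  done
}
```

For example, DRY_RUN=1 encode_trailer_all "Some Trailer.mov" prints the three commands; without DRY_RUN=1 it runs them one after another and stops at the first encode that fails. Building the command as an array keeps filenames with spaces intact.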

Here’s the explanation of the options:

  • -i "$1" uses the first argument passed in as the input file
  • -y means always answer ‘yes’ to questions (e.g. overwriting files)
  • -vcodec libx264 means to re-encode the video with libx264 (the x264 H.264 encoder)
  • -crf 18.0 sets the Constant Rate Factor, i.e. the quality of the encode; lower values mean higher quality and bigger files, and 18 is considered visually lossless according to the FFmpeg and x264 Encoding Guide
  • -preset veryslow uses the veryslow preset, the slowest practical one (only placebo is slower), which trades encoding time for better compression
  • -vf "scale=852:trunc(ow/a/2)*2" scales the video to a width of 852px and a height that maintains the aspect ratio. Normally -1 is used for an auto-calculated width or height, but x264 requires both to be divisible by 2, so trunc(ow/a/2)*2 is used instead: it divides the output width (ow) by the input aspect ratio (a), then rounds down to an even number (see bug #309)
  • -acodec libfaac means to re-encode the audio to AAC using libfaac (FAAC)
  • -ab 384k sets the audio bitrate to 384 kbps
  • -ac 6 sets the number of output audio channels to 6 (5.1)
  • -f mp4 forces the MP4 container format
  • "${1%.*}-480p-HDTN.mp4" builds the output filename from the input filename with its extension dropped, followed by -480p-HDTN.mp4 (-720p-HDTN.mp4 and -1080p-HDTN.mp4 for the other two commands)
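To make the scale expression and the filename expansion concrete, here is a small bash sketch. The helpers even_height and output_name are hypothetical; even_height mirrors trunc(ow/a/2)*2 with integer arithmetic, assuming square pixels so that a (the input aspect ratio) is simply iw/ih.

```shell
#!/usr/bin/env bash
# even_height OW IW IH: height for output width OW, given an IW x IH input.
# ow/a == ow*ih/iw when a = iw/ih (square pixels assumed). Bash integer
# division truncates, so "/ 2 * 2" rounds the result down to an even number.
even_height() {
  local ow="$1" iw="$2" ih="$3"
  echo $(( ow * ih / iw / 2 * 2 ))
}

# output_name INPUT LABEL: ${input%.*} strips the shortest ".*" suffix,
# i.e. only the last extension.
output_name() {
  local input="$1" label="$2"
  echo "${input%.*}-${label}-HDTN.mp4"
}

even_height 852 1920 1080                  # prints 478 (479.25 rounded down to even)
even_height 852 1920 800                   # prints 354 (355 is odd, so it drops to 354)
output_name "Some.Trailer.2013.mov" 720p   # prints Some.Trailer.2013-720p-HDTN.mp4
```

The second call shows why plain -1 isn't enough: an exact height of 355 would be odd, which x264 rejects.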

Here’s what the audio channels look like in Audacity when encoded properly:

[Image: good encode]

I’ll try to find some time to fix the existing bad encodes, but there are no guarantees on when they’ll all be fixed.

Update 1: Emmgunn from mp4tools has gotten back to me, and apparently the culprit is the version of ffmpeg bundled with mp4tools, which can’t handle DTS-HD Master Audio (dtshdma) audio streams.
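To find which source files are affected, ffprobe can report the codec and profile of each audio stream. A bash sketch, assuming ffprobe is on your PATH and that it labels these streams with the profile string DTS-HD MA (the name FFmpeg’s DTS decoder uses); has_dtshdma and check_source are hypothetical helper names.

```shell
#!/usr/bin/env bash
# has_dtshdma: succeeds if any line on stdin mentions the DTS-HD MA profile.
has_dtshdma() {
  grep -q 'DTS-HD MA'
}

# check_source FILE: asks ffprobe for one "codec_name,profile" line per
# audio stream and succeeds if any of them is DTS-HD MA.
check_source() {
  ffprobe -v error -select_streams a \
    -show_entries stream=codec_name,profile -of csv=p=0 "$1" | has_dtshdma
}
```

For example, for f in *.mkv; do check_source "$f" && echo "$f"; done would list the affected files (the .mkv extension is only an assumption about how the raw files are stored). Splitting the parsing into has_dtshdma keeps it testable without a real file.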

Update 2: @willydearborn has brought to my attention that my new encodes use a higher H.264 level, which won’t stream on the PS3. After looking into this, I discovered that the veryslow preset was producing High@L3.0 for 480p encodes, High@L3.2 for 720p encodes, and High@L5.0 for 1080p encodes.

With the default preset, it uses High@L3.0 for 480p encodes, High@L3.1 for 720p encodes, and High@L4.0 for 1080p encodes. The files are a bit bigger, but the change in quality is unnoticeable, so I’ll be switching back to the default preset.

If you ever want to set the profile and level manually, the options to pass in are: -profile:v PROFILE -level:v LEVEL (more details in the FFmpeg and x264 Encoding Guide).
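As a sketch of pinning these explicitly, the bash helpers below map each output size to the level the default preset was producing above (3.0, 3.1, and 4.0, all with the High profile). The mapping and the helper names level_for and profile_args are assumptions for illustration, not a statement of what the PS3 requires.

```shell
#!/usr/bin/env bash
# level_for LABEL: H.264 level per output size (assumed mapping, taken from
# what the default preset produced for these trailers).
level_for() {
  case "$1" in
    480p)  echo 3.0 ;;
    720p)  echo 3.1 ;;
    1080p) echo 4.0 ;;
    *)     return 1 ;;   # unknown size: no level
  esac
}

# profile_args LABEL: prints the -profile:v/-level:v options for ffmpeg.
profile_args() {
  local level
  level="$(level_for "$1")" || return 1
  echo "-profile:v high -level:v $level"
}
```

The output is meant to be spliced into the commands above, e.g. ffmpeg -i "$1" -y -vcodec libx264 -crf 18.0 $(profile_args 720p) followed by the remaining options, left unquoted on purpose so it splits into separate arguments. To check what an existing encode uses, ffprobe -v error -select_streams v:0 -show_entries stream=profile,level -of csv=p=0 file.mp4 prints the profile and the level (the level as an integer, e.g. 40 for 4.0).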

Update 3: As a rule of thumb, I usually encode 1080p @ 10 Mbps, 720p @ 5 Mbps, and 480p @ 2.5 Mbps. But after reading up on libx264 and how CRF values work, I’ve learnt there’s no reason for me to do 2-pass encodes anymore. 2-pass encoding is useful when you need to hit a specific bitrate or file size, but all I really care about is quality, and I can certainly see why some trailers would end up with a higher or lower bitrate than others.
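The reason 2-pass encoding is tied to bitrate and file size is simple arithmetic: at a fixed average bitrate, size is bitrate times duration. A quick bash sketch using the rule-of-thumb video bitrates above plus the 384k audio track, with a 150-second trailer as an assumed example length; size_mb is a hypothetical helper and ignores container overhead.

```shell
#!/usr/bin/env bash
# size_mb VIDEO_KBPS AUDIO_KBPS SECONDS: approximate file size in MB (decimal)
# for a fixed average bitrate. kbit/s * s = kbit; /8 = kB; /1000 = MB.
# Integer division truncates, so results are rounded down.
size_mb() {
  local video_kbps="$1" audio_kbps="$2" seconds="$3"
  echo $(( (video_kbps + audio_kbps) * seconds / 8 / 1000 ))
}

size_mb 10000 384 150   # 1080p @ 10 Mbps: prints 194
size_mb 5000 384 150    # 720p @ 5 Mbps: prints 100
size_mb 2500 384 150    # 480p @ 2.5 Mbps: prints 54
```

With CRF, the bitrate (and therefore the size) instead floats with how hard each trailer is to compress, which is why some trailers land above or below these figures.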

I’ve gone with the suggested CRF value of 18 (visually lossless), but it tends to produce slightly lower bitrates than I’m used to. I’ve experimented with CRF values between 15 and 18 and can’t really tell any difference besides the increased file size.